# ui-tars

A Python package for parsing LLM-generated GUI action instructions, automatically generating `pyautogui` scripts, and supporting coordinate conversion and smart image resizing.
## Introduction

`ui-tars` is a Python package for parsing LLM-generated GUI action instructions, automatically generating `pyautogui` scripts, and supporting coordinate conversion and smart image resizing.
- Supports multiple LLM output formats (e.g., Qwen, Doubao)
- Automatically handles coordinate scaling and format conversion
- One-click generation of pyautogui automation scripts
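Coordinate handling is the core of the conversion step: models emit box coordinates normalized to a fixed range, and these must be scaled to the real screen resolution before any automation can run. A minimal standalone sketch of that idea (not the package's internal implementation):

```python
def normalized_box_to_pixel_point(box, image_width, image_height):
    """Convert a normalized (x1, y1, x2, y2) box to a pixel-space
    center point by scaling each coordinate to the image size."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 * image_width
    cy = (y1 + y2) / 2 * image_height
    return round(cx), round(cy)

# A box normalized to [0, 1] on a 1920x1080 screen:
print(normalized_box_to_pixel_point((0.1, 0.2, 0.1, 0.2), 1920, 1080))
# (192, 216)
```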
## Quick Start

### Installation

```bash
pip install ui-tars
# or, with uv:
uv pip install ui-tars
```
### Parse LLM output into structured actions

```python
from ui_tars.action_parser import parse_action_to_structure_output

response = "Thought: Click the button\nAction: click(start_box='(0.1,0.2,0.1,0.2)')"
original_image_width, original_image_height = 1920, 1080
parsed_dict = parse_action_to_structure_output(
    response,
    factor=1000,
    origin_resized_height=original_image_height,
    origin_resized_width=original_image_width,
    model_type="doubao",
)
print(parsed_dict)
```
### Generate a pyautogui automation script

```python
from ui_tars.action_parser import parsing_response_to_pyautogui_code

pyautogui_code = parsing_response_to_pyautogui_code(
    parsed_dict, original_image_height, original_image_width
)
print(pyautogui_code)
```
### Visualize coordinates on the image (optional)

```python
import ast

from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt

image = Image.open("your_image_path.png")
start_box = parsed_dict[0]["action_inputs"]["start_box"]
# Parse the "(x1, y1, x2, y2)" string safely instead of using eval.
coordinates = ast.literal_eval(start_box)
x1 = int(coordinates[0] * original_image_width)
y1 = int(coordinates[1] * original_image_height)

draw = ImageDraw.Draw(image)
radius = 5
draw.ellipse((x1 - radius, y1 - radius, x1 + radius, y1 + radius), fill="red", outline="red")

plt.imshow(np.array(image))
plt.axis("off")
plt.show()
```
## API Documentation

### parse_action_to_structure_output

```python
def parse_action_to_structure_output(
    text: str,
    factor: int,
    origin_resized_height: int,
    origin_resized_width: int,
    model_type: str = "qwen25vl",
    max_pixels: int = 16384 * 28 * 28,
    min_pixels: int = 100 * 28 * 28,
) -> list[dict]:
    ...
```
**Description:**
Parses LLM output action instructions into structured dictionaries, automatically handling coordinate scaling and box/point format conversion.

**Parameters:**
- `text`: The LLM output string
- `factor`: Scaling factor
- `origin_resized_height` / `origin_resized_width`: Original image height/width
- `model_type`: Model type (e.g., `"qwen25vl"`, `"doubao"`)
- `max_pixels` / `min_pixels`: Upper/lower limits on image pixel count

**Returns:**
A list of structured actions, each a dict with fields such as `action_type`, `action_inputs`, and `thought`.
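The returned dicts can be dispatched on `action_type`. A brief sketch using an illustrative hand-built action dict (the field names follow the description above, but the exact contents of `action_inputs` depend on the action and model type):

```python
def describe_action(action: dict) -> str:
    """Render a one-line summary of a parsed action dict."""
    kind = action["action_type"]
    if kind == "click":
        return f"click at {action['action_inputs']['start_box']}"
    if kind == "type":
        return f"type: {action['action_inputs'].get('content', '')}"
    return f"unhandled action: {kind}"

# Illustrative action shaped after the documented fields; not real library output.
sample = {
    "action_type": "click",
    "action_inputs": {"start_box": "(0.1, 0.2, 0.1, 0.2)"},
    "thought": "Click the button",
}
print(describe_action(sample))  # click at (0.1, 0.2, 0.1, 0.2)
```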
### parsing_response_to_pyautogui_code

```python
def parsing_response_to_pyautogui_code(
    responses: dict | list[dict],
    image_height: int,
    image_width: int,
    input_swap: bool = True,
) -> str:
    ...
```
**Description:**
Converts structured actions into a pyautogui script string, supporting click, type, hotkey, drag, scroll, and more.

**Parameters:**
- `responses`: Structured actions (dict or list of dicts)
- `image_height` / `image_width`: Image height/width
- `input_swap`: Whether to type text via clipboard paste instead of keystrokes (default `True`)

**Returns:**
A pyautogui script string, ready for automation execution.
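Conceptually, code generation renders each structured action into `pyautogui` calls at pixel coordinates. A simplified standalone sketch of that idea for a single click action (not the library's actual implementation):

```python
import ast

def click_action_to_code(action: dict, image_width: int, image_height: int) -> str:
    """Render a click action with a normalized start_box into a
    pyautogui.click(...) source line at pixel coordinates."""
    x1, y1, x2, y2 = ast.literal_eval(action["action_inputs"]["start_box"])
    px = round((x1 + x2) / 2 * image_width)   # box center, scaled to pixels
    py = round((y1 + y2) / 2 * image_height)
    return f"pyautogui.click({px}, {py})"

action = {"action_type": "click",
          "action_inputs": {"start_box": "(0.1, 0.2, 0.1, 0.2)"}}
print(click_action_to_code(action, 1920, 1080))  # pyautogui.click(192, 216)
```

The generated string is only source text; executing it (e.g., via `exec` or a saved script) requires `pyautogui` and a live display.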
## Contribution

Contributions, issues, and suggestions are welcome!

## License

Apache-2.0 License