
Parsing LLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.
A Python package for parsing VLM-generated GUI action instructions into executable pyautogui code.
ui-tars is a Python package for parsing VLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.
pip install ui-tars
# or
uv pip install ui-tars
from ui_tars.action_parser import parse_action_to_structure_output, parsing_response_to_pyautogui_code
response = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
original_image_width, original_image_height = 1920, 1080
parsed_dict = parse_action_to_structure_output(
response,
factor=1000,
origin_resized_height=original_image_height,
origin_resized_width=original_image_width,
model_type="doubao"
)
print(parsed_dict)
parsed_pyautogui_code = parsing_response_to_pyautogui_code(
responses=parsed_dict,
image_height=original_image_height,
image_width=original_image_width
)
print(parsed_pyautogui_code)
from ui_tars.action_parser import parsing_response_to_pyautogui_code
pyautogui_code = parsing_response_to_pyautogui_code(parsed_dict, original_image_height, original_image_width)
print(pyautogui_code)
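To make the input format used above more concrete, here is a minimal sketch of how a "Thought: ... Action: ..." string could be split into its parts with the standard library. The function name `split_response` and the returned keys are hypothetical and chosen for illustration; this is not the package's parser, which handles many more action types and formats.

```python
import re


def split_response(response: str) -> dict:
    """Split a 'Thought: ... Action: ...' string into its parts (illustrative only)."""
    match = re.search(
        r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<action>.*)",
        response,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("response lacks 'Thought:' and 'Action:' sections")

    action = match.group("action").strip()
    # The action name is the identifier before the first '('.
    action_type = action.split("(", 1)[0].strip()
    # Pull the two integers out of a <point>x y</point> tag, if one is present.
    point = re.search(r"<point>\s*(-?\d+)\s+(-?\d+)\s*</point>", action)
    coords = (int(point.group(1)), int(point.group(2))) if point else None

    return {
        "thought": match.group("thought"),
        "action_type": action_type,
        "point": coords,
    }


example = split_response(
    "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
)
print(example)
```

For real use, prefer `parse_action_to_structure_output`, which also rescales the coordinates.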
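import ast
from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt
image = Image.open("your_image_path.png")
start_box = parsed_dict[0]["action_inputs"]["start_box"]
# start_box is a string; parse it as a literal instead of executing it with eval()
coordinates = ast.literal_eval(start_box)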
x1 = int(coordinates[0] * original_image_width)
y1 = int(coordinates[1] * original_image_height)
draw = ImageDraw.Draw(image)
radius = 5
draw.ellipse((x1 - radius, y1 - radius, x1 + radius, y1 + radius), fill="red", outline="red")
plt.imshow(np.array(image))
plt.axis("off")
plt.show()
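The visualization above multiplies relative coordinates by the image size to get pixel positions. A minimal sketch of that conversion follows, assuming the coordinates are fractions in the range 0 to 1 and that a box may be given either as a point or as two corners. The helper names `box_center` and `to_pixel` are hypothetical, and the sketch rounds to the nearest pixel where the snippet above truncates with `int()`.

```python
def box_center(box: list[float]) -> tuple[float, float]:
    """Return the center of a relative box given as [x, y] or [x1, y1, x2, y2]."""
    if len(box) == 2:
        return box[0], box[1]
    if len(box) == 4:
        x1, y1, x2, y2 = box
        return (x1 + x2) / 2, (y1 + y2) / 2
    raise ValueError(f"expected 2 or 4 coordinates, got {len(box)}")


def to_pixel(rel_x: float, rel_y: float, width: int, height: int) -> tuple[int, int]:
    """Map relative coordinates in [0, 1] to the nearest integer pixel."""
    return round(rel_x * width), round(rel_y * height)


# A degenerate box (both corners equal) on a 1920x1080 screenshot.
center = box_center([0.2, 0.3, 0.2, 0.3])
pixel = to_pixel(*center, width=1920, height=1080)
print(pixel)
```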
def parse_action_to_structure_output(
text: str,
factor: int,
origin_resized_height: int,
origin_resized_width: int,
model_type: str = "qwen25vl",
max_pixels: int = 16384 * 28 * 28,
min_pixels: int = 100 * 28 * 28
) -> list[dict]:
...
Description: Parses output action instructions into structured dictionaries, automatically handling coordinate scaling and box/point format conversion.
Parameters:
text: The output string
factor: Scaling factor
origin_resized_height / origin_resized_width: Original image height/width
model_type: Model type (e.g., "qwen25vl", "doubao")
max_pixels / min_pixels: Image pixel upper/lower limits
Returns: A list of structured actions, each as a dict with fields like action_type, action_inputs, thought, etc.
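The `max_pixels` and `min_pixels` defaults are written as multiples of `28 * 28`, which suggests the image is resized so that both sides are multiples of 28 and the total pixel count stays within those limits. The sketch below shows one common way to do this. It is an assumption about the general technique, not the package's confirmed implementation, and the function name `smart_resize` and the `patch` parameter are introduced here for illustration.

```python
import math


def smart_resize(
    height: int,
    width: int,
    patch: int = 28,
    min_pixels: int = 100 * 28 * 28,
    max_pixels: int = 16384 * 28 * 28,
) -> tuple[int, int]:
    """Round both sides to multiples of `patch`, keeping the area within limits."""
    # Start from the nearest multiple of `patch` on each side.
    new_h = max(patch, round(height / patch) * patch)
    new_w = max(patch, round(width / patch) * patch)

    if new_h * new_w > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        new_h = math.floor(height / beta / patch) * patch
        new_w = math.floor(width / beta / patch) * patch
    elif new_h * new_w < min_pixels:
        # Too small: grow both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        new_h = math.ceil(height * beta / patch) * patch
        new_w = math.ceil(width * beta / patch) * patch

    return new_h, new_w


resized = smart_resize(1080, 1920)
small = smart_resize(50, 50)
print(resized, small)
```

Scaling both sides by the same ratio keeps the aspect ratio close to the original, so relative coordinates remain meaningful after resizing.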
def parsing_response_to_pyautogui_code(
responses: dict | list[dict],
image_height: int,
image_width: int,
input_swap: bool = True
) -> str:
...
Description: Converts structured actions into a pyautogui script string, supporting click, type, hotkey, drag, scroll, and more.
Parameters:
responses: Structured actions (dict or list of dicts)
image_height / image_width: Image height/width
input_swap: Whether to use clipboard paste for typing (default True)
Returns: A pyautogui script string, ready for automation execution.
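To illustrate the idea of turning a structured action into a script string, here is a minimal sketch that handles only two action types. The function `build_script` and the dict layout (`point` as relative `[x, y]`, `content` as the text to type) are hypothetical and simplified; the fields the package actually produces may differ. The `input_swap` branch shows what clipboard-paste typing could look like, assuming `pyperclip` is available where the generated script runs.

```python
def build_script(
    action: dict,
    image_width: int,
    image_height: int,
    input_swap: bool = True,
) -> str:
    """Generate a small pyautogui script string for one action (illustrative only)."""
    lines = ["import pyautogui"]
    kind = action["action_type"]
    inputs = action["action_inputs"]

    if kind == "click":
        # Convert relative coordinates to absolute pixels.
        x = round(inputs["point"][0] * image_width)
        y = round(inputs["point"][1] * image_height)
        lines.append(f"pyautogui.click({x}, {y})")
    elif kind == "type":
        text = inputs["content"]
        if input_swap:
            # Paste via the clipboard, which copes better with non-ASCII text.
            lines.insert(1, "import pyperclip")
            lines.append(f"pyperclip.copy({text!r})")
            lines.append("pyautogui.hotkey('ctrl', 'v')")
        else:
            lines.append(f"pyautogui.write({text!r}, interval=0.05)")
    else:
        raise ValueError(f"unsupported action type: {kind}")

    return "\n".join(lines)


click_script = build_script(
    {"action_type": "click", "action_inputs": {"point": [0.2, 0.3]}}, 1920, 1080
)
type_script = build_script(
    {"action_type": "type", "action_inputs": {"content": "hello"}}, 1920, 1080
)
print(click_script)
print(type_script)
```

The function only builds a string, so it can be inspected or logged before anything is executed on the desktop.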
Contributions, issues, and suggestions are welcome!
Apache-2.0 License
We found that ui-tars demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 2 open source maintainers collaborating on the project.