vlm-robot-agent

A VLM-based robotic agent that navigates and interacts with people according to visual goals

0.3.2

PyPI

Maintainers: 1

🤖 `vlm_robot_agent`

Author: Edison Bejarano

A library for using vision-language models (VLMs) to plan robot actions and interactions.

An intelligent robotic agent based on vision-language models (VLMs). It perceives the environment from an image, plans actions, and decides whether to navigate or to interact with people in order to reach a given goal (such as entering a room or finding a restroom).

🚀 Features

📷 Visual perception using an OpenAI VLM
🧠 Goal-based reasoning (macro- and micro-goals)
🧭 Navigation actions:
- forward, left, right, forward_left, forward_right
🙋 Interaction actions:
- Talk to a person who is blocking the way
- Gesture to ask them to move
💾 Interaction memory, plus reading and running prompts from a folder

📦 Installation

pip install vlm_robot_agent

🛠 Basic usage

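from vlm_robot_agent import VLMRobotAgent

agent = VLMRobotAgent(prompt_folder="./prompts")

goal = "enter office 3"

# Execution loop. obtener_imagen_de_tu_robot, ejecutar_action_en_robot and
# objetivo_cumplido are placeholders for your own robot code.
while True:
    image = obtener_imagen_de_tu_robot()  # grab a fresh frame on every step
    action = agent.step(image, goal)
    ejecutar_action_en_robot(action)
    if objetivo_cumplido():
        break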
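
A `while True` loop never ends if the goal is never reached, so on real hardware it can help to cap the number of steps. The sketch below is one possible way to do that, not part of the library: `EchoAgent` is a hypothetical stand-in that only mimics the `step(image, goal)` call shown above, and `run_episode` with its `max_steps` parameter is an illustrative helper name.

```python
class EchoAgent:
    """Hypothetical stand-in with the same step(image, goal) shape as above."""

    def step(self, image, goal):
        # A real agent would query a VLM here; this stub always moves forward.
        return "forward"


def run_episode(agent, get_image, execute, is_done, goal, max_steps=50):
    """Run the perceive/decide/act loop with a step budget.

    Returns the number of steps taken, or None if the budget ran out
    before is_done() reported success.
    """
    for step_count in range(1, max_steps + 1):
        image = get_image()               # fresh frame for every decision
        action = agent.step(image, goal)  # ask the agent what to do next
        execute(action)                   # hand the action to the robot
        if is_done():
            return step_count
    return None


# Simulated run: the "robot" just records actions, and the goal counts as
# reached after three of them.
executed = []
steps = run_episode(
    EchoAgent(),
    get_image=lambda: b"fake-frame",
    execute=executed.append,
    is_done=lambda: len(executed) >= 3,
    goal="enter office 3",
)
```

Passing the camera, actuator, and goal check as callables keeps the loop independent of any particular robot stack, so the same function can be exercised in a simulator or in a unit test.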

📁 Prompt structure

Prompts are stored as .json files inside the configured folder, and you can load them with:

prompts = agent.load_prompts()
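
The README does not describe what `load_prompts()` returns or what the JSON files contain. To illustrate the general idea of reading every `.json` file in a folder, here is a minimal standard-library sketch. The function name `load_prompts_from_folder`, the file names, and the `"system"` key are all assumptions made for the example and are not taken from the library.

```python
import json
import tempfile
from pathlib import Path


def load_prompts_from_folder(folder):
    """Return {file name without extension: parsed JSON} for each .json file."""
    prompts = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            prompts[path.stem] = json.load(handle)
    return prompts


# Demo on a throwaway folder with two hypothetical prompt files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "navigation.json").write_text(
        json.dumps({"system": "Guide the robot toward the goal."}),
        encoding="utf-8",
    )
    (folder / "interaction.json").write_text(
        json.dumps({"system": "Politely ask the person to move."}),
        encoding="utf-8",
    )
    (folder / "notes.txt").write_text("ignored", encoding="utf-8")
    demo_prompts = load_prompts_from_folder(folder)
```

Keying the result by file name makes it easy to pick a prompt per situation, for example `demo_prompts["navigation"]`.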

🧩 Robot integration

It can be used with ROS-based systems, simulators such as Gazebo, or any other robot environment.
The agent needs:
- The current image of the environment (image)
- The goal to achieve (goal)
- A function that executes the returned action (Navigate, Interact)
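
The third requirement, a function that executes the returned action, could look roughly like the sketch below. To keep it runnable without the package installed, `Navigate` and `Interact` are redefined here as small stand-in dataclasses; their `direction` and `strategy` fields follow the action examples in this README, but the real classes may differ. The `drive` and `speak` callbacks are hypothetical hooks into your robot.

```python
from dataclasses import dataclass


@dataclass
class Navigate:
    """Stand-in for the library's Navigate action (assumed shape)."""
    direction: str


@dataclass
class Interact:
    """Stand-in for the library's Interact action (assumed shape)."""
    strategy: str


# The five navigation actions listed under Features.
VALID_DIRECTIONS = {"forward", "left", "right", "forward_left", "forward_right"}


def execute_action(action, drive, speak):
    """Route an action to the matching robot callback."""
    if isinstance(action, Navigate):
        if action.direction not in VALID_DIRECTIONS:
            raise ValueError(f"unknown direction: {action.direction!r}")
        drive(action.direction)
        return "navigate"
    if isinstance(action, Interact):
        speak(action.strategy)
        return "interact"
    raise TypeError(f"unsupported action type: {type(action).__name__}")


# Demo: both callbacks simply record what they were asked to do.
log = []
kinds = [
    execute_action(Navigate(direction="forward"), log.append, log.append),
    execute_action(Interact(strategy="ask_to_move"), log.append, log.append),
]
```

In a ROS setup, `drive` might publish a velocity command and `speak` might call a text-to-speech node; the dispatcher itself stays the same.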

📚 Example actions

from vlm_robot_agent import Navigate, Interact

# Navigate forward
Navigate(direction="forward")

# Ask a person to move
Interact(strategy="ask_to_move")

📄 License

MIT

🧠 Future improvements

Progress tracking with StateTracker
Handling multiple agents or conversational flows
Support for multimodal input (text + image)
