owa

PyPI package (pip), version 0.6.5

🚀 Open World Agents

Everything you need to build a state-of-the-art foundation multimodal desktop agent, end-to-end.

License: MIT · Python 3.11+

⚠️ Active Development Notice: This codebase is under active development. APIs and components may change, and some may be moved to separate repositories. Documentation may be incomplete or reference features still in development.

📄 Research Paper: This project was first introduced and developed for the D2E project. For more details, see D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. If you find this work useful, please cite our paper.

Quick Start

💡 This is a conceptual overview. See the Quick Start Guide for detailed instructions.

# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_to_event.py --train-dir ./

# 3. Train your model (coming soon)
$ python train.py --dataset ./event-dataset
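
The first two steps above can be chained from Python. The following is a minimal sketch, not part of OWA: `build_pipeline` and `run_pipeline` are hypothetical helpers, and the command names, script path, and `--train-dir` flag are copied verbatim from the overview rather than verified against the current release. Step 3 is left out because training is marked as coming soon. The sketch defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(session: Path, train_dir: Path) -> list[list[str]]:
    """Return the record and process commands as argument lists.

    Assumption: the commands mirror the Quick Start overview exactly;
    check the Quick Start Guide for the options your version supports.
    """
    return [
        # 1. Record desktop interaction into an MCAP file.
        ["ocap", str(session)],
        # 2. Convert the raw recordings into the event format.
        ["python", "scripts/01_raw_to_event.py", "--train-dir", str(train_dir)],
    ]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline(Path("my-session.mcap"), Path("./")))
```

With `dry_run=False` each command runs in order and the pipeline stops at the first failure. Recording is interactive, so the first command would block until the recording is stopped.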

Installation

# For video recording, install GStreamer first. Skip if you only need data processing.
$ conda install open-world-agents::gstreamer-bundle

# Install OWA
$ pip install owa
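
After installing, a quick sanity check can confirm that the interpreter meets the Python 3.11+ requirement and that the package is visible to it. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, not something OWA provides.

```python
import sys
from importlib import metadata


def check_environment(package: str = "owa", min_python: tuple[int, int] = (3, 11)) -> dict:
    """Report the interpreter version check and the installed package version.

    Assumption: the distribution name matches the `pip install` name ("owa").
    """
    try:
        installed_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        installed_version = None
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "package": package,
        "installed_version": installed_version,
    }


if __name__ == "__main__":
    print(check_environment())
```

An `installed_version` of `None` means the package is not installed in the active environment, which often points to a different virtual environment or conda environment than the one used for `pip install`.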

Documentation

Resource | Description
--- | ---
🏠 Full Documentation | Complete docs with all guides and references
📖 Quick Start Guide | Complete tutorial: Record → Process → Train
🤗 Community Datasets | Browse and share datasets

Core Components

  • 🌍 Environment Framework: the "USB-C of desktop agents", a universal interface for native desktop automation with pre-built plugins for desktop control, high-performance screen capture, and a zero-configuration plugin system
  • 📊 Data Infrastructure: a complete desktop-agent data pipeline, from recording to training, built on the OWAMcap format, a universal standard powered by MCAP
  • 🛠️ CLI Tools: Command-line utilities (owl) for recording, analyzing, and managing agent data
  • 🤖 Examples: Complete implementations and training pipelines for multimodal agents

Contributing

We welcome contributions! See our Contributing Guide.

License

MIT License. See LICENSE.

Citation

@article{choi2025d2e,
  title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
  author={Choi, Suwhan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
  journal={arXiv preprint arXiv:2510.05684},
  year={2025}
}
