Spatial Reasoning

A powerful Python package for object detection using advanced vision and reasoning models, including OpenAI's models and Google's Gemini.

[Figure: Example results, comparing detection results across different models and showing the superior performance of the advanced reasoning model.]

Features

  • Multiple Detection Models:

    • Advanced Reasoning Model (OpenAI) - A reasoning model that leverages tools and other foundation models to perform object detection
    • Vanilla Reasoning Model - Uses a reasoning model directly to perform object detection
    • Vision Model - GroundingDino + SAM
    • Gemini Model (Google) - A fine-tuned large multimodal model (LMM) for object detection
  • Tool-Use Reasoning: The advanced model uses grid-based reasoning for precise object detection

    [Figure: Internal workings, showing how the advanced reasoning model uses grid cells for precise localization.]

  • Simple API: One function for all your detection needs

  • CLI Support: Command-line interface for quick testing

Installation

```bash
pip install spatial-reasoning
```

Or install from source:

```bash
git clone https://github.com/QasimWani/spatial-reasoning.git
cd spatial-reasoning
pip install -e .
```

Optional: Flash Attention (for better performance)

For improved performance with transformer models, you can optionally install Flash Attention:

```bash
pip install flash-attn --no-build-isolation
```

Note: Flash Attention requires CUDA development tools and must be compiled for your specific PyTorch/CUDA version. The package will work without it, just with slightly reduced performance.
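
To confirm whether the optional dependency is actually available at runtime, a quick check like the sketch below works; it assumes only that the flash-attn pip package exposes the flash_attn module, which is its standard layout.

```python
# Minimal sketch: check whether the optional flash-attn package is importable.
# Assumes the pip package "flash-attn" installs the "flash_attn" module.
import importlib.util

has_flash_attn = importlib.util.find_spec("flash_attn") is not None
print("Flash Attention available:", has_flash_attn)
```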

Setup

Create a .env file in your project root:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GEMINI_API_KEY=your-google-gemini-api-key-here
```

Get your API keys:

  • OpenAI: https://platform.openai.com/api-keys
  • Gemini: https://makersuite.google.com/app/apikey
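
Before running a detection, you can verify that both keys are visible to your process. This is a minimal sketch using python-dotenv (a separate pip install); it assumes the package reads these variables from the environment, as the .env convention suggests.

```python
# Minimal sketch: load .env and confirm both API keys are set.
# Assumes spatial-reasoning reads OPENAI_API_KEY / GEMINI_API_KEY from the environment.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("OPENAI_API_KEY", "GEMINI_API_KEY"):
    if not os.environ.get(key):
        print(f"Warning: {key} is not set")
```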

Quick Start

Python API

```python
from spatial_reasoning import detect

# Detect objects in an image
result = detect(
    image_path="https://ix-cdn.b2e5.com/images/27094/27094_3063d356a3a54cc3859537fd23c5ba9d_1539205710.jpeg",  # accepts a URL or a local file path
    object_of_interest="farthest scooter in the image",
    task_type="advanced_reasoning_model"
)

# Access results
bboxes = result['bboxs']
visualized_image = result['visualized_image']
print(f"Found {len(bboxes)} objects")

# Save the result
visualized_image.save("output.jpg")
```
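
If you want to render the boxes yourself instead of using visualized_image, a sketch like the one below works, assuming each entry in result['bboxs'] is an (x1, y1, x2, y2) pixel box; that format is an assumption, not something the docs above confirm.

```python
# Hypothetical rendering sketch: assumes (x1, y1, x2, y2) pixel boxes,
# which is NOT confirmed by the package docs; adjust to the real format.
from PIL import Image, ImageDraw

image = Image.open("image.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
for x1, y1, x2, y2 in bboxes:
    draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
image.save("custom_output.jpg")
```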

Command Line

```bash
# Basic usage ("advanced_reasoning_model" is the default task type)
spatial-reasoning --image-path "image.jpg" --object-of-interest "person"

# With a specific model
spatial-reasoning --image-path "image.jpg" --object-of-interest "cat" --task-type "gemini"

# From a URL with custom parameters
spatial-reasoning \
  --image-path "https://example.com/image.jpg" \
  --object-of-interest "text in image" \
  --task-type "advanced_reasoning_model" \
  --task-kwargs '{"nms_threshold": 0.7}'
```
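
The CLI's --task-kwargs flag takes a JSON object of model-specific options. If the Python API mirrors it, the call might look like the sketch below; task_kwargs is a hypothetical parameter name here, not one documented above, so verify it against the package before relying on it.

```python
# Hypothetical sketch: passing model-specific options from Python.
# "task_kwargs" mirrors the CLI's --task-kwargs flag and is an ASSUMED
# keyword argument, not a documented part of detect().
from spatial_reasoning import detect

result = detect(
    image_path="image.jpg",
    object_of_interest="text in image",
    task_type="advanced_reasoning_model",
    task_kwargs={"nms_threshold": 0.7},  # assumed keyword; verify against the package
)
```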

Available Models

  • advanced_reasoning_model (default) - Best accuracy, uses tool-use reasoning
  • vanilla_reasoning_model - Faster, standard detection
  • vision_model - Uses GroundingDino + (optional) SAM2 for segmentation
  • gemini - Google's Gemini model
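
To compare the four backends on the same image, you can loop over the task_type strings above; this sketch uses only the detect() signature already shown in Quick Start.

```python
# Sketch: run the same query through each documented task_type.
from spatial_reasoning import detect

for task_type in (
    "advanced_reasoning_model",
    "vanilla_reasoning_model",
    "vision_model",
    "gemini",
):
    result = detect(
        image_path="image.jpg",
        object_of_interest="person",
        task_type=task_type,
    )
    print(f"{task_type}: {len(result['bboxs'])} boxes")
```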

License

MIT License
