gemini-multimodal


Version: 1.0.3 (latest)
Source: npm
Weekly downloads: 3 (-25%)
Maintainers: 1

Gemini multimodal skill for Claude Code. Video, PDF, image analysis & generation via browser cookies - no API key required.

Installation

npx gemini-multimodal

This installs the skill to ~/.claude/skills/gemini/ and sets up the Python environment automatically.

Prerequisites:

  • Python 3.8+
  • Chrome logged into gemini.google.com
  • On macOS, allow Keychain access when prompted (first run)
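
The prerequisites above can be partly checked before running the installer. The following is a minimal sketch and not part of the package: `check_prerequisites` is a hypothetical helper that only covers what a script can verify locally (the Python version, and that `npx` is available for the install command). It cannot confirm the Chrome login or the Keychain permission.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems (empty if none were found)."""
    version = tuple(version or sys.version_info)[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append("Python %d.%d found, but 3.8+ is required" % version)
    # `npx` is needed for the install command `npx gemini-multimodal`.
    if which("npx") is None:
        problems.append("npx not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the current `PATH`.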

Features

| Capability | Description |
| --- | --- |
| Text Queries | Complex reasoning with "Thinking with 3 Pro" mode |
| Video Analysis | Upload MP4 files for summarization, timestamps, insights |
| YouTube Analysis | Analyze videos via URL (uses YouTube extension) |
| Document Analysis | PDF and document Q&A |
| Image Analysis | Describe, OCR, analyze uploaded images |
| Image Generation | Create images from text prompts |
| Image Editing | Modify images with natural language |
| Google Search | Automatic grounding for current information |

How It Works

User Request → webapi CLI → gemini-webapi → Gemini Web (cookies) → Response

Authentication reuses the cookies from your Chrome session, so no API key is needed. You only need to be logged into gemini.google.com in Chrome.
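
To make the flow above more concrete, here is a rough sketch of what the middle of the pipeline might look like. It assumes the `gemini-webapi` Python package (imported as `gemini_webapi`) together with `browser-cookie3`, so that cookies can be read from the local browser. The function name `ask_gemini` is hypothetical, and this is not the code shipped in `webapi.py`.

```python
async def ask_gemini(prompt):
    """Send one prompt to Gemini Web using cookies from the local browser."""
    # Imported lazily so this sketch can be loaded without the dependency.
    from gemini_webapi import GeminiClient

    # Assumption: with no cookie arguments, gemini-webapi falls back to
    # cookies it can read from the local browser (needs browser-cookie3).
    client = GeminiClient()
    await client.init()
    response = await client.generate_content(prompt)
    return response.text


# Example (needs a logged-in browser session and network access):
#   import asyncio
#   print(asyncio.run(ask_gemini("Hello")))
```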

Usage

Text Queries

# Complex reasoning (Thinking with 3 Pro)
webapi "Explain the implications of quantum computing for cryptography"

# Show thinking process
webapi "Solve step by step: What is 15% of 240?" --show-thoughts
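
The same queries can be driven from a script. This is a sketch under the assumption that the `webapi` wrapper is on your `PATH`; the README does not say whether it is, so you may need to pass its full path as `executable`. `build_query` and `run_query` are hypothetical helper names; only the `--show-thoughts` flag comes from the documentation.

```python
import subprocess


def build_query(prompt, show_thoughts=False, executable="webapi"):
    """Build the argument list for a text query."""
    argv = [executable, prompt]
    if show_thoughts:
        argv.append("--show-thoughts")
    return argv


def run_query(prompt, **kwargs):
    """Run the query and return whatever the CLI prints to stdout."""
    result = subprocess.run(
        build_query(prompt, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the prompt as a list element rather than through a shell string avoids quoting problems with prompts that contain quotes or `$`.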

File Analysis

# Video analysis
webapi "Summarize this video with timestamps" --file meeting.mp4

# Document analysis
webapi "Extract key findings" --file research.pdf

# Image analysis
webapi "What's in this image?" --file photo.png
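
When several files need the same question, one command per file can be generated. The sketch below is illustrative only: `file_commands` is a hypothetical helper, the `*.pdf` default is an arbitrary choice, and it assumes the CLI takes a single `--file` per invocation, as in the examples above.

```python
from pathlib import Path


def file_commands(prompt, folder, pattern="*.pdf", executable="webapi"):
    """Return one `webapi PROMPT --file PATH` argument list per matching file."""
    paths = sorted(Path(folder).glob(pattern))
    return [[executable, prompt, "--file", str(path)] for path in paths]
```

Each returned list can be handed to `subprocess.run` as is.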

YouTube Analysis

webapi "What are the main points discussed?" --youtube "https://youtube.com/watch?v=VIDEO_ID"

Requires YouTube extension enabled in gemini.google.com settings.

Image Generation

# Generate image
webapi "A cyberpunk cityscape at night" --generate-image city.png

# With aspect ratio
webapi "Mountain landscape" --generate-image landscape.png --aspect 16:9

# Edit existing image
webapi "Make the sky purple" --edit photo.jpg --output edited.png
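
For scripted use, the image commands can be assembled with a small check on the aspect ratio. This is a sketch, not the package's own code: the helper names are hypothetical, and rejecting ratios other than the four listed under CLI Options is an assumption about what the CLI accepts.

```python
# Ratios listed for --aspect under CLI Options.
ASPECT_RATIOS = ("16:9", "1:1", "4:3", "3:4")


def generate_image_command(prompt, output, aspect=None, executable="webapi"):
    """Build the argument list for `--generate-image`, optionally with `--aspect`."""
    argv = [executable, prompt, "--generate-image", str(output)]
    if aspect is not None:
        if aspect not in ASPECT_RATIOS:
            raise ValueError(
                "Unsupported aspect ratio %r; expected one of %s"
                % (aspect, ", ".join(ASPECT_RATIOS))
            )
        argv += ["--aspect", aspect]
    return argv


def edit_image_command(prompt, source, output, executable="webapi"):
    """Build the argument list for `--edit SOURCE --output OUTPUT`."""
    return [executable, prompt, "--edit", str(source), "--output", str(output)]
```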

Current Information (Grounded)

webapi "What is the latest AI news this week? Search the web."

Google Search grounding is automatic when queries need current information.

CLI Options

| Option | Description |
| --- | --- |
| --file, -f FILE | Input file (MP4, PDF, PNG, JPG, etc.) |
| --youtube URL | YouTube video URL |
| --generate-image FILE | Generate and save image |
| --edit IMAGE | Edit image (with --output) |
| --output, -o FILE | Output path for images |
| --aspect RATIO | Aspect ratio (16:9, 1:1, 4:3, 3:4) |
| --show-thoughts | Display thinking process |
| --model MODEL | Model to use (default: gemini-3.0-pro) |
| --json | JSON output |
| --help, -h | Show help |
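
As an illustration of how these options fit together, the table can be mirrored with `argparse`. This is a sketch based only on the table above; it is not taken from `webapi.py`, and details such as restricting `--aspect` to the four listed values are assumptions. `argparse` adds `--help, -h` on its own.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(prog="webapi")
    parser.add_argument("prompt", help="Prompt text")
    parser.add_argument("--file", "-f", metavar="FILE",
                        help="Input file (MP4, PDF, PNG, JPG, etc.)")
    parser.add_argument("--youtube", metavar="URL", help="YouTube video URL")
    parser.add_argument("--generate-image", metavar="FILE",
                        help="Generate and save image")
    parser.add_argument("--edit", metavar="IMAGE",
                        help="Edit image (with --output)")
    parser.add_argument("--output", "-o", metavar="FILE",
                        help="Output path for images")
    parser.add_argument("--aspect", metavar="RATIO",
                        choices=["16:9", "1:1", "4:3", "3:4"],
                        help="Aspect ratio")
    parser.add_argument("--show-thoughts", action="store_true",
                        help="Display thinking process")
    parser.add_argument("--model", metavar="MODEL", default="gemini-3.0-pro",
                        help="Model to use")
    parser.add_argument("--json", action="store_true", help="JSON output")
    return parser
```

For example, `build_parser().parse_args(["Mountain landscape", "--generate-image", "landscape.png", "--aspect", "16:9"])` yields a namespace with `generate_image` and `aspect` set and `model` left at its default.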

File Structure

gemini/
├── SKILL.md           # Claude Code skill definition
├── README.md          # This file
├── requirements.txt   # Python dependencies
├── .venv/             # Virtual environment
├── webapi             # Bash wrapper
└── webapi.py          # Python implementation
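
If an installation looks incomplete, the layout above can be compared against what is on disk. A minimal sketch, assuming the install location `~/.claude/skills/gemini/` given under Installation; `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Entries taken from the file structure listing above.
EXPECTED_ENTRIES = (
    "SKILL.md",
    "README.md",
    "requirements.txt",
    ".venv",
    "webapi",
    "webapi.py",
)


def missing_entries(skill_dir=None):
    """Return the expected files or directories that are absent from skill_dir."""
    if skill_dir is None:
        skill_dir = Path.home() / ".claude" / "skills" / "gemini"
    root = Path(skill_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means every listed entry is present.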

Troubleshooting

"Error initializing client"

  • Log into gemini.google.com in Chrome
  • On macOS, allow Keychain access when prompted

"No images generated"

  • Rephrase the prompt; some content is filtered
  • Be more explicit about what you want

"Module not found"

  • Activate venv: source .venv/bin/activate
  • Install deps: pip install -r requirements.txt
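
To tell which of the two fixes applies, it helps to know whether the interpreter is running inside the virtual environment and which modules cannot be found. The sketch below uses only the standard library; the helper names are hypothetical, and the module name `gemini_webapi` in the usage note is an assumption based on the `gemini-webapi` step in How It Works.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, if `in_virtualenv()` is false, activate the venv first; if `missing_modules(["gemini_webapi"])` is non-empty inside the venv, reinstall the dependencies.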

"YouTube not working"

  • Enable YouTube extension in gemini.google.com settings

License

MIT

Keywords

claude

Package last updated on 14 Dec 2025
