@genspark/cli

CLI tool for Genspark Tool API - search, crawl, analyze images, generate media

Version: 1.0.15 (latest, npm)
Weekly downloads: 19K (-27.58%)
Maintainers: 2

Genspark CLI (gsk)

One CLI. Every AI capability. Search, generate, analyze, communicate — all from your terminal.

gsk is the command-line interface for the Genspark AI platform. It unifies 90+ AI tools behind a single binary: web search, image/video/audio generation with 40+ models, document analysis, media transcription, cloud file management, email (Gmail & Outlook), calendar, GitHub, Slack, Notion, Microsoft Teams, OneDrive, SharePoint, AI phone calls, stock data, social media data (Twitter, Instagram, Reddit), and autonomous AI agents — all with clean JSON output for seamless integration with AI coding assistants, automation pipelines, and scripts.

Capability Map

| Category | What You Get |
| --- | --- |
| 🔍 Search | Web search, image search |
| 📄 Documents | Crawl pages, summarize PDFs/docs |
| 🎨 Images | 16 models: GPT Image, Gemini, Flux 2, Imagen 4, Recraft, Ideogram, Seedream ... |
| 🎬 Videos | 14 models: Kling V3, Veo 3.1, Sora 2, Hailuo, Wan, Runway, PixVerse, Seedance ... |
| 🎵 Audio | 14 models: Gemini TTS, ElevenLabs, MiniMax, Mureka, CassetteAI, Lyria 2 ... |
| 🧠 Analysis | Image/video/audio understanding, OCR, video style replication |
| 📝 Transcribe | Whisper, Gemini, ElevenLabs Scribe |
| ☁️ AI Drive | Cloud file storage, download, compress |
| 📧 Email | Gmail & Outlook: read, search, send, reply, forward, archive, labels, attachments |
| 📅 Calendar | Google & Outlook: list, create, delete events |
| 💬 Collaboration | Slack, Microsoft Teams, Notion: send messages, search, manage channels/pages |
| 📂 Cloud Storage | Google Drive, OneDrive, SharePoint, Google Sheets, Google Docs, Google Contacts |
| 🐙 GitHub | List repos, search/create/update issues |
| 📞 Phone | AI-powered phone calls to businesses |
| 📈 Stocks | Real-time stock prices |
| 📱 Social Media | Twitter/X, Instagram, Reddit: search posts/users, get comments, connections, and more (30 APIs) |
| 🤖 Agents | Podcasts, docs, slides, deep research, fact-checking, websites, batch media generation |
| 🔊 Voice | Voice cloning, voice changer |

Installation

npm install -g @genspark/cli

Requires Node.js >= 18.

Quick Start

# Log in via browser
gsk login

# Search the web
gsk search "latest AI news"

# Generate an image
gsk img "A beautiful sunset over mountains" -o ./sunset.png

# Crawl a webpage
gsk crawl "https://example.com/article"

Authentication

Log in with your Genspark account:

gsk login

This opens a browser for authentication and saves the API key to ~/.genspark-tool-cli/config.json.

Alternatively, provide an API key directly:

# Via environment variable
export GSK_API_KEY="gsk_..."

# Via CLI option
gsk search "query" --api-key "gsk_..."

To check your current identity:

gsk login-info
gsk me          # shorthand

To log out:

gsk logout

Configuration

Config is loaded from three sources (highest priority first):

  • CLI options: --api-key, --base-url, etc.
  • Environment variables: GSK_API_KEY, GSK_BASE_URL, GSK_PROJECT_ID
  • Config file: ~/.genspark-tool-cli/config.json

{
  "api_key": "gsk_...",
  "base_url": "https://www.genspark.ai",
  "project_id": "project_abc123",
  "debug": false,
  "timeout": 300000
}
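
The precedence order above can be sketched as a small shell function. This is only an illustration of "first non-empty source wins"; the helper name `resolve_setting` and the staging URL are hypothetical, and the CLI's real resolution logic is not shown in this README.

```shell
# Hypothetical helper (not part of gsk): print the first non-empty value among
# CLI flag, environment variable, config-file value, and built-in default.
resolve_setting() {
  cli_value=$1; env_value=$2; file_value=$3; default_value=$4
  if [ -n "$cli_value" ]; then printf '%s\n' "$cli_value"
  elif [ -n "$env_value" ]; then printf '%s\n' "$env_value"
  elif [ -n "$file_value" ]; then printf '%s\n' "$file_value"
  else printf '%s\n' "$default_value"
  fi
}

# No --base-url flag given, environment variable set, config file also set:
# the environment value wins over the config file.
resolve_setting "" "https://staging.example.com" "https://www.genspark.ai" "https://www.genspark.ai"
# prints https://staging.example.com
```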

Global Options

| Option | Env Var | Default | Description |
| --- | --- | --- | --- |
| `--api-key <key>` | `GSK_API_KEY` |  | API key (required) |
| `--base-url <url>` | `GSK_BASE_URL` | https://www.genspark.ai | API base URL |
| `--project-id <id>` | `GSK_PROJECT_ID` |  | Project ID for access control |
| `--debug` |  | false | Enable debug output |
| `--timeout <ms>` |  | 300000 (5 min) | Request timeout |
| `--output <format>` |  | json | Output format: json or text |
| `--refresh` |  |  | Force refresh cached tool schemas |

Commands

list-tools (alias: ls)

List all available tools.

gsk list-tools
gsk ls

login-info (alias: me)

Show your current account info — email, name, and membership plan.

gsk login-info
gsk me

init-opencode

Generate an .opencode.json config file for OpenCode, pre-configured to use Genspark's LLM proxy with your API key.

# Generate with default model (claude-opus-4-6-1m)
gsk init-opencode

# Specify a different default model
gsk init-opencode --model claude-sonnet-4-6

# Write to a custom path
gsk init-opencode -o ./my-project/.opencode.json

| Option | Default | Description |
| --- | --- | --- |
| `--model <name>` | claude-opus-4-6-1m | Default model for OpenCode |
| `-o, --out <path>` | .opencode.json (cwd) | Output file path |

init-skills

Sync GSK skill documents into the current project for AI agent discovery. Copies all skill docs and generates a CONTEXT.md entry point that AI agents (Claude Code, Gemini, etc.) can load automatically.

# Copy skills to .gsk/skills/ and generate CONTEXT.md
gsk init-skills

# Also generate .claude/ config for Claude Code
gsk init-skills --agent claude

# Generate config for all supported agents (Claude, Gemini)
gsk init-skills --agent all

# Custom output directory
gsk init-skills -o ./docs/gsk-skills

| Option | Default | Description |
| --- | --- | --- |
| `-o, --out <dir>` | .gsk/skills (cwd) | Output directory for skills |
| `--agent <type>` |  | Generate agent config: claude, gemini, or all |

Search & Crawl

Search the web.

gsk search "latest AI news"

crawler (alias: crawl)

Extract content from a web page or document.

gsk crawl "https://example.com/article"

summarize_large_document (alias: summarize)

Analyze a document and answer questions about it.

gsk summarize "https://example.com/report.pdf" --question "What are the key findings?"

| Option | Description |
| --- | --- |
| `<url>` | Document URL (required, positional) |
| `--question <text>` | Question about the document |

Search for images on the web.

gsk img-search "modern architecture"

Media Analysis & Transcription

understand_images (alias: analyze)

Analyze images with AI vision model.

gsk analyze "Describe this image" -i "https://example.com/image.jpg"
gsk analyze "Extract all text" -i "https://img1.jpg" "https://img2.jpg"
gsk analyze "What's in this photo?" -i ./photo.jpg

| Option | Default | Description |
| --- | --- | --- |
| `-i, --image_urls <url...>` |  | Image URL(s) or local file path(s) to analyze (required) |
| `-r, --instruction <text>` |  | Custom analysis instruction |

Image Generation

image_generation (alias: img)

Generate images using AI. Supports text-to-image and image-to-image.

# Text-to-image
gsk img "A beautiful sunset over mountains" -r "16:9" -o ./sunset.png
gsk img "Modern office at night" -s "4k" -r "1:1"

# Image-to-image (reference-based)
gsk img "A portrait in similar style" -i ./reference.png

| Option | Default | Description |
| --- | --- | --- |
| `-r, --aspect_ratio <ratio>` | 1:1 | Aspect ratio (1:1, 16:9, 9:16) |
| `-s, --image_size <size>` | auto | Image size: auto, 2k, 4k |
| `-m, --model <name>` |  | Model to use (optional) |
| `-i, --image_urls <url...>` |  | Reference image URL(s) or local file(s) for image-to-image |
| `-o, --output-file <path>` |  | Download the generated file to a local path |

Video Generation

video_generation (alias: video)

Generate videos using AI.

gsk video "A cat playing with yarn" -m "kling/v1.6/standard" -d 5 -o ./cat.mp4
gsk video "Sunrise over a beach" -m "minimax/hailuo-02/standard" -r "16:9" -d 8

# Image-to-video
gsk video "Camera pan around the subject" -m "kling/v1.6/standard" -i ./photo.jpg

| Option | Default | Description |
| --- | --- | --- |
| `-m, --model <name>` |  | Model (required). e.g., kling/v1.6/standard, minimax/hailuo-02/standard |
| `-r, --aspect_ratio <ratio>` | 16:9 | Aspect ratio |
| `-d, --duration <sec>` | 5 | Duration in seconds (2-15) |
| `-i, --image_urls <url...>` |  | Reference image URL(s) or local file(s) |
| `-a, --audio_url <url>` |  | Audio URL for soundtrack |
| `-o, --output-file <path>` |  | Download the generated file to a local path |

Audio Generation

audio_generation (alias: audio)

Generate audio: TTS, music, or sound effects.

# Text-to-speech
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -r "professional female voice"
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -o ./hello.mp3

# Music with lyrics
gsk audio "A pop song" -m "fal-ai/minimax-music/v2.6" -l "Verse 1: ..." -d 120

# Sound effect
gsk audio "Door creaking slowly open" -m "elevenlabs/sound-effects"

| Option | Default | Description |
| --- | --- | --- |
| `-m, --model <name>` |  | Model (required). e.g., elevenlabs/v3-tts, fal-ai/minimax/speech-2.6-hd |
| `-d, --duration <sec>` | 0 (auto) | Duration in seconds |
| `-r, --requirements <text>` |  | Voice requirements for TTS |
| `-l, --lyrics <text>` |  | Lyrics for song generation |
| `-o, --output-file <path>` |  | Download the generated file to a local path |

File Transfer

upload

Upload a local file and get a URL for use in other commands.

gsk upload "./image.png"
gsk upload "./document.pdf"

download

Download a file from a file wrapper URL.

# Get download URL only
gsk download "/api/files/s/abc123"

# Download and save to local file
gsk download "/api/files/s/abc123" -s "./downloaded.png"

| Option | Description |
| --- | --- |
| `-s, --save <path>` | Download and save to local file path |

analyze_media (alias: media-analyze)

Analyze various types of media content including images, audio, and video.

gsk media-analyze -i "https://example.com/image.jpg" -r "Describe the content"
gsk media-analyze -i "https://example.com/video.mp4" -r "Summarize the video"

| Option | Default | Description |
| --- | --- | --- |
| `-i, --media_urls <urls>` |  | Media URL(s) to analyze (required) |
| `-r, --requirements <text>` |  | Analysis instructions |

audio_transcribe (alias: transcribe)

Transcribe audio files to text.

gsk transcribe -i "https://example.com/audio.mp3"
gsk transcribe -i ./meeting.wav -m "whisper-large-v3"

| Option | Default | Description |
| --- | --- | --- |
| `-i, --audio_urls <url...>` |  | Audio URL(s) or local file(s) to transcribe (required) |
| `-m, --model <name>` |  | Transcription model to use |

AI Drive (Cloud Storage)

aidrive (alias: drive)

AI-Drive file storage and management. List, create, delete, move files and directories. Download videos, audio, and files from URLs directly to AI-Drive.

# List files in root directory
gsk drive ls
gsk drive ls -p "/documents" -f file

# Create directory
gsk drive mkdir -p "/my-folder"

# Move file
gsk drive move -p "/old-path/file.txt" --target_path "/new-path/file.txt"

# Download video/audio/file to AI-Drive
gsk drive download_video --video_url "https://example.com/video.mp4" --target_folder "/videos"
gsk drive download_file --file_url "https://example.com/doc.pdf" --target_folder "/docs"

# Upload inline text content to AI-Drive
gsk drive upload --file_content "Hello World" --upload_path "/notes/hello.txt"

# Upload a local file directly to AI-Drive (streaming, supports 100MB+ files)
gsk drive upload --local_file ./report.pdf --upload_path /docs/report.pdf
gsk drive upload --local_file ./video.mp4 --upload_path /videos/demo.mp4
gsk drive upload --local_file ./photo.png              # upload_path defaults to /photo.png
gsk drive upload --local_file ./doc.pdf --upload_path /docs/doc.pdf --override  # overwrite existing

# Get readable URL for a file
gsk drive get_readable_url -p "/documents/report.pdf"

| Option | Default | Description |
| --- | --- | --- |
| `-p, --path <path>` |  | File or directory path in AI-Drive |
| `-f, --filter_type <type>` | all | Filter: all, file, directory |
| `--file_type <type>` | all | File type filter: all, audio, video, image |
| `--target_path <path>` |  | Target path for move operations |
| `--target_folder <path>` |  | Target folder for downloads |
| `--video_url <url>` |  | Video URL for download_video action |
| `--audio_url <url>` |  | Audio URL for download_audio action |
| `--file_url <url>` |  | File URL for download_file action |
| `--file_name <name>` |  | Custom file name for downloads |
| `--file_content <text>` |  | Inline text content to upload |
| `--local_file <path>` |  | Local file path to upload directly to AI-Drive (streaming, no size limit) |
| `--upload_path <path>` |  | Destination path for upload (defaults to `/<filename>` for --local_file) |
| `--override` | false | Overwrite an existing file at the destination path |

AI Agents & Tasks

create_task (alias: task)

Create and execute tasks using specialized AI agents.

# Create a podcast
gsk task podcasts --task_name "AI Trends" --query "Create a podcast about AI trends" --instructions "Focus on practical applications"

# Create a document
gsk task docs --task_name "Quantum Report" --query "Write a report on quantum computing" --instructions "Include recent breakthroughs"

# Create slides
gsk task slides --task_name "Q4 Results" --query "Create a Q4 results presentation" --instructions "Use charts and data"

# Create a spreadsheet (returns file wrapper URL, use `gsk download` to save)
gsk task sheets --task_name "Sales Report" --query "Create a quarterly sales report with formulas" --instructions "Use formulas and formatting"

# Deep research
gsk task deep_research --task_name "Fusion Energy" --query "Research fusion energy advances" --instructions "Cover public and private sector"

# Fact-check a claim
gsk task cross_check --task_name "Earth shape" --query "The Earth is flat" --instructions "Verify this claim with evidence"

| Option | Default | Description |
| --- | --- | --- |
| `--task_name <name>` |  | Name for the task (required) |
| `--query <text>` |  | Query describing what to create (required) |
| `--instructions <text>` |  | Detailed instructions (required) |
| `--acp` | false | Start as ACP (Agent Client Protocol) stdio agent for multi-turn use with Genspark Claw |

Supported task types: super_agent, podcasts, docs, slides, sheets, deep_research, website, video_generation, audio_generation, meeting_notes, cross_check

ACP Mode

Use --acp to start a task agent as an Agent Client Protocol stdio server. This enables AI agent platforms like Genspark Claw to natively discover and interact with GSK agents, with multi-turn conversation support.

# Start an ACP agent for slides (used by acpx, not typically run manually)
gsk task slides --acp

# Start an ACP agent for documents
gsk task docs --acp

acpx configuration (~/.acpx/config.json):

{
  "agents": {
    "gsk-slides": { "command": "gsk task slides --acp" },
    "gsk-docs":   { "command": "gsk task docs --acp" },
    "gsk-sheets": { "command": "gsk task sheets --acp" }
  }
}

Then in Genspark Claw: /acp spawn gsk-slides to create and iterate on presentations via natural language.

Stock Prices

stock_price (alias: stock)

Retrieve stock price information and financial data.

gsk stock AAPL
gsk stock MSFT

Service-Level Tools

External service integrations are available as service-level tools — each service is a single command with an action parameter that dispatches to the underlying operation.

Requirements: Connect services in Genspark Account Settings → Integrations.
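
The "one command, action parameter" pattern described above can be pictured as a plain `case` dispatch. The sketch below is illustrative only: `calendar_tool` is a hypothetical stand-in that echoes what it would do, and it borrows the list/create/delete action names from the calendar services in this section. It is not how gsk is implemented.

```shell
# Hypothetical stand-in for a service-level command: the first argument is the
# action, the remaining arguments belong to that action.
calendar_tool() {
  action=${1:-}
  [ $# -gt 0 ] && shift
  case $action in
    list)   echo "listing events" ;;
    create) echo "creating event: $*" ;;
    delete) echo "deleting event: $*" ;;
    *)      echo "unknown action: $action" >&2; return 2 ;;
  esac
}

calendar_tool list
calendar_tool create Team Sync
```

An unknown action reports the problem on stderr and returns a non-zero status, so a calling script can tell a bad action from an empty result.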

gmail

Gmail operations: search, read, send, reply, forward, delete, archive, move, mark_as_read, add_label, remove_label, create_label, get_attachment, list_send_as.

gsk gmail search --query "from:boss subject:report"
gsk gmail read --id 19cbfecd7fb14d46
gsk gmail send --to user@example.com --subject "Hello" --body "<p>Hi!</p>"
gsk gmail forward --message_id 19cbfecd7fb14d46 --to colleague@example.com
gsk gmail archive --message_id 19cbfecd7fb14d46

outlook_email

Outlook Email operations: search, read, send, reply, reply_draft, forward, delete, archive, move, mark_as_read, add_category, remove_category, get_attachment, group_list, group_search, group_read, group_reply.

gsk outlook_email search --queryString "quarterly report"
gsk outlook_email read --messageId AAMkAG...
gsk outlook_email send --to user@example.com --subject "Update" --body "Hi!"

google_calendar

Google Calendar operations: list, create, delete.

gsk google_calendar list
gsk google_calendar create --summary "Team Sync" --start_time "2026-04-20T10:00:00Z" --end_time "2026-04-20T11:00:00Z"

outlook_calendar

Outlook Calendar operations: list, create, delete.

gsk outlook_calendar list

meeting

Meeting notes operations: list, search, get.

gsk meeting list
gsk meeting search --keyword "quarterly planning"
gsk meeting get --task_id "e02fd0f1-..."

google_drive

Google Drive file operations: search, read, upload.

gsk google_drive search --query "budget 2026"
gsk google_drive read --file_id 1hq9kH63sc...

google_sheets

Google Sheets operations: create, read, write, append, search, export.

gsk google_sheets search --query "sales report"
gsk google_sheets read --spreadsheet_id 1ABC... --range "Sheet1!A1:D10"

google_docs

Google Docs operations: create, read, append, search.

gsk google_docs search --query "meeting notes"
gsk google_docs read --document_id 1ABC...

google_contacts

Google Contacts operations: search, get, create, update.

gsk google_contacts search --query "John"

github

GitHub operations: list_repos, search_issues, create_issue, update_issue.

gsk github list_repos
gsk github search_issues --q "repo:owner/repo is:open label:bug"
gsk github create_issue --owner myorg --repo myrepo --title "Bug report" --body "Description..."

slack

Slack messaging operations: send, search, lookup.

gsk slack search --query "deployment update"
gsk slack lookup --lookup_type channels --search_query "engineering"
gsk slack send --recipient "#general" --message "Hello team!"

notion

Notion page operations: search, read, create.

gsk notion search --query "project roadmap"
gsk notion read --page_id 2ce8b6a5-...

microsoft_teams

Microsoft Teams operations: send, list_channels, list_chats, list_teams, search, search_users, create_chat.

gsk microsoft_teams list_teams
gsk microsoft_teams list_channels --team_id 6c0db3a9-...
gsk microsoft_teams search --query "release notes"

onedrive

OneDrive file operations: list, search, read.

gsk onedrive search --query "presentation"
gsk onedrive list --folder_path "/Documents"

sharepoint

SharePoint operations: list, search, read_content, read_file.

gsk sharepoint search --query "company wiki"
gsk sharepoint list --site_id abc123

outlook_contacts

Outlook Contacts operations: search.

gsk outlook_contacts search --query "John"

AI Phone Calls

phone-call (alias: call-for-me)

Make an AI phone call on your behalf. The AI validates prerequisites, resolves contact info, and initiates the call.

# Call a business by phone number
gsk phone-call "Pizza Hut" -c "+1-555-123-4567" -p "Check if they deliver to my area"

# Call a business by Google Maps place_id
gsk phone-call "Joe's Pizza" -c "ChIJxxxxxxxx" --is_place_id -p "Reserve a table for 4"

# Dry run: validate and resolve contact info without initiating the call
gsk phone-call "Pizza Hut" -c "+1-555-123-4567" -p "Check hours" --dry-run

| Option | Default | Description |
| --- | --- | --- |
| `<recipient>` |  | Name of the person or business to call (required, positional) |
| `-c, --contact_info <value>` |  | Phone number or Google Maps place_id (required) |
| `--is_place_id` | false | Treat contact_info as a Google Maps place_id |
| `-p, --purpose <value>` |  | Purpose of the call (required) |
| `--dry-run` |  | Only validate and resolve contact info, do not initiate the call |

Social Media

Retrieve data from Twitter/X, Instagram, and Reddit. All social commands are grouped under gsk social.

social twitter

Search and retrieve data from Twitter/X. 12 actions available.

# Search tweets
gsk social twitter search_posts -q "artificial intelligence" --start_date 2026-03-01 --language en

# Search users
gsk social twitter search_users -q "openai" --limit 5

# Get tweets by a specific author
gsk social twitter get_posts_by_author -q "elonmusk" --start_date 2026-01-01

# Get tweets by IDs
gsk social twitter get_posts_by_ids --post_ids "123456789,987654321"

# Get user profile
gsk social twitter get_user -q "elonmusk"

# Get followers or following
gsk social twitter get_user_connections -q "elonmusk" --connection_type followers

# Get users by keywords (mentioned in tweets)
gsk social twitter get_users_by_keywords -q "machine learning" --start_date 2026-01-01

# Get comments on a tweet
gsk social twitter get_comments -p "123456789" --start_date 2026-03-01

# Get quotes of a tweet
gsk social twitter get_quotes -p "123456789"

# Get retweets of a tweet
gsk social twitter get_retweets -p "123456789"

# Get users who interacted with a tweet
gsk social twitter get_post_interacting_users -p "123456789" --interaction_type retweeters

# Count posts matching a query
gsk social twitter count_posts -q "AI" --start_date 2026-03-01 --end_date 2026-03-10

| Option | Default | Description |
| --- | --- | --- |
| `<action>` |  | Action to perform (required, positional) |
| `-q, --query <text>` |  | Search query, username, or identifier |
| `-p, --post_id <id>` |  | Tweet/post ID |
| `--post_ids <ids>` |  | Comma-separated tweet IDs |
| `--connection_type <type>` | followers | followers or following |
| `--interaction_type <type>` | retweeters | commenters, quoters, or retweeters |
| `--start_date <YYYY-MM-DD>` |  | Start date filter |
| `--end_date <YYYY-MM-DD>` |  | End date filter |
| `--language <code>` |  | Language filter (e.g., en, zh) |
| `--limit <n>` |  | Max number of results |

Actions: search_posts, search_users, get_posts_by_author, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_quotes, get_retweets, get_post_interacting_users, count_posts
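
Because `--post_ids` takes a single comma-separated string, a script that holds IDs as separate arguments needs to join them first. A minimal sketch, assuming a hypothetical helper named `join_ids` (not part of gsk):

```shell
# Hypothetical helper: join all arguments with commas.
# The function body runs in a subshell so the IFS change does not leak out.
join_ids() (
  IFS=,
  printf '%s\n' "$*"
)

join_ids 123456789 987654321
# prints 123456789,987654321
```

The result can then be passed as the value of `--post_ids`, for example `--post_ids "$(join_ids 123456789 987654321)"`.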

social instagram

Search and retrieve data from Instagram. 9 actions available.

# Search posts
gsk social instagram search_posts -q "travel photography" --start_date 2026-01-01

# Search users
gsk social instagram search_users -q "natgeo" --limit 5

# Get posts by a specific user
gsk social instagram get_posts_by_user -q "natgeo" --start_date 2026-03-01

# Get posts by IDs
gsk social instagram get_posts_by_ids --post_ids "abc123,def456"

# Get user profile
gsk social instagram get_user -q "natgeo"

# Get followers or following
gsk social instagram get_user_connections -q "natgeo" --connection_type following

# Get users by keywords
gsk social instagram get_users_by_keywords -q "landscape photographer"

# Get comments on a post
gsk social instagram get_comments -p "abc123" --start_date 2026-03-01

# Get users who liked or commented on a post
gsk social instagram get_post_interacting_users -p "abc123" --interaction_type likers

| Option | Default | Description |
| --- | --- | --- |
| `<action>` |  | Action to perform (required, positional) |
| `-q, --query <text>` |  | Search query, username, or identifier |
| `-p, --post_id <id>` |  | Post ID |
| `--post_ids <ids>` |  | Comma-separated post IDs |
| `--connection_type <type>` | followers | followers or following |
| `--interaction_type <type>` | likers | likers or commenters |
| `--start_date <YYYY-MM-DD>` |  | Start date filter |
| `--end_date <YYYY-MM-DD>` |  | End date filter |
| `--limit <n>` |  | Max number of results |

Actions: search_posts, search_users, get_posts_by_user, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_post_interacting_users

social reddit

Search and retrieve data from Reddit. 9 actions available.

# Search posts (with sort and time filters)
gsk social reddit search_posts -q "rust programming" --sort top --time week -s "programming"

# Search comments
gsk social reddit search_comments -q "async await" -s "rust"

# Search users
gsk social reddit search_users -q "spez" --limit 5

# Search subreddits
gsk social reddit search_subreddits -q "machine learning" --limit 10

# Get a post with its comments
gsk social reddit get_post_with_comments -p "1abc2de"

# Get subreddit info with recent posts
gsk social reddit get_subreddit_with_posts -q "programming"

# Get subreddits by keywords
gsk social reddit get_subreddits_by_keywords -q "artificial intelligence"

# Get user profile
gsk social reddit get_user -q "spez"

# Get users by keywords (active in discussions)
gsk social reddit get_users_by_keywords -q "neural networks" -s "MachineLearning"

| Option | Default | Description |
| --- | --- | --- |
| `<action>` |  | Action to perform (required, positional) |
| `-q, --query <text>` |  | Search query, username, or subreddit name |
| `-p, --post_id <id>` |  | Post ID |
| `-s, --subreddit <name>` |  | Subreddit name filter |
| `--sort <order>` |  | Sort: relevance, hot, top, new, comments |
| `--time <range>` |  | Time filter: hour, day, week, month, year, all |
| `--start_date <YYYY-MM-DD>` |  | Start date filter |
| `--end_date <YYYY-MM-DD>` |  | End date filter |
| `--limit <n>` |  | Max number of results |

Actions: search_posts, search_comments, search_users, search_subreddits, get_post_with_comments, get_subreddit_with_posts, get_subreddits_by_keywords, get_user, get_users_by_keywords

Local File Handling

Most commands that accept URLs also accept local file paths. The CLI automatically uploads local files before passing them to the API:

# Local paths passed with -i are uploaded automatically:
gsk analyze "Describe this" -i ./photo.jpg
gsk img "Enhance this" -i ./photo.png -o ./result.png
gsk video "Animate this" -i ./frame.jpg -o ./video.mp4

Use -o / --output-file to save generated results directly to a local file.
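
One way to picture the "URL or local path" decision is a simple prefix check. This is a sketch under assumptions: `classify_input` is a hypothetical name, the README does not say how the CLI tells the two apart, and file wrapper paths such as those accepted by `gsk download` are not handled here.

```shell
# Hypothetical classifier: http(s) inputs are treated as remote URLs,
# anything else as a local path that would need uploading first.
classify_input() {
  case $1 in
    http://*|https://*) echo "url" ;;
    *)                  echo "local" ;;
  esac
}

classify_input "https://example.com/image.jpg"   # prints url
classify_input "./photo.jpg"                     # prints local
```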

Auto-Update

The CLI checks for updates every 4 hours and installs new versions in the background.

To disable auto-update:

# Via environment variable
export GSK_NO_AUTO_UPDATE=1

# Via config file
# Add "auto_update": false to ~/.genspark-tool-cli/config.json
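
The 4-hour check interval and the environment opt-out can be sketched as a small gate function. This is not the CLI's actual code: `should_check_update` and `UPDATE_INTERVAL_SECONDS` are hypothetical names, and treating only the value `1` as "disabled" is an assumption based on the example above.

```shell
# Hypothetical sketch of an update-check gate.
UPDATE_INTERVAL_SECONDS=$((4 * 60 * 60))   # 4 hours = 14400 seconds

# Usage: should_check_update <last_check_epoch> <now_epoch>
# Succeeds (exit status 0) when a check is due and auto-update is not disabled.
should_check_update() {
  last_check=$1; now=$2
  [ "${GSK_NO_AUTO_UPDATE:-}" = "1" ] && return 1
  [ $((now - last_check)) -ge "$UPDATE_INTERVAL_SECONDS" ]
}

if should_check_update 0 14400; then
  echo "update check due"
fi
```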

Output Conventions

| Stream | Content | Consumer |
| --- | --- | --- |
| stdout | JSON result | Programs / AI agents |
| stderr | Progress, debug, error messages | Human / logs |

This separation allows programs to parse clean JSON from stdout while humans can follow progress on stderr.
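
In a script, the split means stdout can be captured as data while stderr goes to a log. The sketch below uses `demo_tool`, a hypothetical stand-in that mimics the convention, so it runs without a Genspark account; the JSON shape is invented for illustration.

```shell
# Hypothetical stand-in for a gsk command: JSON on stdout, progress on stderr.
demo_tool() {
  echo "fetching results..." >&2
  echo '{"status":"ok","count":3}'
}

# Capture only the JSON; route progress messages to a temporary log file.
log_file=$(mktemp)
result=$(demo_tool 2>"$log_file")

echo "$result"        # the JSON document only
cat "$log_file" >&2   # progress text, kept out of the data path
rm -f "$log_file"
```

The same redirections should apply to a real invocation, since the progress text never mixes into the captured value.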

Available Models

Image Generation Models (gsk img -m <model>)

| Model | Description |
| --- | --- |
| nano-banana-2 | Gemini 3.1 Flash Image - Fast and efficient with advanced reasoning. Multi-image fusion with up to 14 references. Supports 0.5K-4K resolution |
| fal-ai/gpt-image-1.5 | GPT Image 1.5 - Supports text-to-image and image editing with multi-image input |
| imagen4 | Latest high quality image generation model, upgrade from Imagen 3 |
| recraft-v3 | Realistic image generation model |
| fal-ai/bytedance/seedream/v5/lite | Bytedance Seedream v5 Lite - Text-to-image and image editing with native 2K resolution and excellent text layout |
| fal-ai/flux-2 | Flux 2 - Text-to-image and image editing with enhanced realism and crisp text generation. Supports up to 3 images for edit mode |
| fal-ai/flux-2-pro | Flux 2 Pro - Higher quality version of Flux 2 with professional-grade output |
| fal-ai/z-image/turbo | Z-Image Turbo - Optimized for speed. Good for quick iterations, bulk generation, and style transfer |
| ideogram/V_3 | Ideogram V3 - Character reference specialist with superior facial feature preservation and character consistency |
| qwen-image | Chinese poster specialist with outstanding Chinese text rendering and cultural context mastery |
| bbox-segment | Extract subjects from images based on bounding box region |
| fal-bria-rmbg | Remove background from image |
| fal-ai/recraft-clarity-upscale | Upscale image |
| fal-ai/image-editing/text-removal | Remove text and watermarks from images while preserving background |
| flux-pro/outpaint | Expand image to a specific aspect ratio |

Video Generation Models (gsk video -m <model>)

| Model | Capabilities | Aspect Ratios | Duration | Notes |
| --- | --- | --- | --- | --- |
| kling/v3 | Text/Image-to-video | 16:9, 9:16, 1:1 | 3-15s | Latest Kling V3 with audio. Pro/Standard quality modes |
| gemini/veo3.1 | Text/Image-to-video | 16:9, 9:16 | 8s | Latest Veo with enhanced quality. Supports fast_mode and hd_mode (1080p) |
| gemini/veo3.1/reference-to-video | Reference-to-video | 16:9, 9:16 | 8s | Generate video using 1+ reference images. Supports fast_mode and hd_mode |
| gemini/veo3.1/first-last-frame-to-video | Frame transition | 16:9, 9:16 | 8s | Precise transitions from first to last frame. Requires exactly 2 images |
| minimax/hailuo-2.3/standard | Text/Image-to-video | 16:9, 9:16 | 6s, 10s | Fast (~4min), cost-effective. Supports first & last frame control |
| wan/v2.6 | Text/Image/Video-to-video | 16:9, 9:16, 1:1, 4:3, 3:4 | 5s, 10s, 15s | 1080p with audio. Supports reference-to-video with 1-3 reference videos |
| vidu/q3 | Text/Image-to-video | 16:9, 9:16, 4:3, 3:4, 1:1 | 1-16s | Enhanced quality with audio generation. Resolution: 720p, 1080p |
| runway/gen4_turbo | Image-to-video | 5:3, 3:5 | 5s, 10s | Fast, high quality. Requires reference image |
| pixverse/v5 | Text/Image-to-video | 16:9, 9:16, 4:3, 1:1, 3:4 | 5s | Fast (~30s). Supports start/end frame transitions |
| fal-ai/bytedance/seedance/v1.5/pro | Text/Image-to-video | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | 4-12s | Seedance v1.5 Pro with native audio support. Supports first & last frame control |
| sora-2 | Text/Image/Video-to-video | 16:9, 9:16 | 4s, 8s, 12s | OpenAI Sora 2 for fast, creative videos. Supports video remixing |
| sora-2-pro | Text/Image-to-video | 16:9, 9:16 | 4s, 8s | Sora 2 Pro - Higher fidelity, cinematic quality. 720p and 1080p |
| fal-ai/bytedance-upscaler/upscale/video | Video upscaling |  |  | Upscale existing videos to 2K. Requires video_url parameter |
| xai/grok-imagine-video | Text/Image-to-video | 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, 9:21 | 1-15s | xAI Grok Imagine Video. 720p HD output |

Audio Generation Models (gsk audio -m <model>)

Text-to-Speech (TTS)

| Model | Description |
| --- | --- |
| google/gemini-2.5-pro-preview-tts | Best, high-quality, realistic TTS. Supports one or multiple speakers with speaker prefixes (e.g., Speaker1: text, Speaker2: text) |
| elevenlabs/v3-tts | Advanced multilingual TTS with multi-speaker dialogue support. Supports emotional tags like [excited], [whispers], [laughs] |
| fal-ai/elevenlabs/tts/multilingual-v2 | High-quality multilingual TTS. Preferred for English |
| fal-ai/minimax/speech-2.8-hd | High-quality multilingual TTS. Preferred for Chinese, Cantonese, Japanese, Korean. One speaker per generation |

Sound Effects

| Model | Description |
| --- | --- |
| elevenlabs/sound-effects | Sound effect generation. Duration: 0.1-22 seconds |

Music Generation

| Model | Description |
| --- | --- |
| elevenlabs/music | ElevenLabs music generation with vocals/singing. Lyrics auto-generated (no custom lyrics). Duration: 10s-5min |
| CassetteAI/music-generator | Background music generation. Duration: 10-180 seconds |
| mureka/song-generator | Professional song generation with lyrics. Supports style prompts, reference tracks, vocal and melody inputs. Max: 180s |
| mureka/instrumental-generator | Instrumental music generation without vocals. Supports style prompts and reference tracks. Max: 180s |
| fal-ai/lyria2 | Google Lyria 2 text-to-music. Good for sound effects and lyrics-free music. Max: 30 seconds |
| fal-ai/minimax-music/v2.6 | Song generation with lyrics using MiniMax Music 2.6. Supports markers (Verse), (Chorus), (Bridge), etc. Requires style prompt and lyrics |

Voice Cloning & Transformation

| Model | Description |
| --- | --- |
| elevenlabs/voice-clone | Clone a voice from audio samples. Returns voice ID for use in TTS generation |
| elevenlabs/voice-changer | Transform audio from one voice to another. Requires source audio and target voice ID |
| fal-ai/minimax/voice-clone | Clone a voice from a sample audio and generate speech from text prompts (gated feature) |

License

MIT

Keywords

genspark

Package last updated on 06 May 2026
