Genspark CLI (gsk)
One CLI. Every AI capability. Search, generate, analyze, communicate — all from your terminal.
gsk is the command-line interface for the Genspark AI platform. It unifies 90+ AI tools behind a single binary: web search, image/video/audio generation with 40+ models, document analysis, media transcription, cloud file management, email (Gmail & Outlook), calendar, GitHub, Slack, Notion, Microsoft Teams, OneDrive, SharePoint, AI phone calls, stock data, social media data (Twitter, Instagram, Reddit), and autonomous AI agents — all with clean JSON output for seamless integration with AI coding assistants, automation pipelines, and scripts.
Capability Map
| 🔍 Search | Web search, image search |
| 📄 Documents | Crawl pages, summarize PDFs/docs |
| 🎨 Images | 16 models: GPT Image, Gemini, Flux 2, Imagen 4, Recraft, Ideogram, Seedream ... |
| 🎬 Videos | 14 models: Kling V3, Veo 3.1, Sora 2, Hailuo, Wan, Runway, PixVerse, Seedance ... |
| 🎵 Audio | 14 models: Gemini TTS, ElevenLabs, MiniMax, Mureka, CassetteAI, Lyria 2 ... |
| 🧠 Analysis | Image/video/audio understanding, OCR, video style replication |
| 📝 Transcribe | Whisper, Gemini, ElevenLabs Scribe |
| ☁️ AI Drive | Cloud file storage, download, compress |
| 📧 Email | Gmail & Outlook: read, search, send, reply, forward, archive, labels, attachments |
| 📅 Calendar | Google & Outlook: list, create, delete events |
| 💬 Collaboration | Slack, Microsoft Teams, Notion — send messages, search, manage channels/pages |
| 📂 Cloud Storage | Google Drive, OneDrive, SharePoint, Google Sheets, Google Docs, Google Contacts |
| 🐙 GitHub | List repos, search/create/update issues |
| 📞 Phone | AI-powered phone calls to businesses |
| 📈 Stocks | Real-time stock prices |
| 📱 Social Media | Twitter/X, Instagram, Reddit — search posts/users, get comments, connections, and more (30 APIs) |
| 🤖 Agents | Podcasts, docs, slides, deep research, fact-checking, websites, batch media generation |
| 🔊 Voice | Voice cloning, voice changer |
Installation
npm install -g @genspark/cli
Requires Node.js >= 18.
Quick Start
gsk login
gsk search "latest AI news"
gsk img "A beautiful sunset over mountains" -o ./sunset.png
gsk crawl "https://example.com/article"
Authentication
Log in with your Genspark account:
gsk login
This opens a browser for authentication and saves the API key to ~/.genspark-tool-cli/config.json.
Alternatively, provide an API key directly:
export GSK_API_KEY="gsk_..."
gsk search "query" --api-key "gsk_..."
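When a script launches gsk as a child process, passing the key through GSK_API_KEY keeps it out of the argument list, which other local users can often see in process listings. A minimal Python sketch of that approach follows; `gsk_env` is a hypothetical helper name, not part of the CLI.

```python
import os


def gsk_env(api_key, base_env=None):
    """Return a copy of the environment with GSK_API_KEY set for a child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["GSK_API_KEY"] = api_key
    return env


# Example (not executed here):
#   subprocess.run(["gsk", "search", "query"], env=gsk_env("gsk_..."))
```

The helper copies the environment instead of mutating it, so the key is visible only to the process that receives the returned mapping.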
To check your current identity:
gsk login-info
gsk me
To log out:
gsk logout
Configuration
Config is loaded from three sources (highest priority first):
- CLI options: --api-key, --base-url, etc.
- Environment variables: GSK_API_KEY, GSK_BASE_URL, GSK_PROJECT_ID
- Config file: ~/.genspark-tool-cli/config.json
{
"api_key": "gsk_...",
"base_url": "https://www.genspark.ai",
"project_id": "project_abc123",
"debug": false,
"timeout": 300000
}
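The precedence described above can be sketched in a few lines of Python. This only illustrates the documented ordering and is not the CLI's actual implementation; `resolve_config`, `read_config_file`, and `ENV_VARS` are hypothetical names, and only the three environment variables listed above are assumed to exist.

```python
import json
from pathlib import Path

# Environment variables documented above, keyed by their config-file field.
ENV_VARS = {
    "api_key": "GSK_API_KEY",
    "base_url": "GSK_BASE_URL",
    "project_id": "GSK_PROJECT_ID",
}


def resolve_config(cli_options, environ, file_config):
    """Merge settings so CLI options beat env vars, which beat the config file."""
    resolved = dict(file_config)  # lowest priority
    for key, var in ENV_VARS.items():
        if environ.get(var):
            resolved[key] = environ[var]
    for key, value in cli_options.items():
        if value is not None:  # highest priority
            resolved[key] = value
    return resolved


def read_config_file(path=None):
    """Read the JSON config file, or return {} when it does not exist."""
    path = Path(path) if path else Path.home() / ".genspark-tool-cli" / "config.json"
    return json.loads(path.read_text()) if path.exists() else {}
```

A caller would typically combine the two as `resolve_config(parsed_cli_options, os.environ, read_config_file())`, where `parsed_cli_options` stands for whatever the script parsed from its own command line.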
Global Options
Option | Environment variable | Default | Description |
--api-key <key> | GSK_API_KEY | — | API key (required) |
--base-url <url> | GSK_BASE_URL | https://www.genspark.ai | API base URL |
--project-id <id> | GSK_PROJECT_ID | — | Project ID for access control |
--debug | — | false | Enable debug output |
--timeout <ms> | — | 300000 (5 min) | Request timeout |
--output <format> | — | json | Output format: json or text |
--refresh | — | — | Force refresh cached tool schemas |
Commands
list-tools (alias: ls)
List all available tools.
gsk list-tools
gsk ls
login-info (alias: me)
Show your current account info — email, name, and membership plan.
gsk login-info
gsk me
init-opencode
Generate an .opencode.json config file for OpenCode, pre-configured to use Genspark's LLM proxy with your API key.
gsk init-opencode
gsk init-opencode --model claude-sonnet-4-6
gsk init-opencode -o ./my-project/.opencode.json
--model <name> | claude-opus-4-6-1m | Default model for OpenCode |
-o, --out <path> | .opencode.json (cwd) | Output file path |
init-skills
Sync GSK skill documents into the current project for AI agent discovery. Copies all skill docs and generates a CONTEXT.md entry point that AI agents (Claude Code, Gemini, etc.) can load automatically.
gsk init-skills
gsk init-skills --agent claude
gsk init-skills --agent all
gsk init-skills -o ./docs/gsk-skills
-o, --out <dir> | .gsk/skills (cwd) | Output directory for skills |
--agent <type> | — | Generate agent config: claude, gemini, or all |
Search & Crawl
web_search (alias: search)
Search the web.
gsk search "latest AI news"
crawler (alias: crawl)
Extract content from a web page or document.
gsk crawl "https://example.com/article"
summarize_large_document (alias: summarize)
Analyze a document and answer questions about it.
gsk summarize "https://example.com/report.pdf" --question "What are the key findings?"
<url> | Document URL (required, positional) |
--question <text> | Question about the document |
image_search (alias: img-search)
Search for images on the web.
gsk img-search "modern architecture"
Media Analysis & Transcription
understand_images (alias: analyze)
Analyze images with an AI vision model.
gsk analyze "Describe this image" -i "https://example.com/image.jpg"
gsk analyze "Extract all text" -i "https://img1.jpg" "https://img2.jpg"
gsk analyze "What's in this photo?" -i ./photo.jpg
-i, --image_urls <url...> | — | Image URL(s) or local file path(s) to analyze (required) |
-r, --instruction <text> | — | Custom analysis instruction |
Image Generation
image_generation (alias: img)
Generate images using AI. Supports text-to-image and image-to-image.
gsk img "A beautiful sunset over mountains" -r "16:9" -o ./sunset.png
gsk img "Modern office at night" -s "4k" -r "1:1"
gsk img "A portrait in similar style" -i ./reference.png
-r, --aspect_ratio <ratio> | 1:1 | Aspect ratio (1:1, 16:9, 9:16) |
-s, --image_size <size> | auto | Image size: auto, 2k, 4k |
-m, --model <name> | — | Model to use (optional) |
-i, --image_urls <url...> | — | Reference image URL(s) or local file(s) for image-to-image |
-o, --output-file <path> | — | Download the generated file to a local path |
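Scripts that call `gsk img` repeatedly may prefer to build the argument list in one place. The Python sketch below uses only the flags from the table above; `img_command` and `IMAGE_SIZES` are hypothetical names, and the size check assumes the three listed values are the only valid ones.

```python
IMAGE_SIZES = ("auto", "2k", "4k")  # values listed for -s / --image_size


def img_command(prompt, aspect_ratio="1:1", image_size="auto", output_file=None):
    """Build argv for `gsk img`, rejecting sizes the options table does not list."""
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"image_size must be one of {IMAGE_SIZES}")
    argv = ["gsk", "img", prompt, "-r", aspect_ratio, "-s", image_size]
    if output_file:
        argv += ["-o", output_file]  # download the result to a local path
    return argv
```

Passing the returned list to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or quotes.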
Video Generation
video_generation (alias: video)
Generate videos using AI.
gsk video "A cat playing with yarn" -m "kling/v3" -d 5 -o ./cat.mp4
gsk video "Sunrise over a beach" -m "minimax/hailuo-2.3/standard" -r "16:9" -d 6
gsk video "Camera pan around the subject" -m "kling/v3" -i ./photo.jpg
-m, --model <name> | — | Model (required). e.g., kling/v3, minimax/hailuo-2.3/standard |
-r, --aspect_ratio <ratio> | 16:9 | Aspect ratio |
-d, --duration <sec> | 5 | Duration in seconds (2-15) |
-i, --image_urls <url...> | — | Reference image URL(s) or local file(s) |
-a, --audio_url <url> | — | Audio URL for soundtrack |
-o, --output-file <path> | — | Download the generated file to a local path |
Audio Generation
audio_generation (alias: audio)
Generate audio: TTS, music, or sound effects.
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -r "professional female voice"
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -o ./hello.mp3
gsk audio "A pop song" -m "mureka/song-generator" -l "Verse 1: ..." -d 120
gsk audio "Door creaking slowly open" -m "elevenlabs/sound-effects"
-m, --model <name> | — | Model (required). e.g., elevenlabs/v3-tts, fal-ai/minimax/speech-2.8-hd |
-d, --duration <sec> | 0 (auto) | Duration in seconds |
-r, --requirements <text> | — | Voice requirements for TTS |
-l, --lyrics <text> | — | Lyrics for song generation |
-o, --output-file <path> | — | Download the generated file to a local path |
File Transfer
upload
Upload a local file and get a URL for use in other commands.
gsk upload "./image.png"
gsk upload "./document.pdf"
download
Download a file from a file wrapper URL.
gsk download "/api/files/s/abc123"
gsk download "/api/files/s/abc123" -s "./downloaded.png"
-s, --save <path> | Download and save to local file path |
analyze_media (alias: media-analyze)
Analyze various types of media content including images, audio, and video.
gsk media-analyze -i "https://example.com/image.jpg" -r "Describe the content"
gsk media-analyze -i "https://example.com/video.mp4" -r "Summarize the video"
-i, --media_urls <urls> | — | Media URL(s) to analyze (required) |
-r, --requirements <text> | — | Analysis instructions |
audio_transcribe (alias: transcribe)
Transcribe audio files to text.
gsk transcribe -i "https://example.com/audio.mp3"
gsk transcribe -i ./meeting.wav -m "whisper-large-v3"
-i, --audio_urls <url...> | — | Audio URL(s) or local file(s) to transcribe (required) |
-m, --model <name> | — | Transcription model to use |
AI Drive (Cloud Storage)
aidrive (alias: drive)
AI-Drive file storage and management. List, create, delete, move files and directories. Download videos, audio, and files from URLs directly to AI-Drive.
gsk drive ls
gsk drive ls -p "/documents" -f file
gsk drive mkdir -p "/my-folder"
gsk drive move -p "/old-path/file.txt" --target_path "/new-path/file.txt"
gsk drive download_video --video_url "https://example.com/video.mp4" --target_folder "/videos"
gsk drive download_file --file_url "https://example.com/doc.pdf" --target_folder "/docs"
gsk drive upload --file_content "Hello World" --upload_path "/notes/hello.txt"
gsk drive upload --local_file ./report.pdf --upload_path /docs/report.pdf
gsk drive upload --local_file ./video.mp4 --upload_path /videos/demo.mp4
gsk drive upload --local_file ./photo.png
gsk drive upload --local_file ./doc.pdf --upload_path /docs/doc.pdf --override
gsk drive get_readable_url -p "/documents/report.pdf"
-p, --path <path> | — | File or directory path in AI-Drive |
-f, --filter_type <type> | all | Filter: all, file, directory |
--file_type <type> | all | File type filter: all, audio, video, image |
--target_path <path> | — | Target path for move operations |
--target_folder <path> | — | Target folder for downloads |
--video_url <url> | — | Video URL for download_video action |
--audio_url <url> | — | Audio URL for download_audio action |
--file_url <url> | — | File URL for download_file action |
--file_name <name> | — | Custom file name for downloads |
--file_content <text> | — | Inline text content to upload |
--local_file <path> | — | Local file path to upload directly to AI-Drive (streaming, no size limit) |
--upload_path <path> | — | Destination path for upload (defaults to /<filename> for --local_file) |
--override | false | Overwrite an existing file at the destination path |
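To upload several local files into one AI Drive folder, a script can generate one `gsk drive upload` invocation per file. This is a minimal Python sketch built only from the `--local_file`, `--upload_path`, and `--override` options above; `drive_upload_commands` is a hypothetical helper.

```python
import posixpath
from pathlib import Path


def drive_upload_commands(local_files, remote_folder, override=False):
    """Build one `gsk drive upload` argv per local file."""
    commands = []
    for local in local_files:
        # AI Drive paths in this document use forward slashes, so join with posixpath.
        remote = posixpath.join(remote_folder, Path(local).name)
        argv = ["gsk", "drive", "upload", "--local_file", str(local), "--upload_path", remote]
        if override:
            argv.append("--override")
        commands.append(argv)
    return commands
```

Each returned list can be passed to `subprocess.run` in turn; running them sequentially keeps the progress output on stderr readable.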
AI Agents & Tasks
create_task (alias: task)
Create and execute tasks using specialized AI agents.
gsk task podcasts --task_name "AI Trends" --query "Create a podcast about AI trends" --instructions "Focus on practical applications"
gsk task docs --task_name "Quantum Report" --query "Write a report on quantum computing" --instructions "Include recent breakthroughs"
gsk task slides --task_name "Q4 Results" --query "Create a Q4 results presentation" --instructions "Use charts and data"
gsk task sheets --task_name "Sales Report" --query "Create a quarterly sales report with formulas" --instructions "Use formulas and formatting"
gsk task deep_research --task_name "Fusion Energy" --query "Research fusion energy advances" --instructions "Cover public and private sector"
gsk task cross_check --task_name "Earth shape" --query "The Earth is flat" --instructions "Verify this claim with evidence"
--task_name <name> | — | Name for the task (required) |
--query <text> | — | Query describing what to create (required) |
--instructions <text> | — | Detailed instructions (required) |
--acp | false | Start as ACP (Agent Client Protocol) stdio agent for multi-turn use with Genspark Claw |
Supported task types: super_agent, podcasts, docs, slides, sheets, deep_research, website, video_generation, audio_generation, meeting_notes, cross_check
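Because the task type is positional and three options are required, a wrapper script can catch mistakes before any task is launched. The Python sketch below checks against the task types listed above; `task_command` and `TASK_TYPES` are hypothetical names.

```python
# Task types copied from the list above.
TASK_TYPES = {
    "super_agent", "podcasts", "docs", "slides", "sheets", "deep_research",
    "website", "video_generation", "audio_generation", "meeting_notes", "cross_check",
}


def task_command(task_type, task_name, query, instructions):
    """Build argv for `gsk task`, validating the type and the required options."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type}")
    required = {"task_name": task_name, "query": query, "instructions": instructions}
    for name, value in required.items():
        if not value:
            raise ValueError(f"--{name} is required")
    return [
        "gsk", "task", task_type,
        "--task_name", task_name,
        "--query", query,
        "--instructions", instructions,
    ]
```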
ACP Mode
Use --acp to start a task agent as an Agent Client Protocol stdio server. This enables AI agent platforms like Genspark Claw to natively discover and interact with GSK agents, with multi-turn conversation support.
gsk task slides --acp
gsk task docs --acp
acpx configuration (~/.acpx/config.json):
{
"agents": {
"gsk-slides": { "command": "gsk task slides --acp" },
"gsk-docs": { "command": "gsk task docs --acp" },
"gsk-sheets": { "command": "gsk task sheets --acp" }
}
}
Then, in Genspark Claw, run /acp spawn gsk-slides to create and iterate on presentations using natural language.
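The acpx configuration shown above follows a regular pattern, so it can be generated for any set of task types. This Python sketch assumes only the `agents` and `command` keys that appear in the example; `acpx_config` is a hypothetical helper, and the `gsk-<type>` agent names simply mirror the example.

```python
import json


def acpx_config(task_types):
    """Map each task type to a `gsk-<type>` agent that launches it in ACP mode."""
    return {
        "agents": {
            f"gsk-{task}": {"command": f"gsk task {task} --acp"} for task in task_types
        }
    }


# Print JSON suitable for ~/.acpx/config.json.
print(json.dumps(acpx_config(["slides", "docs", "sheets"]), indent=2))
```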
Stock Prices
stock_price (alias: stock)
Retrieve stock price information and financial data.
gsk stock AAPL
gsk stock MSFT
Service-Level Tools
External service integrations are available as service-level tools — each service is a single command with an action parameter that dispatches to the underlying operation.
Requirements: Connect services in Genspark Account Settings → Integrations.
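The examples in this section all follow the shape `gsk <service> <action> --parameter value`, so a script can build these invocations generically. A minimal Python sketch follows; `service_command` is a hypothetical helper, and parameter names must be taken from the examples for each service, since they differ between services.

```python
def service_command(service, action, **params):
    """Build argv for `gsk <service> <action> --name value ...`."""
    argv = ["gsk", service, action]
    for name, value in params.items():
        argv += [f"--{name}", str(value)]
    return argv
```

For instance, `service_command("slack", "send", recipient="#general", message="Hello team!")` yields the same arguments as the Slack send example in this section.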
gmail
Gmail operations: search, read, send, reply, forward, delete, archive, move, mark_as_read, add_label, remove_label, create_label, get_attachment, list_send_as.
gsk gmail search --query "from:boss subject:report"
gsk gmail read --id 19cbfecd7fb14d46
gsk gmail send --to user@example.com --subject "Hello" --body "<p>Hi!</p>"
gsk gmail forward --message_id 19cbfecd7fb14d46 --to colleague@example.com
gsk gmail archive --message_id 19cbfecd7fb14d46
outlook_email
Outlook Email operations: search, read, send, reply, reply_draft, forward, delete, archive, move, mark_as_read, add_category, remove_category, get_attachment, group_list, group_search, group_read, group_reply.
gsk outlook_email search --queryString "quarterly report"
gsk outlook_email read --messageId AAMkAG...
gsk outlook_email send --to user@example.com --subject "Update" --body "Hi!"
google_calendar
Google Calendar operations: list, create, delete.
gsk google_calendar list
gsk google_calendar create --summary "Team Sync" --start_time "2026-04-20T10:00:00Z" --end_time "2026-04-20T11:00:00Z"
outlook_calendar
Outlook Calendar operations: list, create, delete.
gsk outlook_calendar list
meeting
Meeting notes operations: list, search, get.
gsk meeting list
gsk meeting search --keyword "quarterly planning"
gsk meeting get --task_id "e02fd0f1-..."
google_drive
Google Drive file operations: search, read, upload.
gsk google_drive search --query "budget 2026"
gsk google_drive read --file_id 1hq9kH63sc...
google_sheets
Google Sheets operations: create, read, write, append, search, export.
gsk google_sheets search --query "sales report"
gsk google_sheets read --spreadsheet_id 1ABC... --range "Sheet1!A1:D10"
google_docs
Google Docs operations: create, read, append, search.
gsk google_docs search --query "meeting notes"
gsk google_docs read --document_id 1ABC...
google_contacts
Google Contacts operations: search, get, create, update.
gsk google_contacts search --query "John"
github
GitHub operations: list_repos, search_issues, create_issue, update_issue.
gsk github list_repos
gsk github search_issues --q "repo:owner/repo is:open label:bug"
gsk github create_issue --owner myorg --repo myrepo --title "Bug report" --body "Description..."
slack
Slack messaging operations: send, search, lookup.
gsk slack search --query "deployment update"
gsk slack lookup --lookup_type channels --search_query "engineering"
gsk slack send --recipient "#general" --message "Hello team!"
notion
Notion page operations: search, read, create.
gsk notion search --query "project roadmap"
gsk notion read --page_id 2ce8b6a5-...
microsoft_teams
Microsoft Teams operations: send, list_channels, list_chats, list_teams, search, search_users, create_chat.
gsk microsoft_teams list_teams
gsk microsoft_teams list_channels --team_id 6c0db3a9-...
gsk microsoft_teams search --query "release notes"
onedrive
OneDrive file operations: list, search, read.
gsk onedrive search --query "presentation"
gsk onedrive list --folder_path "/Documents"
sharepoint
SharePoint operations: list, search, read_content, read_file.
gsk sharepoint search --query "company wiki"
gsk sharepoint list --site_id abc123
outlook_contacts
Outlook Contacts operations: search.
gsk outlook_contacts search --query "John"
AI Phone Calls
phone-call (alias: call-for-me)
Make an AI phone call on your behalf. The AI validates prerequisites, resolves contact info, and initiates the call.
gsk phone-call "Pizza Hut" -c "+1-555-123-4567" -p "Check if they deliver to my area"
gsk phone-call "Joe's Pizza" -c "ChIJxxxxxxxx" --is_place_id -p "Reserve a table for 4"
gsk phone-call "Pizza Hut" -c "+1-555-123-4567" -p "Check hours" --dry-run
<recipient> | — | Name of the person or business to call (required, positional) |
-c, --contact_info <value> | — | Phone number or Google Maps place_id (required) |
--is_place_id | false | Treat contact_info as a Google Maps place_id |
-p, --purpose <value> | — | Purpose of the call (required) |
--dry-run | — | Only validate and resolve contact info, do not initiate the call |
Social Media
Retrieve data from Twitter/X, Instagram, and Reddit. All social commands are grouped under gsk social.
social twitter
Search and retrieve data from Twitter/X. 12 actions available.
gsk social twitter search_posts -q "artificial intelligence" --start_date 2026-03-01 --language en
gsk social twitter search_users -q "openai" --limit 5
gsk social twitter get_posts_by_author -q "elonmusk" --start_date 2026-01-01
gsk social twitter get_posts_by_ids --post_ids "123456789,987654321"
gsk social twitter get_user -q "elonmusk"
gsk social twitter get_user_connections -q "elonmusk" --connection_type followers
gsk social twitter get_users_by_keywords -q "machine learning" --start_date 2026-01-01
gsk social twitter get_comments -p "123456789" --start_date 2026-03-01
gsk social twitter get_quotes -p "123456789"
gsk social twitter get_retweets -p "123456789"
gsk social twitter get_post_interacting_users -p "123456789" --interaction_type retweeters
gsk social twitter count_posts -q "AI" --start_date 2026-03-01 --end_date 2026-03-10
<action> | — | Action to perform (required, positional) |
-q, --query <text> | — | Search query, username, or identifier |
-p, --post_id <id> | — | Tweet/post ID |
--post_ids <ids> | — | Comma-separated tweet IDs |
--connection_type <type> | followers | followers or following |
--interaction_type <type> | retweeters | commenters, quoters, or retweeters |
--start_date <YYYY-MM-DD> | — | Start date filter |
--end_date <YYYY-MM-DD> | — | End date filter |
--language <code> | — | Language filter (e.g., en, zh) |
--limit <n> | — | Max number of results |
Actions: search_posts, search_users, get_posts_by_author, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_quotes, get_retweets, get_post_interacting_users, count_posts
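Date filters use the YYYY-MM-DD format, which Python's `date.isoformat()` produces directly. The sketch below builds a `count_posts` invocation for a rolling window; `count_posts_command` is a hypothetical helper, and whether `--end_date` is inclusive is not stated in this document, so treat the window boundaries as an assumption.

```python
from datetime import date, timedelta


def count_posts_command(query, days, today=None):
    """Build `gsk social twitter count_posts` covering the last `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return [
        "gsk", "social", "twitter", "count_posts",
        "-q", query,
        "--start_date", start.isoformat(),
        "--end_date", end.isoformat(),
    ]
```

Accepting `today` as a parameter keeps the function deterministic in tests and scheduled jobs.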
social instagram
Search and retrieve data from Instagram. 9 actions available.
gsk social instagram search_posts -q "travel photography" --start_date 2026-01-01
gsk social instagram search_users -q "natgeo" --limit 5
gsk social instagram get_posts_by_user -q "natgeo" --start_date 2026-03-01
gsk social instagram get_posts_by_ids --post_ids "abc123,def456"
gsk social instagram get_user -q "natgeo"
gsk social instagram get_user_connections -q "natgeo" --connection_type following
gsk social instagram get_users_by_keywords -q "landscape photographer"
gsk social instagram get_comments -p "abc123" --start_date 2026-03-01
gsk social instagram get_post_interacting_users -p "abc123" --interaction_type likers
<action> | — | Action to perform (required, positional) |
-q, --query <text> | — | Search query, username, or identifier |
-p, --post_id <id> | — | Post ID |
--post_ids <ids> | — | Comma-separated post IDs |
--connection_type <type> | followers | followers or following |
--interaction_type <type> | likers | likers or commenters |
--start_date <YYYY-MM-DD> | — | Start date filter |
--end_date <YYYY-MM-DD> | — | End date filter |
--limit <n> | — | Max number of results |
Actions: search_posts, search_users, get_posts_by_user, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_post_interacting_users
social reddit
Search and retrieve data from Reddit. 9 actions available.
gsk social reddit search_posts -q "rust programming" --sort top --time week -s "programming"
gsk social reddit search_comments -q "async await" -s "rust"
gsk social reddit search_users -q "spez" --limit 5
gsk social reddit search_subreddits -q "machine learning" --limit 10
gsk social reddit get_post_with_comments -p "1abc2de"
gsk social reddit get_subreddit_with_posts -q "programming"
gsk social reddit get_subreddits_by_keywords -q "artificial intelligence"
gsk social reddit get_user -q "spez"
gsk social reddit get_users_by_keywords -q "neural networks" -s "MachineLearning"
<action> | — | Action to perform (required, positional) |
-q, --query <text> | — | Search query, username, or subreddit name |
-p, --post_id <id> | — | Post ID |
-s, --subreddit <name> | — | Subreddit name filter |
--sort <order> | — | Sort: relevance, hot, top, new, comments |
--time <range> | — | Time filter: hour, day, week, month, year, all |
--start_date <YYYY-MM-DD> | — | Start date filter |
--end_date <YYYY-MM-DD> | — | End date filter |
--limit <n> | — | Max number of results |
Actions: search_posts, search_comments, search_users, search_subreddits, get_post_with_comments, get_subreddit_with_posts, get_subreddits_by_keywords, get_user, get_users_by_keywords
Local File Handling
Most commands that accept URLs also accept local file paths. The CLI automatically uploads local files before passing them to the API:
gsk analyze "Describe this" -i ./photo.jpg
gsk img "Enhance this" -i ./photo.png -o ./result.png
gsk video "Animate this" -i ./frame.jpg -o ./video.mp4
Use -o / --output-file to save generated results directly to a local file.
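Since local paths are uploaded automatically, a batch script may want to confirm that every local input exists before starting a long generation job. The Python sketch below treats anything that is not an http(s) URL as a local path, which is an assumption of this example and not a description of how the CLI classifies inputs; `missing_local_inputs` is a hypothetical helper.

```python
from pathlib import Path
from urllib.parse import urlparse


def missing_local_inputs(refs):
    """Return the refs that are not http(s) URLs and do not exist on disk."""
    missing = []
    for ref in refs:
        if urlparse(ref).scheme in ("http", "https"):
            continue  # remote input, nothing to check locally
        if not Path(ref).exists():
            missing.append(ref)
    return missing
```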
Auto-Update
The CLI checks for updates every 4 hours and installs new versions in the background.
To disable auto-update:
export GSK_NO_AUTO_UPDATE=1
Output Conventions
| Stream | Content | Intended consumer |
| stdout | JSON result | Programs / AI agents |
| stderr | Progress, debug, error messages | Human / logs |
This separation allows programs to parse clean JSON from stdout while humans can follow progress on stderr.
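In a script, this split means stdout can be captured and parsed while stderr is left attached to the terminal. The Python sketch below assumes the default `json` output format and assumes that a non-zero exit status signals failure, which this document does not state; `run_json` is a hypothetical helper, and the structure of the returned JSON depends on the tool being called.

```python
import json
import subprocess


def run_json(command):
    """Run a command, leave stderr on the terminal, and parse stdout as JSON."""
    # stderr is not redirected, so progress messages stream to the user as they occur.
    completed = subprocess.run(command, stdout=subprocess.PIPE, text=True)
    if completed.returncode != 0:  # assumption: non-zero exit status means failure
        raise RuntimeError(f"{command[0]} exited with status {completed.returncode}")
    return json.loads(completed.stdout)


# Example (not executed here):
#   result = run_json(["gsk", "search", "latest AI news"])
```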
Available Models
Image Generation Models — gsk img -m <model>
nano-banana-2 | Gemini 3.1 Flash Image - Fast and efficient with advanced reasoning. Multi-image fusion with up to 14 references. Supports 0.5K-4K resolution |
fal-ai/gpt-image-1.5 | GPT Image 1.5 - Supports text-to-image and image editing with multi-image input |
imagen4 | Imagen 4 - Latest high-quality image generation model, an upgrade from Imagen 3 |
recraft-v3 | Realistic image generation model |
fal-ai/bytedance/seedream/v5/lite | Bytedance Seedream v5 Lite - Text-to-image and image editing with native 2K resolution and excellent text layout |
fal-ai/flux-2 | Flux 2 - Text-to-image and image editing with enhanced realism and crisp text generation. Supports up to 3 images for edit mode |
fal-ai/flux-2-pro | Flux 2 Pro - Higher quality version of Flux 2 with professional-grade output |
fal-ai/z-image/turbo | Z-Image Turbo - Optimized for speed. Good for quick iterations, bulk generation, and style transfer |
ideogram/V_3 | Ideogram V3 - Character reference specialist with superior facial feature preservation and character consistency |
qwen-image | Chinese poster specialist with outstanding Chinese text rendering and cultural context mastery |
bbox-segment | Extract subjects from images based on bounding box region |
fal-bria-rmbg | Remove background from image |
fal-ai/recraft-clarity-upscale | Upscale image |
fal-ai/image-editing/text-removal | Remove text and watermarks from images while preserving background |
flux-pro/outpaint | Expand image to a specific aspect ratio |
Video Generation Models — gsk video -m <model>
Model | Type | Aspect ratios | Duration | Notes |
kling/v3 | Text/Image-to-video | 16:9, 9:16, 1:1 | 3-15s | Latest Kling V3 with audio. Pro/Standard quality modes |
gemini/veo3.1 | Text/Image-to-video | 16:9, 9:16 | 8s | Latest Veo with enhanced quality. Supports fast_mode and hd_mode (1080p) |
gemini/veo3.1/reference-to-video | Reference-to-video | 16:9, 9:16 | 8s | Generate video using 1+ reference images. Supports fast_mode and hd_mode |
gemini/veo3.1/first-last-frame-to-video | Frame transition | 16:9, 9:16 | 8s | Precise transitions from first to last frame. Requires exactly 2 images |
minimax/hailuo-2.3/standard | Text/Image-to-video | 16:9, 9:16 | 6s, 10s | Fast (~4min), cost-effective. Supports first & last frame control |
wan/v2.6 | Text/Image/Video-to-video | 16:9, 9:16, 1:1, 4:3, 3:4 | 5s, 10s, 15s | 1080p with audio. Supports reference-to-video with 1-3 reference videos |
vidu/q3 | Text/Image-to-video | 16:9, 9:16, 4:3, 3:4, 1:1 | 1-16s | Enhanced quality with audio generation. Resolution: 720p, 1080p |
runway/gen4_turbo | Image-to-video | 5:3, 3:5 | 5s, 10s | Fast, high quality. Requires reference image |
pixverse/v5 | Text/Image-to-video | 16:9, 9:16, 4:3, 1:1, 3:4 | 5s | Fast (~30s). Supports start/end frame transitions |
fal-ai/bytedance/seedance/v1.5/pro | Text/Image-to-video | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | 4-12s | Seedance v1.5 Pro with native audio support. Supports first & last frame control |
sora-2 | Text/Image/Video-to-video | 16:9, 9:16 | 4s, 8s, 12s | OpenAI Sora 2 for fast, creative videos. Supports video remixing |
sora-2-pro | Text/Image-to-video | 16:9, 9:16 | 4s, 8s | Sora 2 Pro - Higher fidelity, cinematic quality. 720p and 1080p |
fal-ai/bytedance-upscaler/upscale/video | Video upscaling | — | — | Upscale existing videos to 2K. Requires video_url parameter |
xai/grok-imagine-video | Text/Image-to-video | 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, 9:21 | 1-15s | xAI Grok Imagine Video. 720p HD output |
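Because each model accepts different aspect ratios and durations, a script can filter the table before choosing a value for `-m`. The Python sketch below transcribes four rows from the table above and assumes that ranges such as 3-15s mean whole seconds; `VIDEO_MODELS` and `models_supporting` are hypothetical names.

```python
# A few rows transcribed from the table above: (aspect ratios, allowed durations in seconds).
VIDEO_MODELS = {
    "kling/v3": ({"16:9", "9:16", "1:1"}, set(range(3, 16))),  # 3-15s, assumed whole seconds
    "gemini/veo3.1": ({"16:9", "9:16"}, {8}),
    "pixverse/v5": ({"16:9", "9:16", "4:3", "1:1", "3:4"}, {5}),
    "sora-2": ({"16:9", "9:16"}, {4, 8, 12}),
}


def models_supporting(aspect_ratio, seconds):
    """Return the transcribed models that accept this aspect ratio and duration."""
    return sorted(
        name
        for name, (ratios, durations) in VIDEO_MODELS.items()
        if aspect_ratio in ratios and seconds in durations
    )
```

For example, a 1:1 clip of 5 seconds matches `kling/v3` and `pixverse/v5` among these four rows, while a 4:3 clip of 8 seconds matches none of them.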
Audio Generation Models — gsk audio -m <model>
Text-to-Speech (TTS)
google/gemini-2.5-pro-preview-tts | Best, high-quality, realistic TTS. Supports one or multiple speakers with speaker prefixes (e.g., Speaker1: text, Speaker2: text) |
elevenlabs/v3-tts | Advanced multilingual TTS with multi-speaker dialogue support. Supports emotional tags like [excited], [whispers], [laughs] |
fal-ai/elevenlabs/tts/multilingual-v2 | High-quality multilingual TTS. Preferred for English |
fal-ai/minimax/speech-2.8-hd | High-quality multilingual TTS. Preferred for Chinese, Cantonese, Japanese, Korean. One speaker per generation |
Sound Effects
elevenlabs/sound-effects | Sound effect generation. Duration: 0.1-22 seconds |
Music Generation
elevenlabs/music | ElevenLabs music generation with vocals/singing. Lyrics auto-generated (no custom lyrics). Duration: 10s-5min |
CassetteAI/music-generator | Background music generation. Duration: 10-180 seconds |
mureka/song-generator | Professional song generation with lyrics. Supports style prompts, reference tracks, vocal and melody inputs. Max: 180s |
mureka/instrumental-generator | Instrumental music generation without vocals. Supports style prompts and reference tracks. Max: 180s |
fal-ai/lyria2 | Google Lyria 2 text-to-music. Good for sound effects and lyrics-free music. Max: 30 seconds |
fal-ai/minimax-music/v2.6 | Song generation with lyrics using MiniMax Music 2.6. Supports markers (Verse), (Chorus), (Bridge), etc. Requires style prompt and lyrics |
Voice Cloning & Transformation
elevenlabs/voice-clone | Clone a voice from audio samples. Returns voice ID for use in TTS generation |
elevenlabs/voice-changer | Transform audio from one voice to another. Requires source audio and target voice ID |
fal-ai/minimax/voice-clone | Clone a voice from a sample audio and generate speech from text prompts (gated feature) |
License
MIT