Fish Audio MCP Server

An MCP (Model Context Protocol) server that provides seamless integration between Fish Audio's Text-to-Speech API and LLMs like Claude, enabling natural language-driven speech synthesis.
What is Fish Audio?
Fish Audio is a cutting-edge Text-to-Speech platform that offers:
- State-of-the-art voice synthesis with natural-sounding output
- Voice cloning capabilities to create custom voice models
- Multilingual support including English, Japanese, Chinese, and more
- Low-latency streaming for real-time applications
- Fine-grained control over speech prosody and emotions
This MCP server brings Fish Audio's powerful capabilities directly to your LLM workflows.
Features
- High-Quality TTS: Leverage Fish Audio's state-of-the-art TTS models
- Streaming Support: Real-time audio streaming for low-latency applications
- Multiple Voices: Support for custom voice models via reference IDs
- Smart Voice Selection: Select voices by ID, name, or tags
- Voice Library Management: Configure and manage multiple voice references
- Flexible Configuration: Environment variable-based configuration
- Multiple Audio Formats: Support for MP3, WAV, PCM, and Opus
- Easy Integration: Simple setup with any MCP-compatible client
Quick Start
Installation
You can run this MCP server directly using npx:
npx @alanse/fish-audio-mcp-server
Or install it globally:
npm install -g @alanse/fish-audio-mcp-server
Configuration
- Set your Fish Audio API key as an environment variable:
export FISH_API_KEY=your_fish_audio_api_key_here
- Add to your MCP settings configuration:
Single Voice Mode (Simple)
{
"mcpServers": {
"fish-audio": {
"command": "npx",
"args": ["-y", "@alanse/fish-audio-mcp-server"],
"env": {
"FISH_API_KEY": "your_fish_audio_api_key_here",
"FISH_MODEL_ID": "speech-1.6",
"FISH_REFERENCE_ID": "your_voice_reference_id_here",
"FISH_OUTPUT_FORMAT": "mp3",
"FISH_STREAMING": "false",
"FISH_LATENCY": "balanced",
"FISH_MP3_BITRATE": "128",
"FISH_AUTO_PLAY": "false",
"AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
}
}
}
}
Multiple Voice Mode (Advanced)
{
"mcpServers": {
"fish-audio": {
"command": "npx",
"args": ["-y", "@alanse/fish-audio-mcp-server"],
"env": {
"FISH_API_KEY": "your_fish_audio_api_key_here",
"FISH_MODEL_ID": "speech-1.6",
"FISH_REFERENCES": "[{\"reference_id\":\"id1\",\"name\":\"Alice\",\"tags\":[\"female\",\"english\"]},{\"reference_id\":\"id2\",\"name\":\"Bob\",\"tags\":[\"male\",\"japanese\"]},{\"reference_id\":\"id3\",\"name\":\"Carol\",\"tags\":[\"female\",\"japanese\",\"anime\"]}]",
"FISH_DEFAULT_REFERENCE": "id1",
"FISH_OUTPUT_FORMAT": "mp3",
"FISH_STREAMING": "false",
"FISH_LATENCY": "balanced",
"FISH_MP3_BITRATE": "128",
"FISH_AUTO_PLAY": "false",
"AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
}
}
}
}
Environment Variables
Variable | Description | Default | Required |
FISH_API_KEY | Your Fish Audio API key | - | Yes |
FISH_MODEL_ID | TTS model to use (s1, speech-1.5, speech-1.6) | s1 | Optional |
FISH_REFERENCE_ID | Default voice reference ID (single reference mode) | - | Optional |
FISH_REFERENCES | Multiple voice references (see below) | - | Optional |
FISH_DEFAULT_REFERENCE | Default reference ID when using multiple references | - | Optional |
FISH_OUTPUT_FORMAT | Default audio format (mp3, wav, pcm, opus) | mp3 | Optional |
FISH_STREAMING | Enable streaming mode (HTTP/WebSocket) | false | Optional |
FISH_LATENCY | Latency mode (normal, balanced) | balanced | Optional |
FISH_MP3_BITRATE | MP3 bitrate (64, 128, 192) | 128 | Optional |
FISH_AUTO_PLAY | Auto-play audio and enable real-time playback | false | Optional |
AUDIO_OUTPUT_DIR | Directory for audio file output | ~/.fish-audio-mcp/audio_output | Optional |
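To make the defaults and allowed values in the table concrete, here is a minimal TypeScript sketch of a loader that applies them. It is illustrative only and is not taken from the server's source: the `FishConfig` shape, the `oneOf` helper, and the rule that only the literal string "true" enables a boolean are assumptions.

```typescript
// Hypothetical config shape: field names mirror the table above,
// not the server's actual internals.
interface FishConfig {
  apiKey: string;
  modelId: string;
  outputFormat: "mp3" | "wav" | "pcm" | "opus";
  streaming: boolean;
  latency: "normal" | "balanced";
  mp3Bitrate: 64 | 128 | 192;
  autoPlay: boolean;
  outputDir: string;
}

// Return the allowed option equal to `value`, or throw a descriptive error.
function oneOf<T extends string | number>(
  name: string,
  value: string | number,
  allowed: readonly T[]
): T {
  const matches = allowed.filter((option) => option === value);
  if (matches.length === 0) {
    throw new Error(`${name} must be one of: ${allowed.join(", ")}`);
  }
  return matches[0];
}

function loadConfig(env: Record<string, string | undefined>): FishConfig {
  const apiKey = env.FISH_API_KEY;
  if (!apiKey) {
    throw new Error("FISH_API_KEY environment variable is required");
  }
  return {
    apiKey,
    modelId: env.FISH_MODEL_ID ?? "s1",
    outputFormat: oneOf("FISH_OUTPUT_FORMAT", env.FISH_OUTPUT_FORMAT ?? "mp3",
      ["mp3", "wav", "pcm", "opus"] as const),
    // Assumption: only the literal string "true" turns a flag on.
    streaming: env.FISH_STREAMING === "true",
    latency: oneOf("FISH_LATENCY", env.FISH_LATENCY ?? "balanced",
      ["normal", "balanced"] as const),
    mp3Bitrate: oneOf("FISH_MP3_BITRATE", Number(env.FISH_MP3_BITRATE ?? "128"),
      [64, 128, 192] as const),
    autoPlay: env.FISH_AUTO_PLAY === "true",
    outputDir: env.AUDIO_OUTPUT_DIR ?? "~/.fish-audio-mcp/audio_output",
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Passing the environment in as an argument keeps the function easy to test.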
Configuring Multiple Voice References
You can configure multiple voice references in two ways:
JSON Array Format (Recommended)
Use the FISH_REFERENCES environment variable with a JSON array:
FISH_REFERENCES='[
{"reference_id":"id1","name":"Alice","tags":["female","english"]},
{"reference_id":"id2","name":"Bob","tags":["male","japanese"]},
{"reference_id":"id3","name":"Carol","tags":["female","japanese","anime"]}
]'
FISH_DEFAULT_REFERENCE="id1"
Individual Format (Backward Compatibility)
Use numbered environment variables:
FISH_REFERENCE_1_ID=id1
FISH_REFERENCE_1_NAME=Alice
FISH_REFERENCE_1_TAGS=female,english
FISH_REFERENCE_2_ID=id2
FISH_REFERENCE_2_NAME=Bob
FISH_REFERENCE_2_TAGS=male,japanese
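The two formats describe the same data. The following TypeScript sketch shows one way both could be read into a single list. It is not the server's actual parser: the `VoiceReference` type, the `parseReferences` function, the precedence of `FISH_REFERENCES` over the numbered variables, and stopping at the first missing index are all assumptions made for illustration.

```typescript
// Hypothetical in-memory shape for one configured voice.
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags: string[];
}

function parseReferences(env: Record<string, string | undefined>): VoiceReference[] {
  // JSON array format. Assumption: it wins when both formats are present.
  if (env.FISH_REFERENCES) {
    const parsed: unknown = JSON.parse(env.FISH_REFERENCES);
    if (!Array.isArray(parsed)) {
      throw new Error("FISH_REFERENCES must be a JSON array");
    }
    return parsed.map((item: unknown): VoiceReference => {
      const record = (item ?? {}) as Record<string, unknown>;
      const id = record.reference_id;
      const name = record.name;
      const tags = record.tags;
      if (typeof id !== "string") {
        throw new Error("Each reference needs a string reference_id");
      }
      return {
        reference_id: id,
        name: typeof name === "string" ? name : undefined,
        tags: Array.isArray(tags) ? tags.map(String) : [],
      };
    });
  }

  // Individual format: FISH_REFERENCE_1_ID, FISH_REFERENCE_2_ID, ...
  // Assumption: numbering starts at 1 and stops at the first gap.
  const refs: VoiceReference[] = [];
  for (let i = 1; ; i++) {
    const id = env[`FISH_REFERENCE_${i}_ID`];
    if (!id) break;
    const rawTags = env[`FISH_REFERENCE_${i}_TAGS`] ?? "";
    refs.push({
      reference_id: id,
      name: env[`FISH_REFERENCE_${i}_NAME`],
      // Tags are comma-separated; surrounding whitespace is dropped.
      tags: rawTags.split(",").map((t) => t.trim()).filter((t) => t.length > 0),
    });
  }
  return refs;
}
```

Note that `JSON.parse` requires double-quoted keys and strings, so the JSON array must be quoted accordingly wherever it is embedded.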
Usage
Once configured, the Fish Audio MCP server provides two tools to LLMs.
Tool 1: fish_audio_tts
Generates speech from text using Fish Audio's TTS API.
Parameters
text (required): Text to convert to speech (max 10,000 characters)
reference_id (optional): Voice model reference ID
reference_name (optional): Select voice by name
reference_tag (optional): Select voice by tag
streaming (optional): Enable streaming mode
format (optional): Output format (mp3, wav, pcm, opus)
mp3_bitrate (optional): MP3 bitrate (64, 128, 192)
normalize (optional): Enable text normalization (default: true)
latency (optional): Latency mode (normal, balanced)
output_path (optional): Custom output file path
auto_play (optional): Automatically play the generated audio
websocket_streaming (optional): Use WebSocket streaming instead of HTTP
realtime_play (optional): Play audio in real-time during WebSocket streaming
Voice Selection Priority: reference_id > reference_name > reference_tag > default
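The priority order can be sketched as a small resolver. This is an illustration of the rule stated above, not the server's code: the type and function names are hypothetical, and case-insensitive matching, "first match wins", and falling through to the next rule when a name or tag matches nothing are assumptions.

```typescript
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags: string[];
}

// The three optional selectors accepted by fish_audio_tts.
interface VoiceSelection {
  reference_id?: string;
  reference_name?: string;
  reference_tag?: string;
}

function resolveReferenceId(
  selection: VoiceSelection,
  refs: VoiceReference[],
  defaultId?: string
): string | undefined {
  // 1. An explicit ID always wins, even if it is not in the configured list.
  if (selection.reference_id) return selection.reference_id;

  // 2. Then a name match (assumed case-insensitive).
  if (selection.reference_name) {
    const wantedName = selection.reference_name.toLowerCase();
    const byName = refs.filter((r) => (r.name ?? "").toLowerCase() === wantedName);
    if (byName.length > 0) return byName[0].reference_id;
  }

  // 3. Then the first voice carrying the requested tag.
  if (selection.reference_tag) {
    const wantedTag = selection.reference_tag.toLowerCase();
    const byTag = refs.filter((r) => r.tags.some((t) => t.toLowerCase() === wantedTag));
    if (byTag.length > 0) return byTag[0].reference_id;
  }

  // 4. Otherwise fall back to the configured default, if any.
  return defaultId;
}
```

With the Alice, Bob, and Carol configuration shown earlier, a call that passes both `reference_name: "Bob"` and `reference_tag: "anime"` would resolve to `id2`, because the name is checked before the tag.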
Tool 2: fish_audio_list_references
Lists all configured voice references.
Parameters
No parameters required.
Returns
- List of configured voice references with their IDs, names, and tags
- Default reference ID
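As a rough illustration of what this tool returns, the sketch below renders a list of references as text in the same shape as the listing shown in the Examples section. The `formatReferences` function and the `VoiceReference` type are hypothetical. The server's real response format may differ.

```typescript
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags: string[];
}

// Render one line per voice; the default voice gets a "[Default]" marker.
function formatReferences(refs: VoiceReference[], defaultId?: string): string {
  return refs
    .map((ref) => {
      const label = ref.name ?? ref.reference_id;
      const tags = ref.tags.length > 0 ? ` - Tags: ${ref.tags.join(", ")}` : "";
      const marker = ref.reference_id === defaultId ? " [Default]" : "";
      return `- ${label} (id: ${ref.reference_id})${tags}${marker}`;
    })
    .join("\n");
}
```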
Examples
Basic Text-to-Speech
User: "Generate speech saying 'Hello, world! Welcome to Fish Audio TTS.'"
Claude: I'll generate speech for that text using Fish Audio TTS.
[Uses fish_audio_tts tool with text parameter]
Result: Audio file saved to ~/.fish-audio-mcp/audio_output/tts_2025-01-03T10-30-00.mp3
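The timestamped file name in the result above could be produced along these lines. This is a sketch under stated assumptions, not the server's implementation: `buildOutputPath` is a hypothetical helper, the timestamp is taken in UTC, paths use forward slashes, and the home directory is passed in explicitly (in Node.js it would come from `os.homedir()`).

```typescript
// Build "<dir>/tts_<timestamp>.<format>", expanding a leading "~" in the directory.
function buildOutputPath(
  outputDir: string,
  format: string,
  now: Date,
  homeDir: string
): string {
  // "2025-01-03T10:30:00.000Z" -> "2025-01-03T10-30-00" (colons are not
  // allowed in file names on some platforms).
  const stamp = now.toISOString().slice(0, 19).replace(/:/g, "-");

  // Expand "~" or "~/..." to the given home directory.
  const expanded =
    outputDir === "~" || outputDir.slice(0, 2) === "~/"
      ? homeDir + outputDir.slice(1)
      : outputDir;

  // Drop trailing slashes so the result has exactly one separator.
  return `${expanded.replace(/\/+$/, "")}/tts_${stamp}.${format}`;
}
```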
Using Custom Voice by ID
User: "Generate speech with voice model xyz123 saying 'This is a custom voice test'"
Claude: I'll generate speech using the specified voice model.
[Uses fish_audio_tts tool with text and reference_id parameters]
Result: Audio generated with custom voice model xyz123
Using Voice by Name
User: "Use Alice's voice to say 'Hello from Alice'"
Claude: I'll generate speech using Alice's voice.
[Uses fish_audio_tts tool with reference_name: "Alice"]
Result: Audio generated with Alice's voice
Using Voice by Tag
User: "Generate Japanese speech saying 'こんにちは' with an anime voice"
Claude: I'll generate Japanese speech with an anime-style voice.
[Uses fish_audio_tts tool with reference_tag: "anime"]
Result: Audio generated with anime voice style
List Available Voices
User: "What voices are available?"
Claude: I'll list all configured voice references.
[Uses fish_audio_list_references tool]
Result:
- Alice (id: id1) - Tags: female, english [Default]
- Bob (id: id2) - Tags: male, japanese
- Carol (id: id3) - Tags: female, japanese, anime
HTTP Streaming Mode
User: "Generate a long speech in streaming mode about the benefits of AI"
Claude: I'll generate the speech in streaming mode for faster response.
[Uses fish_audio_tts tool with streaming: true]
Result: Streaming audio saved to ~/.fish-audio-mcp/audio_output/tts_2025-01-03T10-35-00.mp3
WebSocket Real-time Streaming
User: "Stream and play in real-time: 'Welcome to the future of AI'"
Claude: I'll stream the speech via WebSocket and play it in real-time.
[Uses fish_audio_tts tool with websocket_streaming: true, realtime_play: true]
Result: Audio streamed and played in real-time via WebSocket
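Behind each "[Uses fish_audio_tts tool ...]" step, an MCP client sends a JSON-RPC `tools/call` request to the server. The sketch below builds such a request for the WebSocket example. The envelope follows the MCP specification, and the argument names come from the parameter list above. The `buildToolCall` helper and its types are hypothetical, and in practice your MCP client constructs this message for you.

```typescript
// Minimal shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The WebSocket real-time streaming example expressed as a request.
const request = buildToolCall(1, "fish_audio_tts", {
  text: "Welcome to the future of AI",
  websocket_streaming: true,
  realtime_play: true,
});

console.log(JSON.stringify(request, null, 2));
```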
Development
Local Development
git clone https://github.com/da-okazaki/mcp-fish-audio-server.git
cd mcp-fish-audio-server
npm install
cp .env.example .env
npm run build
npm run dev
Testing
Run the test suite:
npm test
Project Structure
mcp-fish-audio-server/
├── src/
│   ├── index.ts          # MCP server entry point
│   ├── tools/
│   │   └── tts.ts        # TTS tool implementation
│   ├── services/
│   │   └── fishAudio.ts  # Fish Audio API client
│   ├── types/
│   │   └── index.ts      # TypeScript definitions
│   └── utils/
│       └── config.ts     # Configuration management
├── tests/                # Test files
├── audio_output/         # Default audio output directory
├── package.json
├── tsconfig.json
└── README.md
API Documentation
Fish Audio Service
The service provides two main methods:
Error Handling
The server handles various error scenarios:
- INVALID_API_KEY: Invalid or missing API key
- NETWORK_ERROR: Connection issues with Fish Audio API
- INVALID_PARAMS: Invalid request parameters
- QUOTA_EXCEEDED: API rate limit exceeded
- SERVER_ERROR: Fish Audio server errors
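One plausible way to arrive at these codes is to classify the HTTP status of a failed request. The mapping below is an assumption made for illustration. It is not documented by this project or confirmed against the Fish Audio API, and `classifyError` is a hypothetical helper.

```typescript
type FishErrorCode =
  | "INVALID_API_KEY"
  | "NETWORK_ERROR"
  | "INVALID_PARAMS"
  | "QUOTA_EXCEEDED"
  | "SERVER_ERROR";

// `status` is undefined when no HTTP response was received at all
// (DNS failure, timeout, connection refused, ...).
function classifyError(status: number | undefined): FishErrorCode {
  if (status === undefined) return "NETWORK_ERROR";
  // Assumed mapping: authentication failures.
  if (status === 401 || status === 403) return "INVALID_API_KEY";
  // Assumed mapping: rate limiting.
  if (status === 429) return "QUOTA_EXCEEDED";
  // Assumed mapping: any other client error is treated as a bad request.
  if (status >= 400 && status < 500) return "INVALID_PARAMS";
  return "SERVER_ERROR";
}
```

A caller might retry on `NETWORK_ERROR` and `SERVER_ERROR`, and surface the other codes to the user immediately, since retrying them is unlikely to help.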
Troubleshooting
Common Issues
- "FISH_API_KEY environment variable is required"
  - Ensure you've set the FISH_API_KEY environment variable
  - Check that the API key is valid
- "Network error: Unable to reach Fish Audio API"
  - Check your internet connection
  - Verify Fish Audio API is accessible
  - Check for proxy/firewall issues
- "Text length exceeds maximum limit"
  - Split long texts into smaller chunks
  - Maximum supported length is 10,000 characters
- Audio files not appearing
  - Check the AUDIO_OUTPUT_DIR path exists
  - Ensure write permissions for the directory
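For the "Text length exceeds maximum limit" case, long input can be split before calling the tool. The following sketch is one possible client-side approach and is not part of this server. `splitText` is a hypothetical helper that prefers sentence boundaries and only cuts mid-sentence when a single sentence exceeds the limit. It counts JavaScript string length (UTF-16 code units), which may not match exactly how the server counts characters.

```typescript
const MAX_CHARS = 10000;

function splitText(text: string, maxChars: number = MAX_CHARS): string[] {
  // Each piece is a sentence with its closing punctuation and trailing
  // whitespace, or a final run of text with no closing punctuation.
  const sentences = text.match(/[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+$/g) ?? [];

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    // A single sentence longer than the limit is cut at the limit.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current += rest;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Each returned chunk can then be sent as a separate `fish_audio_tts` call, producing one audio file per chunk.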
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Fish Audio for providing the excellent TTS API
- Anthropic for creating the Model Context Protocol
- The MCP community for inspiration and examples
Support
For issues, questions, or contributions, please visit the GitHub repository: https://github.com/da-okazaki/mcp-fish-audio-server
Changelog
v0.6.0 (2025-01-03)
- Added multiple voice reference management system
- New tool: fish_audio_list_references to list configured voices
- Voice selection by name or tag in addition to ID
- Support for configuring multiple voices with metadata
- Added FISH_REFERENCES and FISH_DEFAULT_REFERENCE environment variables
- Enhanced voice selection with priority: ID > Name > Tag > Default
v0.5.4 (2025-01-03)
- Fixed zod version compatibility issues
- Resolved dependency conflicts between MCP SDK and Fish Audio SDK
- Verified local dev and build functionality
v0.5.3 (2025-01-03)
- Fixed missing zod dependency causing module resolution errors
- Improved compatibility when running via npx
v0.5.2 (2025-01-03)
- Fixed audio playback issue with FISH_STREAMING=true
- Fixed tilde (~) expansion in AUDIO_OUTPUT_DIR
- Improved stability by separating HTTP and WebSocket streaming
v0.5.1 (2025-01-03)
- Improved documentation formatting and clarity
- Updated environment variables table for better readability
- Made documentation more generic for all MCP clients
v0.5.0 (2025-01-03)
- Simplified environment variables: removed FISH_WEBSOCKET_STREAMING and FISH_REALTIME_PLAY
- WebSocket streaming now controlled by FISH_STREAMING
- Real-time playback now controlled by FISH_AUTO_PLAY
- Cleaner configuration with unified controls
v0.4.1 (2025-01-03)
- Added intelligent environment variable mapping
- FISH_WEBSOCKET_STREAMING defaults to FISH_STREAMING
- FISH_REALTIME_PLAY defaults to FISH_AUTO_PLAY
- Simplified configuration with smart defaults
v0.4.0 (2025-01-03)
- Refactored to use official Fish Audio SDK
- Improved WebSocket streaming implementation
- Fixed auto-play functionality
- Better error handling and connection stability
- Latency parameter now properly supported (normal/balanced)
- Cleaner codebase with SDK integration
v0.3.0 (2025-01-03)
- Added WebSocket streaming support for real-time TTS
- Added real-time audio playback during WebSocket streaming
- New parameters: websocket_streaming and realtime_play
- Support for both HTTP and WebSocket streaming modes
- Real-time player for immediate audio output
v0.2.0 (2025-01-03)
- Added automatic audio playback feature with auto_play parameter
- Added FISH_AUTO_PLAY environment variable for default behavior
- Support for cross-platform audio playback (macOS, Windows, Linux)
- HTTP streaming mode implementation
v0.1.2 (2025-01-03)
- Changed npm package name to @alanse/fish-audio-mcp-server
v0.1.1 (2025-01-03)
- Fixed directory creation error when running via npx
- Changed default audio output to user's home directory
v0.1.0 (2025-01-03)
- Initial release
- Basic TTS functionality
- Streaming support
- Environment variable configuration
- Multiple audio format support