mult-fetch-mcp-server

This project implements an MCP-compliant client and server for communication between AI assistants and external tools.
Project Structure
fetch-mcp/
├── src/  # Source code directory
│   ├── lib/  # Library files
│   │   ├── fetchers/  # Web fetching implementation
│   │   │   ├── browser/  # Browser-based fetching
│   │   │   │   ├── BrowserFetcher.ts  # Browser fetcher implementation
│   │   │   │   ├── BrowserInstance.ts  # Browser instance management
│   │   │   │   └── PageOperations.ts  # Page interaction operations
│   │   │   ├── node/  # Node.js-based fetching
│   │   │   └── common/  # Shared fetching utilities
│   │   ├── utils/  # Utility modules
│   │   │   ├── ChunkManager.ts  # Content chunking
│   │   │   ├── ContentProcessor.ts  # HTML to text conversion
│   │   │   ├── ContentExtractor.ts  # Intelligent content extraction
│   │   │   ├── ContentSizeManager.ts  # Content size limiting
│   │   │   └── ErrorHandler.ts  # Error handling
│   │   ├── server/  # Server-related modules
│   │   │   ├── index.ts  # Server entry
│   │   │   ├── browser.ts  # Browser management
│   │   │   ├── fetcher.ts  # Web fetching logic
│   │   │   ├── tools.ts  # Tool registration and handling
│   │   │   ├── resources.ts  # Resource handling
│   │   │   ├── prompts.ts  # Prompt templates
│   │   │   └── types.ts  # Server type definitions
│   │   ├── i18n/  # Internationalization support
│   │   └── types.ts  # Common type definitions
│   ├── client.ts  # MCP client implementation
│   └── mcp-server.ts  # MCP server main entry
├── index.ts  # Server entry point
├── tests/  # Test files
└── dist/  # Compiled files
MCP Specification
The Model Context Protocol (MCP) defines two main transport methods:
- Standard Input/Output (Stdio): The client starts the MCP server as a child process, and they communicate through standard input (stdin) and standard output (stdout).
- Server-Sent Events (SSE): Used to pass messages between client and server.
This project implements the Standard Input/Output (Stdio) transport method.
Features
- Implementation based on the official MCP SDK
- Support for Standard Input/Output (Stdio) transport
- Multiple web scraping methods (HTML, JSON, text, Markdown, plain text conversion)
- Intelligent mode switching: automatic switching between standard requests and browser mode
- Content size management: automatically splits large content into manageable chunks so that responses fit within AI model context size limits
- Chunked content retrieval: ability to request specific chunks of large content while maintaining context continuity
- Detailed debug logging to stderr
- Bilingual internationalization (English and Chinese)
- Modular design for easy maintenance and extension
- Intelligent Content Extraction: Based on Mozilla's Readability library, capable of extracting meaningful content from web pages while filtering out advertisements and navigation elements
- Metadata Support: Ability to extract webpage metadata such as title, author, publication date, and site information
- Smart Content Detection: Automatically detects if a page contains meaningful content, filtering out login pages, error pages, and other pages without substantial content
- Browser Automation Enhancements: Support for page scrolling, cookie management, selector waiting, and other advanced browser interactions
Installation
Installing via Smithery
To install Mult Fetch MCP Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @lmcc-dev/mult-fetch-mcp-server --client claude
Local Installation
pnpm install
Global Installation
pnpm add -g @lmcc-dev/mult-fetch-mcp-server
Or run directly with npx (no installation required):
npx @lmcc-dev/mult-fetch-mcp-server
Integration with Claude
To integrate this tool with Claude desktop, add a server configuration to the Claude desktop configuration file.
Configuration File Location
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%/Claude/claude_desktop_config.json
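As a rough sketch, the two locations above can be resolved in code like this. The helper name claudeConfigPath and its parameters are hypothetical and not part of this project; the caller is assumed to pass in the home directory or the value of %APPDATA%.

```typescript
// Hypothetical helper (not part of this project): builds the Claude desktop
// config path for the two platforms listed above.
type Platform = "darwin" | "win32";

function claudeConfigPath(
  platform: Platform,
  dirs: { home?: string; appData?: string }
): string {
  const file = "claude_desktop_config.json";
  if (platform === "darwin") {
    if (!dirs.home) throw new Error("home directory is required on macOS");
    return `${dirs.home}/Library/Application Support/Claude/${file}`;
  }
  // Windows: dirs.appData stands in for the expanded %APPDATA% value.
  if (!dirs.appData) throw new Error("APPDATA is required on Windows");
  return `${dirs.appData}/Claude/${file}`;
}
```

In a Node.js script, the arguments would typically come from process.platform, os.homedir(), and process.env.APPDATA.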
Configuration Examples
Method 1: Using npx (Recommended)
This method is the simplest, doesn't require specifying the full path, and is suitable for global installation or direct use with npx:
{
  "mcpServers": {
    "mult-fetch-mcp-server": {
      "command": "npx",
      "args": ["@lmcc-dev/mult-fetch-mcp-server"],
      "env": {
        "MCP_LANG": "en"
      }
    }
  }
}
Method 2: Specifying Full Path
If you need to use a specific installation location, you can specify the full path:
{
  "mcpServers": {
    "mult-fetch-mcp-server": {
      "command": "path-to/bin/node",
      "args": ["path-to/@lmcc-dev/mult-fetch-mcp-server/dist/index.js"],
      "env": {
        "MCP_LANG": "en"
      }
    }
  }
}
Replace path-to/bin/node with the path to the Node.js executable on your system, and replace path-to/@lmcc-dev/mult-fetch-mcp-server with the actual path to this project.
Usage Examples
Below is an example of using this tool in Claude desktop client:

The image shows how Claude can use the fetch tools to retrieve web content and process it according to your instructions.
Usage
After configuration, restart Claude desktop, and you can use the following tools in your conversation:
- fetch_html: Get HTML content of a webpage
- fetch_json: Get JSON data
- fetch_txt: Get plain text content
- fetch_markdown: Get Markdown formatted content
- fetch_plaintext: Get plain text content converted from HTML (strips HTML tags)
Build
pnpm run build
Run Server
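Start the server with any one of the following commands, depending on how the package is installed:
pnpm run server
node dist/index.js
@lmcc-dev/mult-fetch-mcp-server
npx @lmcc-dev/mult-fetch-mcp-server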
Client Demo Tools
Note: The following client.js functionality is provided for demonstration and testing purposes only. When used with Claude or other AI assistants, the MCP server is driven by the AI, which manages the chunking process automatically.
Command Line Client
The project includes a command-line client for testing and development purposes:
pnpm run client <method> <params_json>
pnpm run client fetch_html '{"url": "https://example.com", "debug": true}'
Demo Client Chunk Control Parameters
When testing with the command-line client, you can use these parameters to demonstrate content chunking capabilities:
- --all-chunks: Command line flag to automatically fetch all chunks in sequence (for demonstration only)
- --max-chunks: Command line flag to limit the maximum number of chunks to fetch (optional, default is 10)
Real-time Output Demo
The client.js demo tool provides real-time output capabilities:
node dist/src/client.js fetch_html '{"url":"https://example.com", "startCursor": 0, "contentSizeLimit": 500}' --all-chunks --debug
The demo client will automatically fetch all chunks in sequence and display them immediately, showcasing how large content can be processed in real-time.
Run Tests
npm run test:mcp
npm run test:mini4k
npm run test:direct
Language Settings
This project supports Chinese and English bilingual internationalization. You can set the language using environment variables:
Using Environment Variables
Set the MCP_LANG environment variable to control the language:
# English (macOS/Linux)
export MCP_LANG=en
npm run server
# Chinese (macOS/Linux)
export MCP_LANG=zh
npm run server
# Chinese (Windows Command Prompt)
set MCP_LANG=zh
npm run server
Using environment variables ensures that all related processes (including the MCP server) use the same language settings.
Default Language
By default, the system will choose a language according to the following priority:
- MCP_LANG environment variable
- Operating system language (if it starts with "zh", use Chinese)
- English (as the final fallback option)
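The priority list above can be sketched as a small function. This is an illustration only: resolveLanguage is a hypothetical name, and the sketch assumes that only the values "en" and "zh" are recognized for MCP_LANG, with anything else falling through to the next rule.

```typescript
type Lang = "en" | "zh";

// Hypothetical helper (not the project's actual code) that mirrors the
// documented priority: MCP_LANG, then the OS language, then English.
function resolveLanguage(
  env: Record<string, string | undefined>,
  osLocale?: string
): Lang {
  const explicit = env["MCP_LANG"]?.toLowerCase();
  // Assumption: unrecognized MCP_LANG values are ignored.
  if (explicit === "zh") return "zh";
  if (explicit === "en") return "en";
  // Operating system language: anything starting with "zh" selects Chinese.
  if (osLocale?.toLowerCase().startsWith("zh")) return "zh";
  // Final fallback.
  return "en";
}
```

For example, resolveLanguage({ MCP_LANG: "en" }, "zh-CN") keeps English because the environment variable takes precedence over the operating system language.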
Debugging
This project follows the MCP protocol specification and does not output any logs by default to avoid interfering with JSON-RPC communication. Debug information is controlled through call parameters:
Using the debug Parameter
Set the debug: true parameter when calling a tool:
{
"url": "https://example.com",
"debug": true
}
Debug messages are sent to the standard error stream (stderr) using the following format:
[MCP-SERVER] MCP server starting...
[CLIENT] Fetching URL: https://example.com
Debug Log File
When debug mode is enabled, all debug messages are also written to a log file located at:
~/.mult-fetch-mcp-server/debug.log
This log file can be accessed through the MCP resources API:
// Read the debug log
const result = await client.readResource({ uri: "file:///logs/debug" });
console.log(result.contents[0].text);
// Clear the debug log
const clearResult = await client.readResource({ uri: "file:///logs/clear" });
console.log(clearResult.contents[0].text);
Proxy Settings
This tool supports various methods to configure proxy settings:
1. Using the proxy Parameter
The most direct way is to specify the proxy in the request parameters:
{
"url": "https://example.com",
"proxy": "http://your-proxy-server:port",
"debug": true
}
2. Using Environment Variables
The tool will automatically detect and use proxy settings from standard environment variables:
export HTTP_PROXY=http://your-proxy-server:port
export HTTPS_PROXY=http://your-proxy-server:port
npm run server
3. System Proxy Detection
The tool attempts to detect system proxy settings based on your operating system:
- Windows: Reads proxy settings from environment variables using the set command
- macOS/Linux: Reads proxy settings from environment variables using the env command
4. Proxy Troubleshooting
If you're having issues with proxy detection:
- Use the debug: true parameter to see detailed logs about proxy detection
- Explicitly specify the proxy using the proxy parameter
- Ensure your proxy URL is in the correct format: http://host:port or https://host:port
- For websites that require browser capabilities, set useBrowser: true to use browser mode
5. Browser Mode and Proxies
When using browser mode (useBrowser: true), the tool will:
- First try to use the explicitly specified proxy (if provided)
- Then try to use system proxy settings
- Finally, proceed without a proxy if none is found
Browser mode is particularly useful for websites that implement anti-scraping measures or require JavaScript execution.
Parameter Handling
This project handles parameters in the following ways:
- debug: Passed through call parameters, each request can individually control whether to enable debug output
- MCP_LANG: Retrieved from environment variables, controls the language settings of the entire server
Usage
Creating a Client
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
import path from 'path';
import { fileURLToPath } from 'url';

// Resolve this module's directory (ES modules have no built-in __dirname)
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Start the server as a child process and talk to it over stdio
const transport = new StdioClientTransport({
  command: 'node',
  args: [path.resolve(__dirname, 'dist/index.js')],
  stderr: 'inherit', // forward the server's stderr (debug output) to this process
  env: {
    ...process.env // pass the current environment, including MCP_LANG if set
  }
});

// Create the client and connect it to the transport
const client = new Client({
  name: "example-client",
  version: "1.0.0"
});
await client.connect(transport);

// Call a tool and inspect the result
const result = await client.callTool({
  name: 'fetch_html',
  arguments: {
    url: 'https://example.com',
    debug: true
  }
});
if (result.isError) {
  console.error('Fetch failed:', result.content[0].text);
} else {
  console.log('Fetch successful!');
  console.log('Content preview:', result.content[0].text.substring(0, 500));
}
Supported Tools
- fetch_html: Get HTML content of a webpage
- fetch_json: Get JSON data
- fetch_txt: Get plain text content
- fetch_markdown: Get Markdown formatted content
- fetch_plaintext: Get plain text content converted from HTML (strips HTML tags)
Resources Support
The server includes support for the resources/list and resources/read methods, but currently no resources are defined in the implementation. The resource system is designed to provide access to project files and documentation, but this feature is not fully implemented yet.
Resource Usage Example
const resourcesResult = await client.listResources({});
console.log('Available resources:', resourcesResult);
Supported Prompt Templates
The server provides the following prompt templates:
- fetch-website: Get website content, supporting different formats and browser mode
- extract-content: Extract specific content from a website, supporting CSS selectors and data type specification
- debug-fetch: Debug website fetching issues, analyze possible causes and provide solutions
Prompt Template Usage
- Use prompts/list to get a list of available prompt templates
- Use prompts/get to get specific prompt template content
// List the available prompt templates
const promptsResult = await client.listPrompts({});
console.log('Available prompts:', promptsResult);

// Get the fetch-website prompt (argument values are passed as strings)
const fetchPrompt = await client.getPrompt({
  name: "fetch-website",
  arguments: {
    url: "https://example.com",
    format: "html",
    useBrowser: "false"
  }
});
console.log('Fetch website prompt:', fetchPrompt);

// Get the debug-fetch prompt
const debugPrompt = await client.getPrompt({
  name: "debug-fetch",
  arguments: {
    url: "https://example.com",
    error: "Connection timeout"
  }
});
console.log('Debug fetch prompt:', debugPrompt);
Parameter Options
Each tool supports the following parameters:
Basic Parameters
- url: URL to fetch (required)
- headers: Custom request headers (optional, default {})
- proxy: Proxy server URL in the format http://host:port or https://host:port (optional)
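A proxy value can be checked against the documented format before it is sent. The sketch below is not taken from this project; isSupportedProxyUrl is a hypothetical helper built on the standard URL class, and it only checks the scheme and host.

```typescript
// Hypothetical pre-flight check (not part of this project): accepts only
// http:// and https:// proxy URLs that include a host.
function isSupportedProxyUrl(value: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(value);
  } catch {
    // Not parseable as an absolute URL, e.g. "127.0.0.1:7890" without a scheme.
    return false;
  }
  const schemeOk = parsed.protocol === "http:" || parsed.protocol === "https:";
  return schemeOk && parsed.hostname !== "";
}
```

The port is deliberately not required here, because the URL class reports an empty port when the scheme's default port (80 or 443) is given.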
Network Control Parameters
- timeout: Timeout in milliseconds (optional, default is 30000)
- maxRedirects: Maximum number of redirects to follow (optional, default is 10)
- noDelay: Whether to disable random delay between requests (optional, default is false)
- useSystemProxy: Whether to use system proxy (optional, default is true)
Content Size Control Parameters
- enableContentSplitting: Whether to split large content into chunks (optional, default is true)
- contentSizeLimit: Maximum content size in bytes before splitting (optional, default is 50000)
- startCursor: Starting cursor position in bytes for retrieving content from a specific position (optional, default is 0)
These parameters help manage large content that would exceed AI model context size limits, allowing you to retrieve web content in manageable chunks while maintaining the ability to process the complete information.
Chunk Management
- chunkId: Unique identifier for a chunk set when content is split (used for requesting subsequent chunks)
When content is split into chunks, the response includes metadata that allows the AI to request subsequent chunks using the chunkId and startCursor parameters. The system uses byte-level chunk management to provide precise control over content retrieval, enabling seamless processing of content from any position.
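To make the byte-level cursor idea concrete, here is a minimal sketch of how one chunk could be cut from a string. It is not the project's ChunkManager: readChunk, the Chunk fields (nextCursor, hasMore, totalBytes), and the rule of backing up to a UTF-8 character boundary are all assumptions made for illustration. Only startCursor and contentSizeLimit, with their defaults, come from the parameter list above.

```typescript
// Hypothetical chunk shape; the real response metadata may differ.
interface Chunk {
  text: string;        // decoded content of this chunk
  startCursor: number; // byte offset where this chunk begins
  nextCursor: number;  // byte offset to pass as startCursor for the next chunk
  totalBytes: number;  // size of the full content in bytes
  hasMore: boolean;    // true if content remains after this chunk
}

function readChunk(content: string, startCursor = 0, contentSizeLimit = 50000): Chunk {
  // A UTF-8 character is at most 4 bytes, so a smaller limit could stall.
  if (contentSizeLimit < 4) {
    throw new RangeError("contentSizeLimit must be at least 4 bytes");
  }
  const bytes = new TextEncoder().encode(content);
  const start = Math.min(Math.max(startCursor, 0), bytes.length);
  let end = Math.min(start + contentSizeLimit, bytes.length);
  // Back up while the byte at `end` is a UTF-8 continuation byte (10xxxxxx),
  // so the chunk never ends in the middle of a multi-byte character.
  while (end > start && end < bytes.length && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return {
    text: new TextDecoder().decode(bytes.subarray(start, end)),
    startCursor: start,
    nextCursor: end,
    totalBytes: bytes.length,
    hasMore: end < bytes.length,
  };
}
```

A caller would request chunks in a loop, passing each chunk's nextCursor as the next startCursor until hasMore is false. Concatenating the text fields then reproduces the original content, provided every cursor comes from a previous chunk.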
Mode Control Parameters
- useBrowser: Whether to use browser mode (optional, default is false)
- useNodeFetch: Whether to force using Node.js mode (optional, default is false, mutually exclusive with useBrowser)
- autoDetectMode: Whether to automatically detect and switch to browser mode if standard mode fails with 403/Forbidden errors (optional, default is true). Set to false to strictly use the specified mode without automatic switching.
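The interaction between these three parameters can be sketched as two small decision functions. This is an illustration under stated assumptions, not the server's actual logic: initialMode and shouldRetryInBrowser are hypothetical names, and the sketch assumes that forcing Node.js mode with useNodeFetch also suppresses the automatic switch.

```typescript
interface ModeParams {
  useBrowser?: boolean;     // default false
  useNodeFetch?: boolean;   // default false
  autoDetectMode?: boolean; // default true
}

type Mode = "node" | "browser";

// Picks the mode for the first attempt.
function initialMode(p: ModeParams): Mode {
  if (p.useBrowser && p.useNodeFetch) {
    throw new Error("useBrowser and useNodeFetch are mutually exclusive");
  }
  return p.useBrowser ? "browser" : "node";
}

// Decides whether a failed standard request should be retried in browser mode.
function shouldRetryInBrowser(p: ModeParams, mode: Mode, httpStatus: number): boolean {
  const autoDetect = p.autoDetectMode ?? true;
  // Assumption: useNodeFetch = true means "never leave Node.js mode".
  return mode === "node" && autoDetect && !p.useNodeFetch && httpStatus === 403;
}
```

With the defaults, a 403 response in standard mode triggers one retry in browser mode; setting autoDetectMode to false turns that retry off.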
Browser Mode Specific Parameters
- waitForSelector: Selector to wait for in browser mode (optional, default is 'body')
- waitForTimeout: Timeout to wait in browser mode in milliseconds (optional, default is 5000)
- scrollToBottom: Whether to scroll to the bottom of the page in browser mode (optional, default is false)
- saveCookies: Whether to save cookies in browser mode (optional, default is true)
- closeBrowser: Whether to close the browser instance (optional, default is false)
- extractContent: Whether to use the Readability algorithm to extract main content (optional, default false)
- includeMetadata: Whether to include metadata in the extracted content (optional, default false, only works when extractContent is true)
- fallbackToOriginal: Whether to fall back to the original content when extraction fails (optional, default true, only works when extractContent is true)
Debug Parameters
- debug: Whether to enable debug output (optional, default false)
Use the content extraction feature to get the core content of a webpage, filtering out navigation bars, advertisements, sidebars, and other distracting elements:
{
"url": "https://example.com/article",
"extractContent": true,
"includeMetadata": true
}
The extracted content will include the following metadata (if available):
- Title
- Byline (author)
- Site name
- Excerpt
- Content length
- Readability flag (isReaderable)
Special Usage
To extract only the meaningful content from an article webpage:
{
"url": "https://example.com/news/article",
"extractContent": true,
"includeMetadata": true
}
For websites where content extraction might fail, you can use fallbackToOriginal to ensure you get some content:
{
"url": "https://example.com/complex-layout",
"extractContent": true,
"fallbackToOriginal": true
}
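The fallback behaviour can be pictured with the sketch below. It is not the project's ContentExtractor: applyExtraction and the Extractor type are hypothetical, the extractor is passed in as a plain function, and throwing when extraction fails with fallbackToOriginal set to false is an assumption, since that case is not described here.

```typescript
interface ExtractOptions {
  extractContent?: boolean;     // default false
  fallbackToOriginal?: boolean; // default true
}

// Hypothetical extractor signature: returns null when no main content is found.
type Extractor = (html: string) => string | null;

function applyExtraction(html: string, extractor: Extractor, opts: ExtractOptions = {}): string {
  // Extraction is opt-in; without it the original content is returned unchanged.
  if (!opts.extractContent) return html;
  const extracted = extractor(html);
  if (extracted !== null && extracted.trim() !== "") return extracted;
  // Extraction failed: fall back to the original content unless disabled.
  if (opts.fallbackToOriginal ?? true) return html;
  // Assumption: with the fallback disabled, a failed extraction is an error.
  throw new Error("Content extraction failed and fallbackToOriginal is false");
}
```

Passing the extractor as an argument keeps the sketch independent of any particular extraction library.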
Closing Browser Without Fetching
To close the browser instance without performing any fetch operation:
{
"url": "about:blank",
"closeBrowser": true
}
Proxy Priority
The proxy is determined in the following order:
- Command line specified proxy
- proxy parameter in the request
- Environment variables (if useSystemProxy is true)
- Git configuration (if useSystemProxy is true)
If proxy is set, useSystemProxy will be automatically set to false.
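The order above can be written as a short lookup. The sketch is illustrative only: resolveProxy and the ProxySources field names are hypothetical, and each source is assumed to have been read already (for example from HTTP_PROXY or the Git configuration) before the function is called.

```typescript
// Hypothetical inputs; each field holds a proxy URL already read from its source.
interface ProxySources {
  commandLineProxy?: string; // proxy given on the command line
  requestProxy?: string;     // the proxy parameter of the request
  useSystemProxy?: boolean;  // default true
  envProxy?: string;         // value from proxy environment variables
  gitProxy?: string;         // value from Git configuration
}

function resolveProxy(s: ProxySources): string | undefined {
  if (s.commandLineProxy) return s.commandLineProxy;
  // An explicit proxy parameter wins over system settings, which matches
  // "if proxy is set, useSystemProxy is automatically set to false".
  if (s.requestProxy) return s.requestProxy;
  const useSystemProxy = s.useSystemProxy ?? true;
  if (!useSystemProxy) return undefined;
  // System sources: environment variables first, then Git configuration.
  return s.envProxy || s.gitProxy || undefined;
}
```

Returning undefined means the request proceeds without a proxy.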
Debug Output
When debug: true is set, logs will be output to stderr with the following prefixes:
- [MCP-SERVER]: Logs from the MCP server
- [NODE-FETCH]: Logs from the Node.js fetcher
- [BROWSER-FETCH]: Logs from the browser fetcher
- [CLIENT]: Logs from the client
- [TOOLS]: Logs from the tool implementation
- [FETCHER]: Logs from the main fetcher interface
- [CONTENT]: Logs related to content handling
- [CONTENT-PROCESSOR]: Logs from the HTML content processor
- [CONTENT-SIZE]: Logs related to content size management
- [CHUNK-MANAGER]: Logs related to content chunking operations
- [ERROR-HANDLER]: Logs related to error handling
- [BROWSER-MANAGER]: Logs from the browser instance manager
- [CONTENT-EXTRACTOR]: Logs from the content extractor
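A prefixed, stderr-only logger along these lines would keep stdout free for JSON-RPC messages. The sketch is a guess at the shape, not the project's logging code: formatDebugLine and debugLog are hypothetical names, and only a few of the prefixes above are listed in the type.

```typescript
// A subset of the documented prefixes, used here only for illustration.
type DebugPrefix = "MCP-SERVER" | "NODE-FETCH" | "BROWSER-FETCH" | "CLIENT" | "TOOLS";

// Builds a line in the documented "[PREFIX] message" format.
function formatDebugLine(prefix: DebugPrefix, message: string): string {
  return `[${prefix}] ${message}`;
}

// Writes the line only when debug is enabled for the current request.
function debugLog(prefix: DebugPrefix, message: string, debug = false): void {
  if (!debug) return;
  // In Node.js, console.error writes to stderr, so stdout stays reserved
  // for the JSON-RPC traffic of the stdio transport.
  console.error(formatDebugLine(prefix, message));
}
```

Calling debugLog("CLIENT", "Fetching URL: https://example.com", true) would emit a line in the same format as the examples in the Debugging section.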
License
MIT
Updated by lmcc-dev