
@flyrank/flyscrape
A powerful, modular web scraping and crawling library for Node.js, inspired by crawl4ai. Features stealth mode, LLM extraction, and markdown processing.

FlyScrape is a Node.js package built on top of Crawl4AI that makes it easy to integrate powerful scrapers and crawlers directly into your web applications. Designed for the modern web, it provides modular, production-ready tools that extract clean, structured data ready for RAG pipelines, AI agents, or advanced analytics.
Whether you’re building a content aggregator, an AI agent, or a complex data pipeline, FlyScrape simplifies web crawling and scraping while giving you maximum flexibility and performance.
npm install @flyrank/flyscrape
# or
yarn add @flyrank/flyscrape
# or
pnpm add @flyrank/flyscrape
import { AsyncWebCrawler } from "@flyrank/flyscrape";

async function main() {
  const crawler = new AsyncWebCrawler();
  await crawler.start();

  // Crawl a URL and get clean Markdown
  const result = await crawler.arun("https://example.com");
  if (result.success) {
    console.log(result.markdown);
  }

  await crawler.close();
}

main();
Extract only the main article content, removing all UI clutter.
const result = await crawler.arun("https://blog.example.com/guide", {
  contentOnly: true,
  excludeMedia: true, // Remove images/videos
});
FlyScrape includes a provider-agnostic API service that registers providers from environment variables and exposes REST endpoints for n8n and other workflow tools.
Each provider is configured entirely through environment variables:

API_PROVIDER_<NAME>_ENDPOINT
API_PROVIDER_<NAME>_AUTH_TYPE (api_key, oauth, basic, none)
API_PROVIDER_<NAME>_API_KEY
API_PROVIDER_<NAME>_API_KEY_HEADER
API_PROVIDER_<NAME>_API_KEY_PREFIX
API_PROVIDER_<NAME>_OAUTH_TOKEN
API_PROVIDER_<NAME>_OAUTH_HEADER
API_PROVIDER_<NAME>_USERNAME
API_PROVIDER_<NAME>_PASSWORD
API_PROVIDER_<NAME>_RATE_LIMIT
API_PROVIDER_<NAME>_RATE_WINDOW_MS
API_PROVIDER_<NAME>_LOG_LEVEL
API_PROVIDER_<NAME>_HEALTH_ENDPOINT
API_PROVIDER_<NAME>_TIMEOUT_MS

Service-wide settings:

API_SERVICE_PORT
API_SERVICE_BASE_PATH
API_SERVICE_LOG_LEVEL
API_SERVICE_MAX_BODY_BYTES

Endpoints:

GET /health
GET /v1/providers
GET /v1/providers/:name
GET /v1/providers/:name/health
POST /v1/providers/:name/request

Successful responses have the shape:

{
  "success": true,
  "requestId": "uuid",
  "data": {}
}
Error responses:

{
  "success": false,
  "requestId": "uuid",
  "error": {
    "code": "ERROR_CODE",
    "message": "Human readable message",
    "details": {}
  }
}
API_PROVIDER_OPENAI_ENDPOINT=https://api.openai.com/v1
API_PROVIDER_OPENAI_AUTH_TYPE=api_key
API_PROVIDER_OPENAI_API_KEY=sk-...
API_PROVIDER_OPENAI_API_KEY_HEADER=Authorization
API_PROVIDER_OPENAI_API_KEY_PREFIX=Bearer
API_PROVIDER_OPENAI_RATE_LIMIT=120
API_PROVIDER_OPENAI_RATE_WINDOW_MS=60000
API_PROVIDER_OPENAI_LOG_LEVEL=info
API_PROVIDER_OPENAI_HEALTH_ENDPOINT=https://api.openai.com/v1/models
API_SERVICE_PORT=3000
API_SERVICE_BASE_PATH=/v1
API_SERVICE_LOG_LEVEL=info
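With this configuration the service registers a single provider named openai. The grouping step can be sketched as follows (a hypothetical parseProviderEnv helper, not the package's actual code; it assumes provider names contain no underscores, and a real implementation would add validation and defaults):

```javascript
// Group API_PROVIDER_<NAME>_* variables into per-provider config objects.
function parseProviderEnv(env) {
  const providers = {};
  const prefix = "API_PROVIDER_";
  for (const [key, value] of Object.entries(env)) {
    if (!key.startsWith(prefix)) continue;
    // Split "OPENAI_API_KEY_HEADER" into name "OPENAI" and field "API_KEY_HEADER"
    const rest = key.slice(prefix.length);
    const name = rest.split("_")[0];
    const field = rest.slice(name.length + 1).toLowerCase();
    providers[name] = providers[name] || {};
    providers[name][field] = value;
  }
  return providers;
}

const providers = parseProviderEnv({
  API_PROVIDER_OPENAI_ENDPOINT: "https://api.openai.com/v1",
  API_PROVIDER_OPENAI_AUTH_TYPE: "api_key",
  API_PROVIDER_OPENAI_RATE_LIMIT: "120",
  API_SERVICE_PORT: "3000", // service-wide keys are ignored here
});
// providers.OPENAI → { endpoint: "...", auth_type: "api_key", rate_limit: "120" }
```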
curl -X POST http://localhost:3000/v1/providers/openai/request \
  -H "Content-Type: application/json" \
  -d '{
    "method": "POST",
    "path": "/chat/completions",
    "body": {
      "model": "gpt-4o-mini",
      "messages": [{ "role": "user", "content": "Hello" }]
    }
  }'
bun run api-service
docker build -t flyscrape-api .
docker run --env-file .env -p 3000:3000 flyscrape-api
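The curl call above maps directly onto Node's built-in fetch (Node 18+). A minimal client sketch, assuming the service is running on localhost:3000 with the default /v1 base path (buildProviderRequest and callProvider are hypothetical helper names):

```javascript
// Build the request for POST /v1/providers/:name/request.
function buildProviderRequest(provider, payload, base = "http://localhost:3000/v1") {
  return {
    url: `${base}/providers/${provider}/request`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Send it and unwrap the JSON envelope described above:
// { success, requestId, data } or { success: false, error: {...} }
async function callProvider(provider, payload) {
  const { url, init } = buildProviderRequest(provider, payload);
  const res = await fetch(url, init);
  return res.json();
}
```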
Keep your session alive across multiple requests to look like a real user and avoid being blocked.
const sessionId = 'my-session-1';

// First request: creates the session, saves cookies/local storage
await crawler.arun("https://example.com/login", {
  session_id: sessionId,
});

// Second request: reuses the same session (cookies are preserved!)
await crawler.arun("https://example.com/dashboard", {
  session_id: sessionId,
});

// Clean up when done
await crawler.closeSession(sessionId);
FlyScrape uses impit under the hood to mimic real browser TLS fingerprints without the overhead of a full browser.
// Fast mode (no browser, but stealthy TLS fingerprint)
const result = await crawler.arun("https://example.com", {
  jsExecution: false, // Disables Playwright, enables impit
});
Enable advanced anti-detection features to bypass WAFs and bot detection systems.
const crawler = new AsyncWebCrawler({
  stealth: true, // Enable stealth mode
  headless: true,
});
await crawler.start();
Need full control? Provide a customTransformer to define exactly how HTML maps to Markdown.
const result = await crawler.arun("https://example.com", {
  processing: {
    markdown: {
      customTransformer: (html) => {
        // Your custom logic here
        return myCustomConverter(html);
      },
    },
  },
});
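The myCustomConverter referenced above is just a function from an HTML string to a Markdown string. A regex-based illustration of what it might look like (for brevity only; a real transformer should walk a parsed DOM rather than rewrite tags with regexes):

```javascript
// Map a handful of tags to Markdown and strip the rest.
function myCustomConverter(html) {
  return html
    .replace(/<h1[^>]*>(.*?)<\/h1>/gs, "# $1\n")
    .replace(/<h2[^>]*>(.*?)<\/h2>/gs, "## $1\n")
    .replace(/<a[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/gs, "[$2]($1)")
    .replace(/<p[^>]*>(.*?)<\/p>/gs, "$1\n")
    .replace(/<[^>]+>/g, "") // drop any remaining tags
    .trim();
}

// myCustomConverter('<h1>Title</h1><p>See <a href="https://example.com">this</a>.</p>')
// → "# Title\nSee [this](https://example.com)."
```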
Handle modern SPAs with ease using built-in scrolling and wait strategies.
const result = await crawler.arun("https://infinite-scroll.com", {
  autoScroll: true, // Automatically scroll to bottom
  waitMode: 'networkidle', // Wait for network to settle
});
Inject custom logic at key stages of the crawling process.
const result = await crawler.arun("https://example.com", {
  hooks: {
    onPageCreated: async (page) => {
      // Set cookies or modify environment
      await page.context().addCookies([...]);
    },
    onLoad: async (page) => {
      // Interact with the page
      await page.click('#accept-cookies');
    },
  },
});
Process raw HTML or local files directly without a web server.
// Raw HTML
await crawler.arun("raw:<html><body><h1>Hello</h1></body></html>");
// Local File
await crawler.arun("file:///path/to/local/file.html");
Define a schema and let the LLM do the work.
const schema = {
  type: "object",
  properties: {
    title: { type: "string" },
    price: { type: "number" },
    features: { type: "array", items: { type: "string" } },
  },
};

const result = await crawler.arun("https://store.example.com/product/123", {
  extraction: {
    type: "llm",
    schema: schema,
    provider: myOpenAIProvider, // Your LLM provider instance
  },
});
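LLM output is not guaranteed to conform to the schema, so it is worth validating the extracted object before sending it downstream. A minimal checker for flat schemas like the one above (a hypothetical matchesSchema sketch; a production pipeline would use a full JSON Schema validator such as Ajv):

```javascript
// Check an extracted value against a small JSON-Schema-like definition
// supporting "object", "array", and primitive types.
function matchesSchema(data, schema) {
  if (schema.type === "object") {
    if (typeof data !== "object" || data === null || Array.isArray(data)) return false;
    // Only validate properties that are present; no required-field handling.
    return Object.entries(schema.properties || {}).every(
      ([key, prop]) => !(key in data) || matchesSchema(data[key], prop)
    );
  }
  if (schema.type === "array") {
    return Array.isArray(data) && data.every((item) => matchesSchema(item, schema.items));
  }
  return typeof data === schema.type; // "string", "number", "boolean"
}

const productSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    price: { type: "number" },
    features: { type: "array", items: { type: "string" } },
  },
};
// matchesSchema({ title: "Widget", price: 9.99, features: ["small"] }, productSchema) → true
// matchesSchema({ title: "Widget", price: "9.99" }, productSchema) → false
```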
Crawl all pages listed in a sitemap (or sitemap index) in one call. Sitemaps are fetched over HTTP with timeouts and redirect limits; optional category counts (e.g. products, pages, blogs, collections) are supported.
import {
  AsyncWebCrawler,
  fetchSitemapUrls,
  getSitemapIndexCategories,
} from "@flyrank/flyscrape";

// Option A: Crawl all pages from a sitemap
const crawler = new AsyncWebCrawler();
await crawler.start();
const results = await crawler.crawlFromSitemap(
  "https://www.flyrank.com/sitemap.xml",
  { jsExecution: false }, // fast fetch-only mode
  { maxUrls: 1000, timeout: 10_000 }
);
await crawler.close();

// Option B: Get only the list of URLs from the sitemap
const urls = await fetchSitemapUrls("https://www.flyrank.com/sitemap.xml", {
  sameOriginOnly: true,
  maxUrls: 500,
});

// Option C: Get categorized counts (e.g. products (6), pages (6), blogs (12))
const { categories, totalSitemaps } = await getSitemapIndexCategories(
  "https://www.flyrank.com/sitemap.xml"
);
for (const [name, info] of Object.entries(categories)) {
  console.log(`${name} (${info.count})`);
}

// Option D: Crawl and get category breakdown in one call
const out = await crawler.crawlFromSitemap(
  "https://www.flyrank.com/sitemap.xml",
  { jsExecution: false },
  { includeSitemapCategories: true }
);
if (!Array.isArray(out)) {
  console.log("Categories:", out.sitemapCategories.categories);
  // out.results = crawl results
}
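For reference, extracting URLs from a sitemap boils down to reading its &lt;loc&gt; entries. A standalone sketch of that step (a hypothetical extractSitemapUrls helper, not the package's implementation, which additionally handles sitemap indexes, HTTP timeouts, and redirect limits):

```javascript
// Pull every <loc> URL out of a sitemap XML string, up to maxUrls.
function extractSitemapUrls(xml, { maxUrls = Infinity } = {}) {
  const urls = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(xml)) !== null && urls.length < maxUrls) {
    urls.push(match[1]);
  }
  return urls;
}

const sitemapXml = `<?xml version="1.0"?>
<urlset>
  <url><loc>https://www.flyrank.com/</loc></url>
  <url><loc>https://www.flyrank.com/blog</loc></url>
</urlset>`;
// extractSitemapUrls(sitemapXml) → ["https://www.flyrank.com/", "https://www.flyrank.com/blog"]
```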
We welcome contributions! Please see our Contribution Guidelines for details on how to get started.
This project is licensed under the MIT License.