A library for efficiently walking a directory recursively
Search for anything on the web.
This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, in a single JSON file.
Utilities to build Storybook crawling tools with Puppeteer
HTTP request module customized for crawlers.
Very straightforward, event-driven web crawler. Features a flexible queue interface and a basic cache mechanism with an extensible backend.
Pure JavaScript cross-platform module to extract text from PDFs.
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
Pure Node.js OPCUA SDK - client-crawler module (deprecated; use the @sterfive/crawler module instead)
Stealth mode: Applies various techniques to make detection of headless puppeteer harder.
TypeScript definitions for simplecrawler
An ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots/crawlers/spiders via the user agent.
JavaScript SDK for Firecrawl API
A web crawler that works with prember to discover URLs in your app
Apify API client for JavaScript
A set of shared utilities that can be used by crawlers
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Parser for XML Sitemaps to be used with Robots.txt and web crawlers
Templates for Crawlee projects
TypeScript definitions for crawler
Parse robot directives within HTML meta and/or HTTP headers.
Used to run a web crawler that checks for errors on specified pages.
TypeScript definitions for x-ray-crawler
JavaScript module detecting bots/crawlers/spiders via user-agent
Promptbook: Run AI apps in plain human language across multiple models and platforms
Node.js Hydra web crawler
Automatically extracts structured information from webpages
A CLI tool to crawl websites, clone Git repos, or scan local directories and consolidate content into a single Markdown file.
MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.
Stop website fingerprinting techniques (Playwright edition)
Crawler (spider) for a site's web pages, by domain name
SDK to interact with the web-crawler service