Crawler is a web spider written with Node.js. It gives you the full power of jQuery on the server to parse a large number of pages as they are downloaded, asynchronously.
A library for efficiently walking a directory recursively
Search for anything on the web.
This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders as a single JSON file.
Utilities to build Storybook crawling tools with Puppeteer
HTTP request module customized for crawlers.
pure nodejs OPCUA SDK - module client-crawler
Very straightforward, event-driven web crawler. Features a flexible queue interface and a basic cache mechanism with an extensible backend.
Stealth mode: Applies various techniques to make detection of headless puppeteer harder.
Pure JavaScript cross-platform module to extract text from PDFs.
A `URL` parser for crawling purposes.
TypeScript definitions for crawler
This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots, crawlers, and spiders via the user agent.
A web crawler that works with prember to discover URLs in your app
Open-source crawler framework in Node.js
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
Used to run a web crawler that checks for errors on specified pages.
A set of shared utilities that can be used by crawlers
Apify API client for JavaScript
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
TypeScript definitions for npm-license-crawler
Templates for Crawlee projects
SDK to interact with the web-crawler service
A plugin-based web crawler made for SEO. Please wait or contribute ... still in beta
TypeScript definitions for x-ray-crawler
A library to test whether a URL (request) has already been crawled, usually used in a web crawler. Compatible with `request` and `node-crawler`
Crawler4nodejs is an open source web crawler for Node.js which provides a simple interface for crawling the Web.
Parser for XML Sitemaps to be used with Robots.txt and web crawlers
Priority-based Semantic Web crawler.
Automatically extracts structured information from webpages