
@pdftron/web-crawler
A simple NodeJS web crawler that actually executes JS!
BETA - Not for production use
const c = new Crawler(options);
options <[Object]>
  debug <[boolean]> Whether to display logs during execution. Defaults to false.
  maxConnections <[number]> Number of simultaneous connections that can be open. Defaults to 10.

c.queue(url);
Adds a URL to the fetch queue.
url <[string]> URL to start crawling at.

c.start();
Starts processing the queue.

c.shouldFetch(callback);
Sets a function that determines whether a URL should be fetched.
callback <[Function(string)]> Function that decides whether a URL is fetched. It is passed the URL to be fetched and must return true or false. If it returns true, the URL is fetched.

c.on(key, callback);
Attaches an event listener to the instance.
key <[string]> Type of event listener to attach. Can be one of:
  done Called when the process is done. callback is passed an array of the URLs found.
  fetched Called when a page is fully fetched. callback is passed an object with html and url. This is the only way to get the HTML from a page using the crawler.
  foundURL Called when a new URL is found and added to the queue. callback is passed the URL and the page the URL was found on.
  loadError Called when a page cannot be fetched. callback is passed the URL that could not be fetched, the page the URL was found on, and the status code.

const Crawler = require('@pdftron/web-crawler');
const c = new Crawler({ debug: false });
c.queue('https://www.pdftron.com/documentation');
c.shouldFetch((url) => {
  return url.indexOf('/documentation') > -1 && url.indexOf('web/guides') > -1;
});
c.on('foundURL', (url, foundOn) => {
  console.log(`${url} was found on ${foundOn}`);
});
c.on('done', (data) => {
  console.log(data);
});
c.on('fetched', ({ url, html }) => {
  console.log(url, html);
});
c.start();
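The example above filters URLs with substring checks and does not listen for loadError. As a minimal sketch, two small helpers could tighten this. Both are hypothetical and not part of @pdftron/web-crawler: makeShouldFetch builds a predicate that parses the URL and compares origin and path, and collectLoadErrors records the three loadError parameters described above. The only assumption about the crawler is that it exposes on(key, callback) as documented.

```javascript
// Hypothetical helper (not part of @pdftron/web-crawler): builds a predicate
// that can be passed to c.shouldFetch(). It accepts only absolute URLs on the
// given origin whose path is pathPrefix or lies below it.
// Assumes pathPrefix has no trailing slash, e.g. '/documentation'.
function makeShouldFetch({ origin, pathPrefix }) {
  return (url) => {
    let parsed;
    try {
      parsed = new URL(url); // throws on strings that are not absolute URLs
    } catch (err) {
      return false;
    }
    const path = parsed.pathname;
    return (
      parsed.origin === origin &&
      (path === pathPrefix || path.startsWith(pathPrefix + '/'))
    );
  };
}

// Hypothetical helper: stores every loadError so failures can be reviewed
// after the crawl. `crawler` is any object exposing on(key, callback).
function collectLoadErrors(crawler) {
  const failures = [];
  crawler.on('loadError', (url, foundOn, statusCode) => {
    failures.push({ url, foundOn, statusCode });
  });
  return failures;
}

// Possible wiring, assuming `c` is the Crawler instance from the example:
// c.shouldFetch(makeShouldFetch({
//   origin: 'https://www.pdftron.com',
//   pathPrefix: '/documentation',
// }));
// const failures = collectLoadErrors(c);
// c.on('done', () => console.log(failures));
```

Parsing the URL instead of searching the raw string means a link such as https://example.com/?next=/documentation is rejected, whereas a plain indexOf check would accept it.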
git clone https://github.com/XodoDocs/web-crawler.git
cd web-crawler
npm i
npm run test
FAQs
Web crawling using Puppeteer
The npm package @pdftron/web-crawler receives a total of 3 weekly downloads, so its popularity is classified as not popular.
@pdftron/web-crawler does not show a healthy release cadence or project activity: the last version was released a year ago. It has 1 open source maintainer collaborating on the project.