@xapp/arachne-cli
Example Usage
To crawl a site and save the pages to a local ./temp directory
node lib crawl http://www.thecoffeefaq.com/ -d ./temp
To also save Markdown and schema.org FAQs
node lib crawl http://www.thecoffeefaq.com/ -a -t -d ./temp
With a file of whitelisted patterns
node lib crawl http://www.thecoffeefaq.com/ -a -t -d ./temp -w ./temp/whitelist.md
Windows & WSL2 Notes
Follow the setup instructions here: https://github.com/puppeteer/puppeteer/issues/1837#issuecomment-689006806
Start XLaunch before running the CLI. In its setup wizard, select "Multiple windows" and "Start no client", and disable access control.
If the commands above don't work, you may need to pass the browser's executablePath (-e) and run headless (-h).
node lib crawl http://www.thecoffeefaq.com/ -e /usr/bin/google-chrome -h
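The -e and -h flags appear to correspond to Puppeteer's executablePath and headless launch options. The sketch below shows one way such flags could be turned into a launch options object. The helper name buildLaunchOptions and the shape of the flags object are hypothetical, and the CLI's actual internals may differ. It also assumes that the browser runs with a visible window unless -h is given.

```javascript
// Hypothetical helper: maps CLI-style flags onto Puppeteer launch options.
// Neither the function name nor the flags object shape comes from the CLI itself.
function buildLaunchOptions(flags = {}) {
  // Assumption: headless is off unless the flag is explicitly set.
  const options = { headless: Boolean(flags.headless) };

  // Only set executablePath when one was supplied, so Puppeteer can
  // otherwise fall back to its default browser.
  if (flags.executablePath) {
    options.executablePath = flags.executablePath;
  }
  return options;
}

// Mirrors: node lib crawl <url> -e /usr/bin/google-chrome -h
const options = buildLaunchOptions({
  executablePath: '/usr/bin/google-chrome',
  headless: true,
});
console.log(options);
```

With Puppeteer installed, an object like this could be passed to puppeteer.launch(options). Running headless avoids the need for an X server, which is why it can help under WSL2.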