@algolia/404-crawler
A command line interface to crawl and detect 404 pages from a sitemap.
Make sure npm is installed on your computer. To learn more, visit https://docs.npmjs.com/downloading-and-installing-node-js-and-npm
In a terminal, run:
npm install -g @algolia/404-crawler
After that, you'll be able to use the 404crawler command in your terminal.
Crawl and detect every 404 page from the Algolia website's sitemap:
404crawler crawl -u https://algolia.com/sitemap.xml
Use JavaScript rendering to crawl and identify all 404 or 'Not Found' pages on the Algolia website.
404crawler crawl -u https://algolia.com/sitemap.xml --render-js
Crawl and identify all 404 pages on the Algolia website by analyzing its sitemap, including all potential sub-path variations:
404crawler crawl -u https://algolia.com/sitemap.xml --include-variations
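The flags can also be combined. As an illustration, and assuming the same space-separated value syntax as the -u examples above, the following run writes the results to a JSON file and crawls pages in parallel with Chromium (every flag used here is documented in the option list below):
404crawler crawl -u https://algolia.com/sitemap.xml --output crawler/results.json --run-in-parallel --batch-size 20 --browser-type chromium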
--sitemap-url or -u: Required URL of the sitemap.xml file.
--render-js or -r: Use JavaScript rendering to crawl and identify a 'Not Found' page even when the status code isn't a 404. This option is useful for websites that return a 200 status code even when a page is not found (for example, Next.js with a custom not-found page).
--output or -o: Output path for the JSON file of results. Example: crawler/results.json. If not set, no file is written after the crawl.
--include-variations or -v: Include all sub-path variations of the URLs found in the sitemap.xml. For example, if https://algolia.com/foo/bar/baz is found in the sitemap, the crawler will test https://algolia.com/foo/bar/baz, https://algolia.com/foo/bar, https://algolia.com/foo and https://algolia.com (the sketch after this option list shows one way such variations can be generated).
--exit-on-detection or -e: Exit when a 404 or a 'Not Found' page is detected.
--run-in-parallel or -p: Run the crawler with multiple pages in parallel. By default, the number of parallel instances is set to 10. See the --batch-size option to configure this number.
--batch-size or -s: Number of parallel crawler instances to run: the higher this number, the more resources are consumed. Only available when the --run-in-parallel option is set. If not set, defaults to 10.
--browser-type or -b: Type of browser to use to crawl pages. Can be 'firefox', 'chromium' or 'webkit'. If not set, defaults to 'firefox'.
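To make the --include-variations and --render-js behaviours more concrete, here is a minimal TypeScript sketch. It assumes Playwright (suggested by the firefox/chromium/webkit browser types) and a simple text heuristic for the JavaScript-rendered 'Not Found' check; the function names and detection logic are illustrative only, not the CLI's actual implementation.

// Illustrative sketch only; the real CLI's internals are not shown in this README.
import { firefox } from "playwright";

// Expand a sitemap URL into its parent paths, mirroring the --include-variations
// example: /foo/bar/baz also yields /foo/bar, /foo and the site root.
function subPathVariations(url: string): string[] {
  const { origin, pathname } = new URL(url);
  const segments = pathname.split("/").filter(Boolean);
  const variations: string[] = [];
  for (let i = segments.length; i >= 0; i--) {
    variations.push(origin + (i ? "/" + segments.slice(0, i).join("/") : ""));
  }
  return variations;
}

// Visit a page with JavaScript rendering and flag it as "not found" if the HTTP
// status is 404 or the rendered content looks like a custom not-found page
// (heuristic placeholder; the real --render-js detection may differ).
async function isNotFound(url: string): Promise<boolean> {
  const browser = await firefox.launch();
  try {
    const page = await browser.newPage();
    const response = await page.goto(url);
    if (response && response.status() === 404) return true;
    const body = (await page.textContent("body")) ?? "";
    return /page not found|404/i.test(body);
  } finally {
    await browser.close();
  }
}

// Example: check every variation of one sitemap entry.
async function main() {
  for (const url of subPathVariations("https://algolia.com/foo/bar/baz")) {
    console.log(url, (await isNotFound(url)) ? "NOT FOUND" : "ok");
  }
}

main().catch(console.error);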
This CLI is built with TypeScript and uses ts-node to run the code locally.
Install all dependencies:
pnpm i
Then run the CLI locally:
pnpm 404crawler crawl <options>
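For example, assuming the flags documented above, a local run against the Algolia sitemap might look like:
pnpm 404crawler crawl -u https://algolia.com/sitemap.xml --render-js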
Update the package.json version
Commit and push changes
Build JS files in dist/ with pnpm build
Initialize npm with Algolia org as scope
npm init --scope=algolia
Follow instructions
Publish the package with npm publish
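Taken together, and assuming the version bump has already been committed and pushed, the publish sequence boils down to:
pnpm build
npm init --scope=algolia
npm publish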
This package uses: