
# crawl-cli-tool

A CLI tool for web crawling with auto-discovery, recursive crawling, and markdown output. Optimized for use as a Claude Skill.
Supports auto-discovery of `llms.txt`, `sitemap.xml`, and `robots.txt`.

## Installation

```bash
npm install -g crawl-cli-tool
```

Playwright's Chromium browser is installed automatically after installation.
## Quick Reference

| Task | Command |
|---|---|
| Crawl single page | `crawl-cli <url>` |
| Recursive crawl (depth 2) | `crawl-cli <url> -d 2` |
| Save to single file | `crawl-cli <url> -o output.md` |
| Each URL as separate file | `crawl-cli <url> -d 2 -O ./docs/` |
| JSON format | `crawl-cli <url> -f json` |
| Discover files | `crawl-cli discover <url>` |
## Usage

### Basic crawling

```bash
# Crawl a single page
crawl-cli https://example.com

# Crawl with auto-discovery (default for single page)
crawl-cli https://docs.anthropic.com

# Crawl with depth 3
crawl-cli https://example.com -d 3

# Limit to 50 pages
crawl-cli https://example.com -d 3 -m 50
```
### Output formats

```bash
# Markdown (default)
crawl-cli https://example.com -f md

# JSON output
crawl-cli https://example.com -f json

# HTML output (styled, ready for browser)
crawl-cli https://example.com -f html -o page.html

# Plain text (stripped formatting)
crawl-cli https://example.com -f txt
```
### Single file output (`-o`)

All crawled pages are combined into one file:

```bash
# Single page to file
crawl-cli https://example.com/page -o ./articles/page.md

# Multiple pages (depth 2) to one file
crawl-cli https://docs.example.com -d 2 -o ./all-docs.md

# Append to existing file
crawl-cli https://another.com -o ./all-docs.md --append
```
### Directory output (`-O`)

Each crawled URL becomes its own file:

```bash
# Crawl docs site, each page as a separate file
crawl-cli https://docs.anthropic.com -d 2 -O ./anthropic-docs/
# Result:
# anthropic-docs/
# ├── _index.md              # Auto-generated index with links
# ├── docs_anthropic_com.md
# ├── getting-started.md
# ├── api_reference.md
# └── guides_setup.md

# Same in JSON format
crawl-cli https://docs.example.com -d 2 -O ./docs/ -f json

# Skip index file generation
crawl-cli https://docs.example.com -d 2 -O ./docs/ --no-index

# See all files created
crawl-cli https://docs.example.com -d 2 -O ./docs/ -v
```
### Piping and stdout

```bash
# Print to console
crawl-cli https://example.com

# Quiet mode for piping (no spinner/progress)
crawl-cli https://example.com -q -f json | jq '.[] | .title'
```
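Building on the pipe above, a sketch that collects page titles into a file — assuming, as the example shows, the JSON output is an array of objects with a `title` field (the URL and filename are illustrative):

```shell
# Save the titles of crawled pages to a file; quiet mode keeps the pipe
# clean, and jq's -r flag emits raw strings instead of quoted JSON
crawl-cli https://docs.example.com -d 2 -q -f json | jq -r '.[] | .title' > titles.txt
```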
### Discovery

```bash
# Just discover available files (llms.txt, sitemap, etc.)
crawl-cli discover https://example.com

# Force auto-discovery
crawl-cli https://example.com --discover

# Disable auto-discovery
crawl-cli https://example.com --no-discover
```
### Performance and logging

```bash
# Increase concurrent requests
crawl-cli https://example.com -c 10

# Set page timeout (ms)
crawl-cli https://example.com -t 60000

# Verbose output
crawl-cli https://example.com -v
```
## Options

| Option | Description | Default |
|---|---|---|
| `-d, --depth <n>` | Maximum crawl depth | 1 |
| `-m, --max-pages <n>` | Maximum pages to crawl | 100 |
| `-c, --concurrent <n>` | Concurrent requests | 5 |
| `-o, --output <file>` | Output to single file (combined) | stdout |
| `-O, --output-dir <folder>` | Output to directory (each URL separate) | - |
| `-f, --format <type>` | Output format: `md`, `json`, `html`, `txt` | `md` |
| `--json` | Shorthand for `-f json` | - |
| `--append` | Append to file instead of overwriting | false |
| `--no-index` | Skip index file for directory output | false |
| `--discover` | Force auto-discovery | auto |
| `--no-discover` | Disable auto-discovery | - |
| `-t, --timeout <ms>` | Page timeout | 30000 |
| `-v, --verbose` | Verbose output | false |
| `-q, --quiet` | Minimal output (good for piping) | false |
## Auto-Discovery

When crawling, the tool checks for these files in order:

1. `llms.txt` - AI assistant instructions file
2. `llms-full.txt` - Full llms.txt variant
3. `sitemap.xml` - Site structure map
4. `robots.txt` - Crawling rules
5. `.well-known/ai.txt` - Well-known AI file
6. `.well-known/llms.txt` - Well-known llms variant
7. `.well-known/sitemap.xml` - Well-known sitemap

## Programmatic Usage

```javascript
import { crawl, discover, crawlSingle } from 'crawl-cli-tool';

// Full crawl with options
const results = await crawl('https://example.com', {
  maxDepth: 2,
  maxPages: 50,
  autoDiscover: true,
});

// Single page
const result = await crawlSingle('https://example.com/page');

// Just discovery
const discovered = await discover('https://example.com');
```
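As a sketch of consuming crawl results, here is a small helper that builds a markdown link index from an array of result objects. The `title` field matches the one used in the jq example above; the `url` field name is an assumption, not documented API, and the sample data stands in for real output of `crawl()`:

```javascript
// Sketch: turn an array of crawl results into a markdown link index.
// The `title` and `url` field names are assumptions, not documented API.
function toLinkIndex(results) {
  return results
    .map((r) => `- [${r.title}](${r.url})`)
    .join('\n');
}

// Placeholder data standing in for real results from crawl():
const sample = [
  { title: 'Home', url: 'https://example.com' },
  { title: 'Docs', url: 'https://example.com/docs' },
];
console.log(toLinkIndex(sample));
```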
## Troubleshooting

**"Chromium not found"**

```bash
npx playwright install chromium
```

**"Timeout waiting for page"**

```bash
crawl-cli <url> -t 60000   # 60 second timeout
```

**"Too many pages"**

```bash
crawl-cli <url> -m 20 -d 1   # Limit to 20 pages, depth 1
```
## License

MIT