scoopi
CLI tool to scoop documentation websites and convert them to local Markdown files for LLM consumption.
Installation
npm install
npm run setup
npm install -g .
Requirements
- Node.js 18+
- Chrome browser (automatically installed via Puppeteer)
Usage
Basic scooping
scoopi https://docs.example.com
Advanced options
scoopi https://docs.example.com --depth 2 --output ./my-docs
scoopi https://docs.example.com --include "**/api/**,**/guide/**" --exclude "**/legacy/**"
scoopi https://docs.example.com --delay 2000
scoopi https://docs.example.com --verbose
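To make the `--depth` limit concrete, here is a minimal sketch of a depth-limited, breadth-first crawl plan. It is not scoopi's actual code: `planCrawl`, `linksOf`, and the toy link graph are hypothetical, and it assumes the start URL counts as depth 0. A real crawler would also wait `--delay` milliseconds between requests, which is left out here.

```javascript
// Hypothetical helper, not scoopi's implementation.
// Breadth-first traversal that stops following links at maxDepth.
// Assumption: the start URL counts as depth 0.
function planCrawl(startUrl, linksOf, maxDepth) {
  const seen = new Set([startUrl]);
  const queue = [{ url: startUrl, depth: 0 }];
  const order = [];
  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    order.push({ url, depth });
    if (depth >= maxDepth) continue; // do not follow links from the deepest level
    for (const next of linksOf(url)) {
      if (!seen.has(next)) {
        seen.add(next); // each page is queued at most once
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return order;
}

// Toy link graph standing in for real pages.
const graph = {
  "/": ["/guide", "/api"],
  "/guide": ["/guide/install", "/"],
  "/guide/install": ["/guide/install/linux"],
  "/api": [],
};

const plan = planCrawl("/", (u) => graph[u] || [], 2);
console.log(plan.map((p) => `${p.depth} ${p.url}`).join("\n"));
```

With a depth of 2, `/guide/install/linux` is never visited because it sits three links away from the start page.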
Configuration
scoopi config --show
scoopi config --reset
Options
--depth, -d <number>: Maximum scooping depth (default: 3)
--output, -o <path>: Output directory (default: ./docs)
--include <patterns>: URL patterns to include (comma-separated)
--exclude <patterns>: URL patterns to exclude (comma-separated)
--delay <ms>: Delay between requests in milliseconds (default: 1000)
--verbose: Enable verbose logging
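The sketch below shows one way comma-separated `--include` and `--exclude` patterns could be applied to a URL. It is an illustration under assumptions, not scoopi's real matching logic: `globToRegExp`, `parsePatterns`, and `shouldVisit` are hypothetical names, patterns are assumed to match against the full URL string, an empty include list is assumed to allow everything, and an exclude match is assumed to win over an include match.

```javascript
// Hypothetical helpers; scoopi's real matching rules may differ.

// Convert a simple glob to an anchored regular expression.
function globToRegExp(glob) {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        source += ".*"; // "**" may span "/" separators
        i++;
      } else {
        source += "[^/]*"; // "*" stays within one path segment
      }
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Split a comma-separated option value into compiled patterns.
function parsePatterns(value) {
  return (value || "")
    .split(",")
    .map((p) => p.trim())
    .filter(Boolean)
    .map(globToRegExp);
}

// Assumption: no include patterns means "include everything";
// an exclude match always rejects the URL.
function shouldVisit(url, include, exclude) {
  const included = include.length === 0 || include.some((re) => re.test(url));
  const excluded = exclude.some((re) => re.test(url));
  return included && !excluded;
}

const include = parsePatterns("**/api/**,**/guide/**");
const exclude = parsePatterns("**/legacy/**");

console.log(shouldVisit("https://docs.example.com/api/users", include, exclude)); // true
console.log(shouldVisit("https://docs.example.com/legacy/api/v1", include, exclude)); // false
console.log(shouldVisit("https://docs.example.com/blog/post", include, exclude)); // false
```

Under these assumptions, a pattern such as `**/api/**` requires something after `/api/`, so `https://docs.example.com/api` on its own would not match.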
Features
- 🥄 Smart scooping: Automatically detects and follows documentation links
- 📝 Clean conversion: Converts HTML to clean, readable Markdown
- 🗂️ Organized output: Creates hierarchical directory structure based on URLs
- 🎯 Pattern matching: Include/exclude URLs using glob patterns
- ⚡ Performance: Configurable delays and depth limits
- 🔍 Content filtering: Removes navigation, ads, and other non-content elements
- 📊 Progress tracking: Real-time progress indicators and detailed logging
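As an illustration of the URL-based directory structure mentioned above, the following sketch maps a page URL to a Markdown file path under the output directory. The exact naming scheme scoopi uses is not documented here, so `urlToOutputPath` and its rules (directory-style URLs become `index.md`, a trailing `.html` is dropped) are assumptions.

```javascript
// Hypothetical mapping; the real naming scheme may differ.
function urlToOutputPath(pageUrl, outputDir) {
  const { pathname } = new URL(pageUrl);
  const segments = pathname.split("/").filter(Boolean);
  if (segments.length === 0 || pathname.endsWith("/")) {
    segments.push("index"); // directory-style URLs become index.md
  }
  const last = segments.length - 1;
  segments[last] = segments[last].replace(/\.html?$/i, ""); // drop .htm/.html
  const base = outputDir.replace(/\/+$/, ""); // avoid a doubled slash
  return [base, ...segments].join("/") + ".md";
}

console.log(urlToOutputPath("https://docs.example.com/guide/install.html", "./docs"));
// ./docs/guide/install.md
console.log(urlToOutputPath("https://docs.example.com/api/", "./docs"));
// ./docs/api/index.md
console.log(urlToOutputPath("https://docs.example.com", "./docs"));
// ./docs/index.md
```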
Development
npm test
npm run dev
npm run lint
License
MIT