Krawler
Simple little command-line web crawler. Use it to find 404s or 500s on your site.
I use it to warm caches. Multi-threaded for faster crawling.
Installation
Install:
gem install krawler
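Alternatively, if you manage dependencies with Bundler, you can add the gem to your Gemfile and run the executable through Bundler (a sketch assuming standard Bundler usage; nothing here is specific to krawler beyond the gem name shown above):

gem 'krawler'

$ bundle install
$ bundle exec krawl http://localhost:3000/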
Usage
From the command line:
$ krawl http://localhost:3000/
Options:
-e, --exclude regex      Exclude paths matching regex
-s, --sub-restrict       Restrict crawling to sub-paths of the base URL
-c, --concurrent count   Crawl with count concurrent connections
-r, --randomize          Randomize the crawl path
Examples:
Restrict crawling to sub-paths of /public
$ krawl http://localhost:3000/public -s
Restrict crawling to paths that do not match /^\/api\//
$ krawl http://localhost:3000/ -e "^\/api\/"
Crawl with 4 concurrent crawlers. Make sure your server can handle concurrent requests.
$ krawl http://production.server -c 4
Randomize the crawl path. Helpful when you have a lot of links and get bored watching
the same crawl path over and over.
$ krawl http://localhost:3000/ -r
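The options above can be combined in a single invocation. For example (a hedged sketch assuming the flags compose; the paths and values are illustrative):

$ krawl http://localhost:3000/public -s -r -c 2 -e "^\/api\/"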
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request