kudzu

  • 1.3.3
  • Rubygems

Kudzu

A simple web crawler for Ruby.

Features

  • Run single-threaded or multi-threaded.
  • Pool HTTP connections.
  • Restrict links by URL-based patterns.
  • Respect robots.txt.
  • Store page contents via adapter.
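
To make "restrict links by URL-based patterns" concrete, here is a minimal sketch of the idea using only Ruby's standard library. It is not Kudzu's implementation or API: the `LinkFilter` class, its `follow?` method, and the `allow:`/`deny:` options are hypothetical names chosen for illustration.

```ruby
require 'uri'

# Hypothetical helper, NOT part of Kudzu: decides whether a discovered link
# should be followed, based on its host and on allow/deny path patterns.
class LinkFilter
  def initialize(base_url, allow: [], deny: [])
    @host  = URI.parse(base_url).host
    @allow = allow
    @deny  = deny
  end

  def follow?(url)
    uri = URI.parse(url)
    return false unless uri.is_a?(URI::HTTP)   # http and https only
    return false unless uri.host == @host      # stay on the starting host
    return false if @deny.any? { |re| re.match?(uri.path) }

    # An empty allow list means "allow everything not denied".
    @allow.empty? || @allow.any? { |re| re.match?(uri.path) }
  rescue URI::InvalidURIError
    false
  end
end

filter = LinkFilter.new('http://example.com/',
                        allow: [%r{\A/docs/}],
                        deny:  [/\.pdf\z/])
filter.follow?('http://example.com/docs/intro.html')
```

In this sketch deny patterns are checked before allow patterns, so a link matching both is skipped. Whether Kudzu resolves such conflicts the same way is not stated in this README.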

Dependencies

  • Ruby 2.5+
  • libicu

Installation

Add to your application's Gemfile:

gem 'kudzu'

Then run:

$ bundle install

Usage

Crawl HTML files on example.com:

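require 'kudzu'

crawler = Kudzu::Crawler.new do
  user_agent 'YOUR_AWESOME_APP'
  add_filter do
    focus_host true                # keep the crawl on example.com, as described above
    allow_mime_type %w(text/html)  # HTML pages only
  end
end

# Start from the given URL and print the URL of each page passed to on_success.
crawler.run('http://example.com/') do
  on_success do |page, link|
    puts page.url
  end
end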

Adapters

By default, this gem supports only in-memory crawling. Use the following adapter to store page contents persistently:

  • kudzu-adapter-active_record
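
The difference between in-memory and persistent storage can be pictured with a generic adapter-style sketch. This is only an illustration of the pattern, written with the standard library. `MemoryStore`, `FileStore`, `save`, and `load` are hypothetical names, and the real interface of Kudzu and kudzu-adapter-active_record may look quite different.

```ruby
require 'digest/sha2'
require 'fileutils'
require 'tmpdir'

# Hypothetical stores for illustration only; not Kudzu's adapter interface.

# Keeps pages in a Hash, so everything is lost when the process exits.
class MemoryStore
  def initialize
    @pages = {}
  end

  def save(url, body)
    @pages[url] = body
  end

  def load(url)
    @pages[url]
  end
end

# Same interface, but pages are written to disk and survive a restart.
class FileStore
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  def save(url, body)
    File.write(path_for(url), body)
  end

  def load(url)
    path = path_for(url)
    File.exist?(path) ? File.read(path) : nil
  end

  private

  # Hash the URL so that any URL maps to a safe file name.
  def path_for(url)
    File.join(@dir, Digest::SHA256.hexdigest(url))
  end
end

store = FileStore.new(File.join(Dir.tmpdir, 'crawl_pages_sketch'))
store.save('http://example.com/', '<html>hello</html>')
```

Because both classes expose the same two methods, the code that crawls does not need to know which one it is writing to. That is the general benefit of storing page contents through an adapter.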

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/kanety/kudzu. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Package last updated on 17 Oct 2024
