vore 0.5.0 (Rubygems)
Vore

Vore, by LewdBacon

Vore quickly crawls websites and spits out text sans tags. It's written in Ruby and powered by Rust.

Installation

Install the gem and add to the application's Gemfile by executing:

$ bundle add vore

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install vore

Usage

crawler = Vore::Crawler.new
crawler.scrape_each_page("https://choosealicense.com") do |page|
  puts page
end

Each page is a simple object with the following attributes:

  • content: the text of the HTML document, sans tags
  • title: the title of the HTML document (if any)
  • meta: the document's meta tags (if any)
  • path: the document's path
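
Because every page exposes `content`, `title`, `meta`, and `path`, a block can build up an index as pages arrive. The sketch below is illustrative only: it uses a hypothetical `Page` Struct and a hard-coded array in place of the crawler so it runs without network access, and it assumes `meta` is a Hash of meta-tag name to content, which may not match Vore's real page class.

```ruby
# Hypothetical stand-in for the page objects Vore yields; the real class may differ.
# `meta` is assumed here to be a Hash of meta-tag name => content.
Page = Struct.new(:content, :title, :meta, :path, keyword_init: true)

# Build a path => summary index from any enumerable of pages.
def index_pages(pages)
  pages.each_with_object({}) do |page, index|
    index[page.path] = {
      title: page.title || "(untitled)",            # title may be absent
      words: page.content.split.size,               # rough word count of the tag-free text
      description: (page.meta || {})["description"] # meta may be absent
    }
  end
end

pages = [
  Page.new(content: "Choose an open source license", title: "Home",
           meta: { "description" => "An example description" }, path: "/"),
  Page.new(content: "The MIT License text", title: nil, meta: nil, path: "/licenses/mit/")
]

index_pages(pages).each { |path, info| puts "#{path}: #{info}" }
```

With the real crawler, the body of `index_pages` would move inside the `scrape_each_page` block, one page at a time.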

The crawling itself is handled by spider-rs, a web crawler written in Rust, which is where Vore gets its speed.

Configuration

| Name | Description | Default |
| --- | --- | --- |
| delay | An artificial delay, in milliseconds, introduced when crawling. Useful when the site you're crawling is rate limited. | 0 |
| output_dir | Where the downloaded HTML files are stored. | "tmp/vore" |
| delete_after_yield | Whether the downloaded HTML files are deleted after the yield block finishes. | true |
| log_level | The logging level. | :warn |
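
To make the `delay` and `delete_after_yield` semantics concrete, here is a self-contained sketch of a loop that reads HTML files from an output directory, waits `delay` milliseconds between pages, and deletes each file once the block returns. The method name `each_downloaded_page` and the file layout (`*.html` directly under the directory) are hypothetical; this is not how Vore is implemented internally.

```ruby
require "tmpdir"

# Hypothetical sketch: yield each HTML file in output_dir, honoring the
# documented meaning of `delay` (milliseconds) and `delete_after_yield`.
def each_downloaded_page(output_dir, delay: 0, delete_after_yield: true)
  Dir.glob(File.join(output_dir, "*.html")).sort.each_with_index do |path, i|
    sleep(delay / 1000.0) if i.positive? && delay.positive? # ms -> seconds
    begin
      yield File.read(path)
    ensure
      # Runs even if the block raises, so files don't pile up.
      File.delete(path) if delete_after_yield && File.exist?(path)
    end
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.html"), "<p>first</p>")
  File.write(File.join(dir, "b.html"), "<p>second</p>")

  each_downloaded_page(dir, delay: 10) { |html| puts html }
  puts Dir.children(dir).empty? # files were removed after each yield
end
```

Putting the deletion in an `ensure` clause is a design choice for the sketch: cleanup happens whether or not the caller's block succeeds.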

Processing pages

Vore processes HTML using handlers. By default, there are two:

  • The MetaExtractor, which extracts information from your title and meta tags
  • The TagRemover, which removes unnecessary elements such as header, footer, and script

If you wish to process the HTML further, you can provide your own handler:

Vore::Crawler.new(handlers: [MySpecialHandler.new])

Handlers are defined using Selma. The MetaExtractor is always included and placed first, but passing anything in the handlers array replaces Vore's other default handlers. You can, of course, include them manually:

# keep Vore's default TagRemover while adding your own handler;
# MetaExtractor is still prepended to the front automatically
Vore::Crawler.new(handlers: [Vore::Handlers::TagRemover.new, MySpecialHandler.new])
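
The rule described above (MetaExtractor always first, and any `handlers:` value replacing the remaining defaults) can be modeled in a few lines. This is a hypothetical sketch with empty stand-in classes rather than Vore's real Selma handlers, and it assumes that omitting `handlers:` (i.e. `nil`) is what keeps the defaults.

```ruby
# Empty stand-ins for Vore's handlers; the real ones are Selma handlers.
class MetaExtractor; end
class TagRemover; end
class MySpecialHandler; end

# Hypothetical model of how the final handler list is assembled.
def effective_handlers(handlers: nil)
  chosen = handlers.nil? ? [TagRemover.new] : handlers # defaults apply only when nothing is passed
  [MetaExtractor.new, *chosen]                        # MetaExtractor is always prepended
end

p effective_handlers.map(&:class)
# prints [MetaExtractor, TagRemover]
p effective_handlers(handlers: [MySpecialHandler.new]).map(&:class)
# prints [MetaExtractor, MySpecialHandler]
p effective_handlers(handlers: [TagRemover.new, MySpecialHandler.new]).map(&:class)
# prints [MetaExtractor, TagRemover, MySpecialHandler]
```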

In tests

Because the actual HTTP calls happen in a separate process, Vore does not integrate with libraries like VCR or WebMock out of the box. You'll need to require "vore/minitest_helper" to get a helper that emulates the crawler's HTTP GET requests on the Ruby side.

You can override any of the helper's existing methods to suit your application. For example, if you prefer the HTML to be generated by Faker, create and require a file like the following:


require "faker"
require "vore/minitest_helper"

module Vore
  module TestHelperExtension
    DOCUMENT_TITLES = [
      "Hello, I need help",
      "I need to update my payment information",
    ]
    DOCUMENT_CONTENT = [
      "Hey, I'm having trouble with my computer. Can you help me?",
      # v--- always creates three page chunks
      "I need to update my payment information. Like, now. Right now. Now. Can you help me? Please? Now?" + "Can you help me? Please? Now?" * 100,
    ]

    def content
      @counter = -1 unless defined?(@counter)
      @counter += 1

      html = "<!DOCTYPE html><html><head><title>#{DOCUMENT_TITLES[@counter]}</title>"

      meta_tag_count.times do # meta_tag_count comes from Vore::TestHelper (arbitrarily set to 5)
        html += "<meta name=\"#{Faker::Lorem.word}\" content=\"#{Faker::Lorem.word}\" />"
      end

      html += "</head><body>"

      html += "<p>#{DOCUMENT_CONTENT[@counter]}</p>"

      html += "</body></html>"

      html
    end
  end

  Vore::TestHelper.prepend(Vore::TestHelperExtension)
end
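
The example relies on `Module#prepend`: because `TestHelperExtension` is inserted ahead of `TestHelper` in the ancestor chain, its `content` takes precedence, and it could still call `super` to reach the original. The sketch below shows that mechanism with a hypothetical `FakeHelper` standing in for `Vore::TestHelper`, so it runs without Vore or Faker.

```ruby
# Hypothetical stand-in for Vore::TestHelper.
class FakeHelper
  def content
    "<html><body><p>default</p></body></html>"
  end

  def meta_tag_count
    5
  end
end

module FakeHelperExtension
  # Overrides FakeHelper#content; `super` still reaches the original method.
  def content
    super.sub("default", "overridden")
  end
end

FakeHelper.prepend(FakeHelperExtension)

p FakeHelper.ancestors.take(2) # prints [FakeHelperExtension, FakeHelper]
puts FakeHelper.new.content
puts FakeHelper.new.meta_tag_count # not overridden, so the original applies
```

Methods the extension doesn't define, like `meta_tag_count` here, fall through to the original class unchanged.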

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/gjtorikian/vore.

License

The gem is available as open source under the terms of the MIT License.

Package last updated on 29 Jul 2024
