probot

  • 0.4.0
  • Rubygems

Probot

OMG, another Ruby robots.txt parser? It was an accident. I didn't mean to make it, and I shouldn't have, but here we are. It started out tiny and grew. Yes, I should have used one of the other gems.

Does this even deserve a gem? Feel free to just copy and paste the single file that implements it: one less dependency, eh?

On the plus side of this yak shaving, there are some nice features I don't think the others have.

  1. Support for consecutive user agents making up a single record:
User-agent: first-agent
User-agent: second-agent
Disallow: /

This record blocks both first-agent and second-agent from the site.
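To make the grouping concrete, here is a minimal standalone sketch of how consecutive User-agent lines can be folded into one record. It is not the gem's code: `parse_records` and the record layout are made up for illustration, and it only handles Allow and Disallow.

```ruby
# Illustrative sketch only (not Probot's implementation): group consecutive
# User-agent lines into a single record that shares the rules that follow.
def parse_records(text)
  records = []   # each record: { agents: [...], rules: [[directive, value], ...] }
  current = nil
  text.each_line do |line|
    line = line.sub(/#.*/, "").strip            # drop comments and whitespace
    next if line.empty?
    key, value = line.split(":", 2).map(&:strip)
    next if value.nil?                          # skip lines without a colon
    case key.downcase
    when "user-agent"
      # Start a new record only if the previous one already has rules;
      # otherwise this agent joins the record being built.
      if current.nil? || !current[:rules].empty?
        current = { agents: [], rules: [] }
        records << current
      end
      current[:agents] << value
    when "allow", "disallow"
      current[:rules] << [key.downcase, value] if current
    end
  end
  records
end

txt = <<~ROBOTS
  User-agent: first-agent
  User-agent: second-agent
  Disallow: /

  User-agent: third-agent
  Allow: /
ROBOTS

records = parse_records(txt)
puts records.first[:agents].inspect
```

The first record ends up holding both first-agent and second-agent with the single Disallow rule, while third-agent gets a record of its own.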

  2. It selects the most specific allow / disallow rule, using rule length as a proxy for specificity. You can also ask it to show you the matching rules and their scores.
txt = %Q{
User-agent: *
Disallow: /dir1
Allow: /dir1/dir2
Disallow: /dir1/dir2/dir3
}
Probot.new(txt).matches("/dir1/dir2/dir3")
=> {:disallowed=>{/\/dir1/=>5, /\/dir1\/dir2\/dir3/=>15}, :allowed=>{/\/dir1\/dir2/=>10}}

In this case, two Disallow rules and one Allow rule match. The Disallow rule with length 15 is the longest, so it is the one that would be followed and the path is disallowed.
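The length-as-specificity idea can be sketched in a few lines of plain Ruby. This is an assumption-laden illustration, not the gem's code: it uses simple string prefixes rather than the regexes Probot builds, the helper names are hypothetical, and breaking ties in favour of allow is a choice made for this sketch.

```ruby
# Illustrative sketch only: pick allow vs. disallow by the longest matching rule.
# Rules are plain path prefixes here; Probot itself works with regexes.
def longest_match(patterns, path)
  patterns.select { |pattern| path.start_with?(pattern) }.map(&:length).max || 0
end

def allowed_by_length?(rules, path)
  allow_score    = longest_match(rules[:allow], path)
  disallow_score = longest_match(rules[:disallow], path)
  allow_score >= disallow_score   # assumption for this sketch: ties go to allow
end

rules = {
  disallow: ["/dir1", "/dir1/dir2/dir3"],
  allow:    ["/dir1/dir2"]
}

puts allowed_by_length?(rules, "/dir1/dir2/dir3")  # disallow (15) beats allow (10)
puts allowed_by_length?(rules, "/dir1/dir2/page")  # allow (10) beats disallow (5)
```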

  3. It sets the User-Agent string when fetching robots.txt.
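For readers who copy the single file instead of installing the gem, this is roughly what setting the User-Agent on a robots.txt request looks like with Ruby's standard Net::HTTP. It is a sketch of the general technique, not how Probot does it internally; `robots_request` is a hypothetical helper, and the request is built but not sent so the example runs offline.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) a GET request for a site's
# robots.txt with a custom User-Agent header.
def robots_request(site, agent)
  uri = URI.join(site, "/robots.txt")
  request = Net::HTTP::Get.new(uri)
  request["User-Agent"] = agent
  [uri, request]
end

uri, request = robots_request("https://booko.info", "BookScraper")
puts "#{uri} as #{request['User-Agent']}"

# Sending it would look like this (commented out to avoid network access):
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
#   http.request(request)
# end
# puts response.body
```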

Installation

Install the gem and add it to the application's Gemfile by executing:

$ bundle add probot

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install probot

Usage

It's straightforward to use. Instantiate it if you'll be making several requests:

> r = Probot.new('https://booko.info', agent: 'BookScraper')
> r.rules
=>  {"*"=>{"disallow"=>[/\/search/, /\/products\/search/, /\/.*\/refresh_prices/, /\/.*\/add_to_cart/, /\/.*\/get_prices/, /\/lists\/add/, /\/.*\/add$/, /\/api\//, /\/users\/bits/, /\/users\/create/, /\/prices\//, /\/widgets\/issue/], "allow"=>[], "crawl_delay"=>0, "crawl-delay"=>0.1},
 "YandexBot"=>{"disallow"=>[], "allow"=>[], "crawl_delay"=>0, "crawl-delay"=>300.0}}

> r.allowed?("/abc/refresh_prices")
=> false
> r.allowed?("https://booko.info/9780765397522/All-Systems-Red")
=> true
> r.allowed?("https://booko.info/9780765397522/refresh_prices")
=> false

Or just one-shot it for one-offs:

Probot.allowed?("https://booko.info/9780765397522/All-Systems-Red", agent: "BookScraper")
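The r.rules output shown earlier also carries a crawl delay per agent. As a rough sketch of how a scraper might use it, the hypothetical helper below reads the delay from a hash shaped like that output, falling back to the "*" record. It is not part of the gem, and reading the "crawl-delay" key is an assumption based only on the sample output above.

```ruby
# Hypothetical helper, not part of Probot: look up the crawl delay for an agent
# in a hash shaped like the r.rules output, falling back to the "*" record.
def crawl_delay_for(rules, agent)
  record = rules[agent] || rules["*"] || {}
  record.fetch("crawl-delay", 0)
end

# Trimmed-down copy of the rules structure shown in the usage example.
rules = {
  "*"         => { "disallow" => [/\/search/], "allow" => [], "crawl-delay" => 0.1 },
  "YandexBot" => { "disallow" => [],           "allow" => [], "crawl-delay" => 300.0 }
}

puts crawl_delay_for(rules, "YandexBot")    # uses the YandexBot record
puts crawl_delay_for(rules, "BookScraper")  # falls back to "*"
```

A scraper could then sleep for that many seconds between requests.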

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/Probot.

License

The gem is available as open source under the terms of the MIT License.

Package last updated on 30 Oct 2024
