scraper_rb

  • 0.1.2
  • Rubygems

Prompt API - Scraper API - Ruby Package

scraper_rb is a simple Ruby wrapper for scraper-api.

Requirements

  1. You need to sign up for Prompt API.
  2. You need to subscribe to scraper-api; the test drive is free.
  3. You need to set the PROMPTAPI_TOKEN environment variable after subscribing.

Then install the gem:

$ gem install scraper_rb

Or install from GitHub:

$ gem install scraper_rb --version "0.1.2" --source "https://rubygems.pkg.github.com/promptapi"
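Because requirement 3 asks for the PROMPTAPI_TOKEN environment variable, a quick check before making requests can catch a missing token early. The sketch below is not part of scraper_rb; `promptapi_token_set?` is a hypothetical helper name chosen for illustration.

```ruby
# Hypothetical helper, not part of scraper_rb: reports whether the
# PROMPTAPI_TOKEN variable named in the requirements is present and non-blank.
def promptapi_token_set?(env = ENV)
  token = env['PROMPTAPI_TOKEN']
  !token.nil? && !token.strip.empty?
end

# Print a warning to stderr instead of failing later during a request.
warn 'PROMPTAPI_TOKEN is not set' unless promptapi_token_set?
```

Passing a plain Hash instead of `ENV` makes the check easy to exercise in tests.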

Example Usage

Basic scraper:

require "scraper_rb"

s = ScraperRb.new('https://pypi.org/classifiers/') # no params
s.get
s.response
# {
#     :headers=>{:"Content-Length"=>...}, 
#     :url=>"https://pypi.org/classifiers/",
#     :data=>"<!DOCTYPE html>\n<html> ...",
# }

s.response[:headers]     # => return response headers
s.response[:data]        # => return scraped html
s.save('/tmp/data.html') # => {:file=>"/tmp/data.html", :size=>321322}

# or

save_result = s.save('/tmp/data.html')
puts save_result[:error] if save_result.key?(:error) # we have a file error
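The comments above show two Hash shapes coming back from `save`: one with `:file` and `:size`, and one with an `:error` key. A minimal sketch of turning either shape into a message follows; `describe_save_result` is a hypothetical helper, and the error text is made up for the example.

```ruby
# Hypothetical helper: formats the two Hash shapes shown in the README.
def describe_save_result(result)
  if result.key?(:error)
    "save failed: #{result[:error]}"
  else
    "saved #{result[:file]} (size: #{result[:size]})"
  end
end

# Stand-in Hashes; in real use these would come from s.save(...).
puts describe_save_result({ file: '/tmp/data.html', size: 321322 })
puts describe_save_result({ error: 'example error text' })
```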

You can add URL parameters for extra operations. Valid parameters are:

  • auth_password: password for HTTP Realm auth
  • auth_username: username for HTTP Realm auth
  • cookie: URL-encoded cookie header
  • country: 2-character country code, if you wish to scrape from an IP address in a specific country
  • referer: HTTP referer header
  • selector: CSS-style selector path such as a.btn div li. If a selector is given, the returned result is a collection of data and the saved file is in .json format.
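A misspelled parameter name is easy to miss, so it can help to compare a params Hash against the list above before building a request. This is only a sketch: `VALID_PARAMS` and `unknown_params` are hypothetical names, and nothing here says how scraper_rb itself treats unrecognized keys.

```ruby
# Parameter names copied from the list above.
VALID_PARAMS = %i[auth_password auth_username cookie country referer selector].freeze

# Hypothetical helper: returns the keys that are not in the documented list.
def unknown_params(params)
  params.keys.map(&:to_sym) - VALID_PARAMS
end

p unknown_params({ country: 'EE', selector: 'ul li' })  # => []
p unknown_params({ country: 'EE', contry: 'EE' })       # => [:contry]
```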

Here is an example using URL parameters and a selector:

require "scraper_rb"

params = {country: 'EE', selector: 'ul li button[data-clipboard-text]'}
s = ScraperRb.new('https://pypi.org/classifiers/', params)
s.get
s.response[:headers]       # => return response headers
s.response[:data]          # => return an array, collection of given selector
s.response[:data].length   # => 734 
s.save('/tmp/test.json')   # => {:file=>"/tmp/test.json", :size=>174449}

# or

save_result = s.save('/tmp/test.json')
puts save_result[:error] if save_result.key?(:error) # we have a file error

The default timeout is 10 seconds. You can change this when initializing the instance:

s = ScraperRb.new('https://pypi.org/classifiers/', params={}, timeout=50) 
# => 50 seconds timeout w/o params

s = ScraperRb.new('https://pypi.org/classifiers/', params={country: 'EE'}, timeout=50) 
# => 50 seconds timeout
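One Ruby detail is worth keeping in mind when reading calls written as `params={}, timeout=50`: inside a method call, `name=value` is a local variable assignment whose value is then passed positionally; it is not a keyword argument. The sketch below demonstrates this with a hypothetical method. The parameter order of `show_args` is an assumption made for the demonstration and is not taken from scraper_rb, so whether `ScraperRb.new` is affected depends on its actual signature.

```ruby
# Hypothetical method used only to show how Ruby binds arguments.
# Its parameter list is an assumption, not ScraperRb's real signature.
def show_args(url, params = {}, headers = {}, timeout = 10)
  { url: url, params: params, headers: headers, timeout: timeout }
end

# `timeout=50` assigns a local variable and passes 50 positionally,
# so with this parameter order it lands in the third slot (headers).
result = show_args('https://pypi.org/classifiers/', params={}, timeout=50)
p result[:headers]  # => 50
p result[:timeout]  # => 10

# Supplying every preceding argument puts the value in the intended slot.
result = show_args('https://pypi.org/classifiers/', {}, {}, 50)
p result[:timeout]  # => 50
```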

You can add extra X- headers:

s = ScraperRb.new('https://pypi.org/classifiers/', headers={'X-Referer': 'https://www.google.com'}) 

# or
s = ScraperRb.new('https://pypi.org/classifiers/', params={country: 'EE'}, headers={'X-Referer': 'https://www.google.com'}, timeout=50) 
# => 50 seconds timeout

The headers param is a Hash, so you can add key/value data. Header keys must start with the X- prefix. More detail can be found at the Mozilla site.
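Since header keys must carry the X- prefix, a headers Hash can be split into conforming and non-conforming keys before it is passed along. This is a minimal sketch; `partition_headers` is a hypothetical helper, and the check is case-sensitive as an assumption.

```ruby
# Hypothetical helper: splits a headers Hash into entries whose keys start
# with "X-" and entries whose keys do not.
def partition_headers(headers)
  headers.partition { |key, _value| key.to_s.start_with?('X-') }.map(&:to_h)
end

accepted, rejected = partition_headers(
  { 'X-Referer': 'https://www.google.com', 'Accept': 'text/html' }
)
p accepted.keys  # => [:"X-Referer"]
p rejected.keys  # => [:Accept]
```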


Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

$ rake -T

rake build            # Build scraper_rb-X.X.X.gem into the pkg directory
rake clean            # Remove any temporary products
rake clobber          # Remove any generated files
rake install          # Build and install scraper_rb-X.X.X.gem into system gems
rake install:local    # Build and install scraper_rb-X.X.X.gem into system gems without network access
rake release[remote]  # Create tag v0.0.0 and build and push scraper_rb-X.X.X.gem to rubygems.org
rake test             # Run tests
  • If PROMPTAPI_TOKEN is set, tests based on real HTTP requests are also available.
  • Set RUBY_DEVELOPMENT to 1 for more verbose test results.

License

This project is licensed under the MIT License.


Contributor(s)


Contribute

Bug reports and pull requests are welcome on GitHub:

  1. Fork the repository (https://github.com/promptapi/scraper_rb/fork)
  2. Create your branch (git checkout -b my-feature)
  3. Commit your changes (git commit -am 'Add awesome features...')
  4. Push your branch (git push origin my-feature)
  5. Then create a new Pull Request!

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.



Package last updated on 06 Oct 2020
