CrawlerDetect is a library to detect bots/crawlers via the user agent
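A minimal usage sketch, assuming the gem exposes a module-level CrawlerDetect.is_crawler? helper that takes a raw user agent string (check the gem's README for the exact entry point):

```ruby
require 'crawler_detect'

# Assumption: CrawlerDetect.is_crawler? accepts a user agent string
# and returns true/false; the exact API may differ.
ua = "Googlebot/2.1 (+http://www.google.com/bot.html)"
puts CrawlerDetect.is_crawler?(ua) ? "bot/crawler" : "regular browser"
```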
BFS web crawler that implements Observable
Generic Web crawler with a DSL that parses structured data from web pages
Voight-Kampff detects bots, spiders, crawlers and replicants
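A hedged example, assuming the gem's VoightKampff.bot? helper (and, in Rails, a request.bot? extension) as described in its documentation:

```ruby
require 'voight_kampff'

# Assumption: VoightKampff.bot? takes a user agent string and returns
# true for known bots, spiders and crawlers.
ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
puts VoightKampff.bot?(ua)  # => true for a known crawler
# In a Rails controller the same check is typically written as request.bot?
```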
Cobweb is a web crawler that can use Resque to distribute crawls across a cluster, crawling extremely large sites much faster than a multi-threaded crawler. It can also run as a standalone crawler and has a sophisticated statistics interface for monitoring the progress of crawls.
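As a hedged sketch of the general idea (plain Resque job classes, not Cobweb's actual workers), distributing a crawl over a Resque queue could look like this:

```ruby
require 'resque'

# Illustration of Resque-distributed crawling, not Cobweb's API:
# each discovered URL becomes a job, so any number of workers can
# pull pages off the :crawl queue in parallel.
class CrawlPageJob
  @queue = :crawl

  def self.perform(url)
    # Fetch and process the page here, then enqueue newly found links.
    puts "crawling #{url}"
  end
end

Resque.enqueue(CrawlPageJob, "https://example.com/")
# Workers started with `rake resque:work QUEUE=crawl` then share the crawl.
```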
An easy-to-use distributed web crawler framework based on Redis
validate-website is a web crawler for checking markup validity against XML Schema / DTD and reporting not-found URLs.
Asynchronous web crawler, scraper and file harvester
Like Nightcrawler of the X-Men, this gem teleports your assets to an OpenStack Swift bucket/container
is_crawler does exactly what you might think it does: determines whether the supplied string matches a known crawler or bot.
A crawler toolkit
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-da gem provides a PostgreSQL-based content meta-data store and work priority queue.
Rack middleware adhering to the Google AJAX Crawling Scheme, using a headless browser to render JS-heavy pages and serve a DOM snapshot of the rendered state to the requesting search engine.
Crawls public LinkedIn profiles via Google
Ruby web crawler using PhantomJS
Arachnid is a web crawler that relies on Bloom filters to efficiently track visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
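The underlying approach can be sketched independently of Arachnid's own API: keep visited URLs in a Bloom filter (here via the bloomfilter-rb gem, an assumption about available tooling) and fetch pages with Typhoeus.

```ruby
require 'typhoeus'
require 'bloomfilter-rb'

# Illustrative sketch of the Bloom-filter-plus-Typhoeus approach, not
# Arachnid's actual API. Filter parameters are placeholder values.
visited = BloomFilter::Native.new(size: 1_000_000, hashes: 5, seed: 1, bucket: 1, raise: false)
queue   = ["https://example.com/"]

until queue.empty?
  url = queue.shift
  next if visited.include?(url)
  visited.insert(url)

  response = Typhoeus.get(url, followlocation: true)
  next unless response.success?

  # Naive link extraction; a real crawler would use an HTML parser
  # and restrict itself to the target domain.
  response.body.scan(/href="(https?:\/\/[^"]+)"/).flatten.each do |link|
    queue << link unless visited.include?(link)
  end
end
```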
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-core gem contains core facilities and notably, does not contain such facilities as database-backed state management.
Rails Analyzer Tools contains Bench, a simple web page benchmarker, Crawler, a tool for beating up on web sites, RailsStat, a tool for monitoring Rails web sites, and IOTail, a tail(1) method for Ruby IOs.
Website crawler and fulltext indexer.
JavaScript-enabled web crawler kit
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-html gem contains filters for HTML parsing, filtering, and extracting text and links.
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-worker gem provides a worker daemon for feed/page processing.
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-http gem contains an HTTP-client-agnostic abstraction layer.
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-simhash gem contains support for generating and searching over simhash fingerprints.
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-barc gem contains support for the BARC (Basic ARChive) format.
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is an rjack-httpclient-3 based implementation of the iudex-http interfaces.
Crawls Instagram photos, posts and videos for download.
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is a Jetty HTTP Client based implementation of the iudex-http interfaces.
A simple, fast web crawler
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-filter gem contains a fundamental filtering/chain-of-responsibility sub-system.
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-rome gem is an adaptation of rjack-rome for feed parsing in Iudex.
Shows crawled data from DMM and DMM.R18, e.g. rankings
Crawls Twitter
The SimpleCrawler module is a library for crawling web sites. The crawler provides comprehensive data from each crawled page, which can be used for page analysis, indexing, accessibility checks, etc. Restrictions can be specified to limit crawling of binary files.
Ruby Cheerio is a jQuery-style HTML parser that takes selectors as input. It is a Ruby version of the Node.js package 'Cheerio', which is extensively used by crawlers. Please visit the home page for usage details.
This is a crawler framework.
Cangrejo lets you consume crabfarm crawlers using a simple DSL
Crawls Indeed resumes
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-http-test gem contains an HTTP test server for testing HTTP client implementations.
Crawl websites
Post URLs to the Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.
A web crawler written in Ruby
RegexpCrawler is a Ruby library for crawling data from websites using regular expressions.
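The technique it describes, illustrated in plain Ruby rather than through RegexpCrawler's own interface:

```ruby
require 'net/http'
require 'uri'

# Illustration of regex-based extraction, not RegexpCrawler's API:
# fetch a page and pull data out with capture groups.
html = Net::HTTP.get(URI("https://example.com/"))

# Hypothetical patterns: collect headings and link targets.
titles = html.scan(%r{<h1[^>]*>(.*?)</h1>}m).flatten.map(&:strip)
links  = html.scan(/href="([^"]+)"/).flatten

puts titles
puts links.first(10)
```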
SemanticCrawler is a Ruby library that encapsulates data gathering from different sources. Currently supported are microdata from websites, country information from Freebase, Factbook and FAO (Food and Agriculture Organization of the United Nations), crisis information from GDACS.org, and geo data from LinkedGeoData. Additionally, the GeoNames module allows retrieving Factbook and FAO country information from GPS coordinates.
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is an rjack-async-httpclient based implementation of the iudex-http interfaces.
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-char-detector gem provides charset detection support.
webget gem - a web (go get) crawler incl. web cache
Automatically protects your staging app from web crawlers and casual visitors.
Web crawler that helps you parse and collect data from the web
Grabs repository information from GitHub, RubyGems, The Ruby Toolbox and Stack Overflow