
CrawlerDetect is a Ruby port of the PHP class @CrawlerDetect.
It detects bots/crawlers/spiders via the user agent and other HTTP headers, and currently recognizes thousands of bots/spiders/crawlers.
Comparing with other popular bot-detection gems:

| | CrawlerDetect | Voight-Kampff | Browser |
|---|---|---|---|
| Number of bot patterns | >1000 | ~280 | ~280 |
| Number of checked HTTP headers | 11 | 1 | 1 |
| Bot-list updates (1st half of 2018) | 14 | 1 | 7 |
In order to remain up-to-date, this gem does not accept any crawler data updates – any PRs to edit the crawler data should be offered to the original JayBizzle/CrawlerDetect project.
Add this line to your application's Gemfile:

```ruby
gem 'crawler_detect'
```

Then run `bundle install`.
```ruby
CrawlerDetect.is_crawler?("Bot user agent")
# => true
```
Or if you need the crawler name:

```ruby
detector = CrawlerDetect.new("Googlebot/2.1 (http://www.google.com/bot.html)")
detector.is_crawler?
# => true
detector.crawler_name
# => "Googlebot"
```
Optionally, you can add the same methods to the Rack `request` object:

```ruby
request.is_crawler?
# => false
request.crawler_name
# => nil
```
It's more flexible to use `request.is_crawler?` rather than `CrawlerDetect.is_crawler?`, because it automatically checks 10 HTTP headers, not only `HTTP_USER_AGENT`.
The only thing you have to do is configure the `Rack::CrawlerDetect` middleware:
For Rails:

```ruby
class Application < Rails::Application
  # ...
  config.middleware.use Rack::CrawlerDetect
end
```
For a plain Rack application, add it to `config.ru`:

```ruby
use Rack::CrawlerDetect
```
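The multi-header check described above can be sketched in plain Ruby. This is a simplified illustration of the idea, not the gem's actual implementation; the header list and pattern below are hypothetical, and the real gem ships thousands of patterns:

```ruby
# Simplified sketch of multi-header bot detection (NOT the gem's real code).
# Hypothetical subset of Rack env headers such a middleware might inspect:
CHECKED_HEADERS = %w[
  HTTP_USER_AGENT
  HTTP_X_OPERAMINI_PHONE_UA
  HTTP_X_DEVICE_USER_AGENT
  HTTP_FROM
].freeze

# Tiny hypothetical pattern list for illustration only.
BOT_PATTERN = /googlebot|bingbot|crawler|spider/i

def crawler?(env)
  # Check every configured header, not just HTTP_USER_AGENT.
  CHECKED_HEADERS.any? { |h| env[h].to_s.match?(BOT_PATTERN) }
end

puts crawler?("HTTP_USER_AGENT" => "Googlebot/2.1")          # true
puts crawler?("HTTP_FROM" => "crawler@example.com")           # true
puts crawler?("HTTP_USER_AGENT" => "Mozilla/5.0 (Macintosh)") # false
```

A request whose `HTTP_USER_AGENT` looks harmless can still be flagged by another header, which is why the request-based check catches more bots than a user-agent-only check.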
In some cases you may want to use your own whitelist, blacklist, or list of HTTP headers for user-agent detection. This is possible via `CrawlerDetect::Config`. For example, you may have an initializer like this:
```ruby
CrawlerDetect.setup! do |config|
  config.raw_headers_path    = File.expand_path("crawlers/MyHeaders.json", __dir__)
  config.raw_crawlers_path   = File.expand_path("crawlers/MyCrawlers.json", __dir__)
  config.raw_exclusions_path = File.expand_path("crawlers/MyExclusions.json", __dir__)
end
```
Make sure that your files are valid JSON. Look at the raw files used by default for more information.
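As an illustration only, a custom crawler list might look like the following. This assumes the raw files are plain JSON arrays of regex strings; the exact schema is defined by the default raw files, so verify against them before using:

```json
[
  "MyCustomBot\\/\\d+\\.\\d+",
  "internal-monitoring-crawler"
]
```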
You can run `rubocop` / `rspec` with any Ruby version using Docker like this:

```shell
docker build --build-arg RUBY_VERSION=3.3 --build-arg BUNDLER_VERSION=2.5 -t crawler_detect:3.3 .
docker run -it crawler_detect:3.3 bundle exec rspec
```
MIT License