Spidr is a versatile Ruby web spidering library that can spider a site, multiple domains, certain links, or crawl infinitely. Spidr is designed to be fast and easy to use.
Spidr follows a tags, iframe tags, and frame tags, and has optional /robots.txt support.

Start spidering from a URL:
Spidr.start_at('http://tenderlovemaking.com/') do |agent|
  # ...
end
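The agent can also be tuned when the crawl starts. A minimal sketch, assuming a recent spidr release where these keyword options are accepted (the user-agent string is hypothetical):

# Crawl politely: identify the crawler, wait between requests,
# and cap the crawl at 100 pages no deeper than 3 links from the start.
Spidr.start_at('http://tenderlovemaking.com/',
               user_agent: 'MyCrawler/1.0', # hypothetical User-Agent string
               delay:      1,               # seconds to sleep between requests
               limit:      100,             # maximum number of pages to visit
               max_depth:  3) do |agent|
  # ...
end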
Spider a host:
Spidr.host('solnic.eu') do |agent|
  # ...
end
Spider a domain (and any sub-domains):
Spidr.domain('ruby-lang.org') do |agent|
  # ...
end
Spider a site:
Spidr.site('http://www.rubyflow.com/') do |agent|
  # ...
end
Spider multiple hosts:
Spidr.start_at('http://company.com/', hosts: ['company.com', /host[\d]+\.company\.com/]) do |agent|
  # ...
end
Do not spider certain links:
Spidr.site('http://company.com/', ignore_links: [%r{/blog/}]) do |agent|
  # ...
end
Do not spider links on certain ports:
Spidr.site('http://company.com/', ignore_ports: [8000, 8010, 8080]) do |agent|
  # ...
end
Do not spider links disallowed by robots.txt:

Spidr.site('http://company.com/', robots: true) do |agent|
  # ...
end
Print out visited URLs:
Spidr.site('http://www.rubyinside.com/') do |spider|
  spider.every_url { |url| puts url }
end
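A callback can also be narrowed to URLs matching a pattern. A minimal sketch using every_url_like, assuming this event is available in your spidr version:

# Print only URLs that end in .pdf:
Spidr.site('http://www.rubyinside.com/') do |spider|
  spider.every_url_like(/\.pdf$/) { |url| puts url }
end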
Build a URL map of a site:
url_map = Hash.new { |hash,key| hash[key] = [] }

Spidr.site('http://intranet.com/') do |spider|
  spider.every_link do |origin,dest|
    url_map[dest] << origin
  end
end
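Since every_link yields URI objects, the finished map is keyed by URI. A brief usage sketch (the about-page URL is hypothetical):

# List every page that links to a given URL:
url_map[URI('http://intranet.com/about/')].each do |origin|
  puts origin
end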
Print out the URLs that could not be requested:
Spidr.site('http://company.com/') do |spider|
  spider.every_failed_url { |url| puts url }
end
Find all pages which have broken links:
url_map = Hash.new { |hash,key| hash[key] = [] }

spider = Spidr.site('http://intranet.com/') do |spider|
  spider.every_link do |origin,dest|
    url_map[dest] << origin
  end
end

spider.failures.each do |url|
  puts "Broken link #{url} found in:"

  url_map[url].each { |page| puts "  #{page}" }
end
Search HTML and XML pages:
Spidr.site('http://company.com/') do |spider|
  spider.every_page do |page|
    puts ">>> #{page.url}"

    page.search('//meta').each do |meta|
      name  = (meta.attributes['name'] || meta.attributes['http-equiv'])
      value = meta.attributes['content']

      puts "  #{name} = #{value}"
    end
  end
end
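search (and its single-node counterpart at) delegate to the page's underlying Nokogiri document, so CSS selectors work as well as XPath. A minimal sketch:

Spidr.site('http://company.com/') do |spider|
  spider.every_html_page do |page|
    # Same meta extraction as above, using a CSS selector:
    page.search('meta[name]').each do |meta|
      puts "  #{meta['name']} = #{meta['content']}"
    end
  end
end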
Print out the titles from every page:
Spidr.site('https://www.ruby-lang.org/') do |spider|
  spider.every_html_page do |page|
    puts page.title
  end
end
Print out every HTTP redirect:
Spidr.host('company.com') do |spider|
  spider.every_redirect_page do |page|
    puts "#{page.url} -> #{page.headers['Location']}"
  end
end
Find what kinds of web servers a host is using by accessing the headers:

require 'set'

servers = Set[]

Spidr.host('company.com') do |spider|
  spider.all_headers do |headers|
    servers << headers['server']
  end
end
Pause the spider on a forbidden page:
Spidr.host('company.com') do |spider|
  spider.every_forbidden_page do |page|
    spider.pause!
  end
end
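Since Spidr.host returns the agent, a paused spider can be resumed later. A sketch assuming spidr's continue!:

spider = Spidr.host('company.com') do |spider|
  spider.every_forbidden_page do |page|
    spider.pause!
  end
end

# ... inspect the queue, fix credentials, etc. ...

spider.continue!  # resume crawling where the agent stopped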
Skip the processing of a page:
Spidr.host('company.com') do |spider|
  spider.every_missing_page do |page|
    spider.skip_page!
  end
end
Skip the processing of links:
Spidr.host('company.com') do |spider|
  spider.every_url do |url|
    if url.path.split('/').find { |dir| dir.to_i > 1000 }
      spider.skip_link!
    end
  end
end
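An agent also exposes its crawl state after a run. A sketch assuming the agent's history and failures accessors:

spider = Spidr.host('company.com')

puts "Visited #{spider.history.length} URLs"
puts "Failed to fetch #{spider.failures.length} URLs"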
Install:

$ gem install spidr
See LICENSE.txt for license information.