isbot 🤖/👨‍🦰
Detect bots/crawlers/spiders using the user agent string.
Releasing Version 4: deprecation notice
Version 4 will become the "latest" version on npm in January 2024
```sh
npm i isbot@4
# or
npm i isbot@next
```
I'll be releasing version 4 as "latest" soon. Migration is simple: replace `import isbot from "isbot"` with `import { isbot } from "isbot"` in your code.
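A minimal before/after sketch of that change (the call site is taken from the Usage section below and stays the same):

```js
// Version 3 and below
import isbot from "isbot"

isbot(request.getHeader("User-Agent"))
```

```js
// Version 4 and above
import { isbot } from "isbot"

isbot(request.getHeader("User-Agent"))
```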
If you are using the extended functionality, there are more changes, and the feature you're using may no longer be supported as is. Please open an issue if you need help migrating.
Please visit isbot for more information.
Usage
```js
import isbot from 'isbot'

isbot(request.getHeader('User-Agent'))
isbot(req.get('user-agent'))
isbot(navigator.userAgent)
isbot('Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)') // true
isbot('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36') // false
```
Using the jsDelivr CDN you can import an IIFE script.
See specific versions at https://www.jsdelivr.com/package/npm/isbot or https://cdn.jsdelivr.net/npm/isbot
```html
<script src="https://cdn.jsdelivr.net/npm/isbot@3"></script>
<script>
  // isbot is global
  isbot(navigator.userAgent)
</script>
```
Additional functionality
Extend: Add user agent patterns
Add rules to the user agent match RegExp. Accepts an array of strings.
```js
isbot('Mozilla/5.0 (X11) Firefox/111.0') // false
isbot.extend([
  'istat',
  'x11'
])
isbot('Mozilla/5.0 (X11) Firefox/111.0') // true
```
Exclude: Remove matches of known crawlers
Remove rules from the user agent match RegExp (see existing rules in the src/list.json file).
This function requires knowledge of the internal structure of the list, which may change at any time. It is recommended to use the clear function instead.
```js
isbot('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4590.2 Safari/537.36 Chrome-Lighthouse') // true
isbot.exclude(['chrome-lighthouse'])
isbot('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4590.2 Safari/537.36 Chrome-Lighthouse') // false
```
Find: Verbose result
Return the respective match for the bot user agent rule.

```js
isbot.find('Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 DejaClick/2.9.7.2') // 'DejaClick'
```
Matches: Get patterns
Return all patterns that match the user agent string.

```js
isbot.matches('Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 SearchRobot/1.0') // ['search', 'robot']
```
Clear: Remove matching patterns
Remove all matching patterns so this user agent string will pass
```js
const ua = 'Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 SearchRobot/1.0'

isbot(ua) // true
isbot.clear(ua)
isbot(ua) // false
```
Spawn: Create new instances
Create new instances of isbot. Each instance is spawned using the spawner's list as its base.
```js
const one = isbot.spawn()
const two = isbot.spawn()

two.exclude(['chrome-lighthouse'])
one('Chrome-Lighthouse') // true
two('Chrome-Lighthouse') // false
```
Create an isbot instance using a custom list (instead of the maintained list).
```js
const lean = isbot.spawn([ 'bot' ])

lean('Googlebot') // true
lean('Chrome-Lighthouse') // false
```
Get a copy of the Regular Expression pattern
```js
const { pattern } = isbot
```
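A minimal usage sketch, assuming `pattern` is a standard RegExp copy of the internal matcher (the exact rules and flags are implementation details and may change between releases):

```js
const { pattern } = isbot

console.log(pattern instanceof RegExp) // true
// One-off test against a user agent string containing a known crawler token
console.log(pattern.test('Googlebot/2.1 (+http://www.google.com/bot.html)')) // expected: true
```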
Definitions
- Bot. An autonomous program imitating or replacing some aspect of human behaviour, performing repetitive tasks much faster than human users could.
- Good bot. Automated programs that visit websites in order to collect useful information. Web crawlers, site scrapers, stress testers, preview builders and other programs are welcomed on most websites because they serve purposes of mutual benefit.
- Bad bot. Programs designed to perform malicious actions that ultimately hurt businesses: testing credential databases, DDoS attacks, spam bots.
Clarifications
What does "isbot" do?
This package aims to identify "Good bots": those that voluntarily identify themselves by setting a unique, preferably descriptive, user agent, usually via a dedicated request header.
What doesn't "isbot" do?
It does not try to recognise malicious bots or programs disguising themselves as real users.
Why would I want to identify good bots?
Recognising good bots such as web crawlers is useful for multiple purposes. Although it is not recommended to serve different content to web crawlers like Googlebot, you can still elect to:
- Flag pageviews to consider with business analysis.
- Prefer to serve cached content and relieve service load.
- Omit third party solutions' code (tags, pixels) and reduce costs.
It is not recommended to whitelist requests for any reason based on the user agent header alone. Instead, other methods of identification, such as a reverse DNS lookup, can be added. A minimal middleware sketch of the cached-content idea follows.
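This sketch assumes an Express-style server; the route, port, and response strings are illustrative and not part of isbot:

```js
import express from 'express'
import isbot from 'isbot'

const app = express()

// Flag each request once so downstream handlers and analytics can use it
app.use((req, res, next) => {
  res.locals.isBot = isbot(req.get('user-agent') || '')
  next()
})

app.get('/', (req, res) => {
  // e.g. prefer cached content for crawlers and relieve service load
  res.send(res.locals.isBot ? 'cached page' : 'freshly rendered page')
})

app.listen(3000)
```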
Data sources
We use external data sources on top of our own lists to keep up to date
Crawlers user agents:
Non bot user agents:
Missing something? Please open an issue
Major releases breaking changes (full changelog)
- Remove testing for node 6 and 8
- Change return value for isbot: `true` instead of matched string
- No functional change
Real world data
[Chart: execution times in milliseconds]