What is puppeteer-extra-plugin-stealth?
The puppeteer-extra-plugin-stealth package makes headless Puppeteer browsers harder for websites to detect. It applies a set of evasion techniques that mask the signs that the browser is being controlled by automation scripts.
What are puppeteer-extra-plugin-stealth's main features?
Bypass WebDriver Detection
This feature helps you get past WebDriver detection on websites. A Chrome instance driven by automation reports navigator.webdriver as true; the plugin patches this property and related ones so the browser does not appear to be controlled by automation scripts.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
// Register the stealth plugin; all evasions are enabled by default
puppeteer.use(StealthPlugin());
(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  // The evasions are applied automatically to every new page
  await page.goto('https://example.com');
  // Perform actions on the page
  await browser.close();
})();
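The core idea behind hiding the WebDriver flag can be illustrated in plain Node.js without a browser. The sketch below is a simplified illustration, not the plugin's actual evasion code: MockNavigator, looksAutomated and hideWebdriver are hypothetical names, and the mock stands in for the page's real navigator object.

```javascript
// Hypothetical mock standing in for the page's Navigator object.
class MockNavigator {
  // Like automated Chrome, expose webdriver as a prototype getter returning true
  get webdriver () {
    return true
  }
}

// Hypothetical check a detection script might run
function looksAutomated (nav) {
  return nav.webdriver === true
}

// Sketch of the evasion idea: redefine the getter on the prototype so it
// returns false, as it does in a browser that is not under automation.
// The original descriptor flags are kept so the property keeps its shape.
// A real in-page patch must also hide the swap itself (for example from
// Function.prototype.toString checks), which this sketch ignores.
function hideWebdriver (nav) {
  const proto = Object.getPrototypeOf(nav)
  const descriptor = Object.getOwnPropertyDescriptor(proto, 'webdriver')
  Object.defineProperty(proto, 'webdriver', { ...descriptor, get: () => false })
}

const navigatorMock = new MockNavigator()
console.log('before:', looksAutomated(navigatorMock)) // before: true
hideWebdriver(navigatorMock)
console.log('after:', looksAutomated(navigatorMock)) // after: false
```

Patching the prototype rather than the instance matters because detection scripts can inspect Navigator.prototype directly.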
Masking Browser Fingerprints
This feature masks browser fingerprints, such as the user-agent, navigator.languages, and other properties that can be used to detect automated browsing. It is enabled by the same setup as the first example; no extra configuration is needed.
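The most visible fingerprint is the user-agent string, which headless Chrome marks with HeadlessChrome. The helper below is a hypothetical sketch of that kind of rewrite, not the plugin's code; the version number in the sample string is illustrative.

```javascript
// Hypothetical helper sketching user-agent masking.
function maskUserAgent (ua) {
  // Headless Chrome advertises itself as "HeadlessChrome/<version>";
  // regular Chrome uses "Chrome/<version>".
  return ua.replace(/HeadlessChrome\//g, 'Chrome/')
}

// Illustrative headless user-agent (version number made up for the example)
const headlessUa =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36'

console.log(maskUserAgent(headlessUa))
```

In a plain Puppeteer script, the input could come from browser.userAgent() and the result could be passed to page.setUserAgent(). The stealth plugin performs this kind of rewrite for you, together with related properties, so you would only do it by hand when not using the plugin.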
Evading Chrome Headless Detection
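This feature evades detection mechanisms that check for headless Chrome, for example a missing window.chrome object or an empty navigator.plugins list. It patches these properties and behaviors so that the headless browser looks like a regular one. The same setup as in the first example enables it.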
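To see what these evasions are up against, here is a hypothetical detector that checks a handful of headless signals of the kind described above. It runs against mock environment objects; headlessSignals, headlessEnv and maskedEnv are illustrative names, and real detection scripts combine many more signals.

```javascript
// Hypothetical detector: collects a few commonly cited headless signals.
function headlessSignals (env) {
  const nav = env.navigator
  const signals = []
  if (nav.webdriver === true) signals.push('navigator.webdriver is true')
  if (nav.userAgent.includes('HeadlessChrome')) signals.push('user-agent contains HeadlessChrome')
  if (nav.plugins.length === 0) signals.push('navigator.plugins is empty')
  if (nav.languages.length === 0) signals.push('navigator.languages is empty')
  // Regular Chrome exposes window.chrome; headless builds historically did not
  if (nav.userAgent.includes('Chrome') && env.chrome === undefined) {
    signals.push('window.chrome is missing')
  }
  return signals
}

// Mock environments; the values are illustrative, not captured from real browsers
const headlessEnv = {
  navigator: { webdriver: true, userAgent: 'Mozilla/5.0 HeadlessChrome/120.0.0.0', plugins: [], languages: [] }
}
const maskedEnv = {
  chrome: { runtime: {} },
  navigator: { webdriver: false, userAgent: 'Mozilla/5.0 Chrome/120.0.0.0', plugins: [{ name: 'PDF Viewer' }], languages: ['en-US', 'en'] }
}

console.log(headlessSignals(headlessEnv).length) // 5
console.log(headlessSignals(maskedEnv).length) // 0
```

Each check corresponds to a property an evasion has to patch, which is why the plugin bundles many small evasions instead of a single fix.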
Other packages similar to puppeteer-extra-plugin-stealth
puppeteer-extra
puppeteer-extra is a modular plugin framework for Puppeteer, and the stealth plugin is built on top of it. Besides stealth, its plugin ecosystem includes plugins for ad blocking (puppeteer-extra-plugin-adblocker), reCAPTCHA solving (puppeteer-extra-plugin-recaptcha), and more.
puppeteer-cluster
puppeteer-cluster is a library that allows you to create a cluster of puppeteer workers to perform parallel tasks. While it does not focus on stealth, it is useful for scaling up web scraping tasks efficiently.
playwright
playwright is a Node.js library for automating Chromium, Firefox, and WebKit with a single API, with first-class support for isolated browser contexts. It does not ship stealth evasions of its own; the community playwright-extra package brings the puppeteer-extra plugin system, including this stealth plugin, to Playwright.
A plugin for puppeteer-extra.
Install
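yarn add puppeteer-extra-plugin-stealth
The plugin is used through puppeteer-extra, so puppeteer and puppeteer-extra need to be installed as well.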
API
Extends: PuppeteerExtraPlugin
Stealth mode: Applies various techniques to make detection of headless puppeteer harder. 💯
Purpose
There are a couple of ways the use of Puppeteer can easily be detected by a target website, the addition of HeadlessChrome to the user-agent being only the most obvious one.
The goal of this plugin is to be the definitive companion to Puppeteer for avoiding
detection, applying new techniques as they surface.
As this cat-and-mouse game is in its infancy and fast-paced, the plugin
is kept as flexible as possible to support quick testing and iterations.
Modularity
This plugin uses puppeteer-extra's dependency system to only require
the code for evasions that have been enabled, which keeps things modular and efficient.
The stealth plugin is a convenience wrapper that requires multiple evasion techniques
automatically and comes with defaults. You can also bypass the main module and require
specific evasion plugins yourself if you wish (they are standalone puppeteer-extra
plugins):
const puppeteer = require('puppeteer-extra')
// bypass the main module and require a specific stealth evasion plugin directly:
puppeteer.use(require('puppeteer-extra-plugin-stealth/evasions/console.debug')())
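The modular loading described above can be sketched as mapping enabled evasion names to module paths and requiring only those. This is an assumption-laden illustration: evasionModulePaths and loadEvasions are hypothetical helpers that follow the require path pattern shown above, and the plugin's real resolution goes through puppeteer-extra's dependency system and may differ in detail.

```javascript
// Hypothetical helper: map enabled evasion names to module paths that follow
// the pattern 'puppeteer-extra-plugin-stealth/evasions/<name>'
function evasionModulePaths (enabledEvasions, packageName = 'puppeteer-extra-plugin-stealth') {
  return [...enabledEvasions].map(name => `${packageName}/evasions/${name}`)
}

// Load only the enabled evasions. `load` defaults to Node's require but can be
// swapped for a stub. Each evasion module is assumed to export a factory
// function, matching the require(...)() call pattern above.
function loadEvasions (enabledEvasions, load = require) {
  return evasionModulePaths(enabledEvasions).map(modulePath => load(modulePath)())
}

// Usage with a stub loader, so nothing is actually installed or required
const requested = []
const stubLoad = modulePath => {
  requested.push(modulePath)
  return () => ({ modulePath }) // stand-in for a plugin instance
}
loadEvasions(new Set(['console.debug']), stubLoad)
console.log(requested)
```

Because only names in the set become module paths, a disabled evasion's code is never loaded at all.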
Contributing
PRs are welcome. If you want to add a new evasion technique, I suggest you
look at the template to kickstart things.
Notes
Word of caution: due to the intrusive nature of these detection mitigation techniques,
they might break functionality on certain sites. Selectively disable techniques if that happens, or submit a PR with a fix. :-)
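Selectively disabling techniques amounts to building a smaller set of evasion names. The helper below is a hypothetical sketch of that step; the evasion names are illustrative, so check the plugin's evasions directory for the real list.

```javascript
// Hypothetical helper: start from every available evasion and drop the ones
// that break a particular site
function withoutEvasions (available, disabled) {
  const enabled = new Set(available) // copy, so `available` is left untouched
  for (const name of disabled) enabled.delete(name)
  return enabled
}

// Illustrative names; see the plugin's evasions directory for the real list
const available = new Set(['chrome.runtime', 'navigator.webdriver', 'user-agent-override'])
const enabled = withoutEvasions(available, ['user-agent-override'])
console.log([...enabled]) // [ 'chrome.runtime', 'navigator.webdriver' ]
```

With the plugin itself, such a set could be passed as the enabledEvasions option, for example require('puppeteer-extra-plugin-stealth')({ enabledEvasions: enabled }), using pluginStealth.availableEvasions as the starting set.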
Kudos
Thanks to Evan Sangaline and Paul Irish for kickstarting the discussion!
Type: function (opts)
opts Object: options (optional, default {})
opts.enabledEvasions Set<string>?: specify which evasions to use (by default, all)
Example:
const puppeteer = require('puppeteer-extra')
// Register the stealth plugin with all evasions enabled (the default)
puppeteer.use(require('puppeteer-extra-plugin-stealth')())
;(async () => {
  const browser = await puppeteer.launch({ args: ['--no-sandbox'], headless: true })
  const page = await browser.newPage()
  // Test page that runs a series of headless-detection checks
  const testUrl = 'https://intoli.com/blog/' +
    'not-possible-to-block-chrome-headless/chrome-headless-test.html'
  await page.goto(testUrl)
  // Save a screenshot so the check results can be inspected
  const screenshotPath = '/tmp/headless-test-result.png'
  await page.screenshot({ path: screenshotPath })
  console.log('have a look at the screenshot:', screenshotPath)
  await browser.close()
})()
availableEvasions
Get all available evasions.
Please look into the evasions directory for an up-to-date list.
Type: Set<string>
Example:
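const puppeteer = require('puppeteer-extra')
const pluginStealth = require('puppeteer-extra-plugin-stealth')()
// Print the names of all evasions this version of the plugin ships with
console.log(pluginStealth.availableEvasions)
puppeteer.use(pluginStealth)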
enabledEvasions
Get all enabled evasions.
Enabled evasions can be configured either through opts
or by modifying this property.
Type: Set<string>
Example:
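const puppeteer = require('puppeteer-extra')
const pluginStealth = require('puppeteer-extra-plugin-stealth')()
// Disable a single evasion before registering the plugin
pluginStealth.enabledEvasions.delete('console.debug')
puppeteer.use(pluginStealth)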