What is puppeteer-extra?
puppeteer-extra is a modular plugin framework for the popular headless browser automation library Puppeteer. It allows users to easily extend Puppeteer's functionality with plugins, making it more versatile and powerful for various web scraping, automation, and testing tasks.
What are puppeteer-extra's main functionalities?
Stealth Plugin
The Stealth Plugin helps to avoid detection by various anti-bot measures on websites. It modifies the browser's behavior to make it look more like a regular user.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
Adblocker Plugin
The Adblocker Plugin blocks ads and trackers, making page loads faster and reducing the amount of data processed.
const puppeteer = require('puppeteer-extra');
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(AdblockerPlugin());
(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
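Conceptually, an ad blocker decides for each outgoing request whether to let it through. The sketch below shows only that decision step, using a deliberately simplified hostname check. It is not how the plugin works internally (real ad blockers typically match against filter lists), and `isBlocked`, `BLOCKED_HOSTS`, and the hostnames are made up for this illustration.

```javascript
// Simplified illustration only: real ad blockers typically match against filter lists.
// The hostnames below are invented for this example.
const BLOCKED_HOSTS = ['ads.example', 'tracker.example'];

function isBlocked(requestUrl, blockedHosts = BLOCKED_HOSTS) {
  const { hostname } = new URL(requestUrl);
  // Match the listed host itself or any of its subdomains.
  return blockedHosts.some(
    host => hostname === host || hostname.endsWith('.' + host)
  );
}

console.log(isBlocked('https://cdn.ads.example/banner.js'));
console.log(isBlocked('https://example.com/index.html'));
```

Fewer requests reaching the network is what produces the faster page loads mentioned above.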
Recaptcha Plugin
The Recaptcha Plugin helps to solve reCAPTCHAs automatically using third-party services like 2Captcha, making it easier to automate interactions with websites that use CAPTCHA challenges.
const puppeteer = require('puppeteer-extra');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
puppeteer.use(RecaptchaPlugin({ provider: { id: '2captcha', token: 'YOUR_2CAPTCHA_API_KEY' } }));
(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Detect and attempt to solve any reCAPTCHAs found on the page
  const { captchas } = await page.solveRecaptchas();
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
Other packages similar to puppeteer-extra
puppeteer
Puppeteer is the core library for headless browser automation. It provides a high-level API to control Chrome or Chromium over the DevTools Protocol. While it is powerful, it lacks the modular plugin system of puppeteer-extra, making it less flexible for certain tasks.
playwright
Playwright is a newer library from Microsoft that provides similar functionality to Puppeteer but supports multiple browsers (Chromium, Firefox, and WebKit). It offers a more robust and versatile API but does not have a plugin system like puppeteer-extra.
selenium-webdriver
Selenium WebDriver is a widely used tool for browser automation that supports multiple browsers and programming languages. It is more mature and has a larger community, but it is generally considered more complex and slower than Puppeteer and puppeteer-extra.
A light-weight wrapper around puppeteer to enable plugins through a clean interface.
Installation
yarn add puppeteer puppeteer-extra
Quickstart
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra')
// register plugins through `.use()`
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')({ makeWindows: true }))
puppeteer.use(require('puppeteer-extra-plugin-stealth')())
// usage as normal
puppeteer.launch().then(async browser => {
  const page = await browser.newPage()
  await page.goto('https://httpbin.org/headers', { waitUntil: 'domcontentloaded' })
  const content = await page.content()
  console.log('content:', content) // => (..) User-Agent: (..) Windows NT 10.0
  await browser.close()
})
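The `Windows NT 10.0` in the output comment comes from registering the anonymize-ua plugin with `makeWindows: true`. As a rough illustration of the idea only, and not the plugin's actual code, the sketch below rewrites a user-agent string: it drops the `Headless` marker and swaps the platform section for a Windows one while leaving the browser version untouched. The function name `anonymizeUserAgent` and the sample string are assumptions made for this example.

```javascript
// Hypothetical helper, not part of puppeteer-extra: illustrates user-agent rewriting.
function anonymizeUserAgent (ua, { makeWindows = false } = {}) {
  // Remove the "Headless" marker but keep the version number that follows it.
  let result = ua.replace('HeadlessChrome/', 'Chrome/')
  if (makeWindows) {
    // Replace the first parenthesized group (the platform) with a Windows platform.
    result = result.replace(/\(([^)]+)\)/, '(Windows NT 10.0; Win64; x64)')
  }
  return result
}

// Sample headless user agent, chosen for illustration.
const sampleUA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/69.0.3497.100 Safari/537.36'

const anonymized = anonymizeUserAgent(sampleUA, { makeWindows: true })
console.log(anonymized)
```

Because only the marker and the platform are replaced, the version segment (`69.0.3497.100` here) passes through unchanged.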
Plugins
puppeteer-extra-plugin-stealth
- Applies various evasion techniques to make detection of headless puppeteer harder.
puppeteer-extra-plugin-devtools
- Makes puppeteer browser debugging possible from anywhere.
- Creates a secure tunnel to make the devtools frontend (incl. screencasting) accessible from the public internet.
puppeteer-extra-plugin-repl
- Makes quick puppeteer debugging and exploration fun with an interactive REPL.
puppeteer-extra-plugin-block-resources
- Blocks resources (images, media, css, etc.) in puppeteer.
- Supports all resource types, blocking can be toggled dynamically.
puppeteer-extra-plugin-flash
- Allows flash content to run on all sites without user interaction.
puppeteer-extra-plugin-anonymize-ua
- Anonymizes the user-agent on all pages.
- Supports dynamic replacing, so the browser version stays intact and recent.
Check out the packages folder for more plugins.
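To make the resource-blocking entry in the list above more concrete, here is a minimal sketch of a request handler that aborts requests by resource type and can be toggled at runtime. It is not the plugin's implementation: `createResourceBlocker` and the fake request object are assumptions for illustration. The `resourceType()`, `abort()` and `continue()` calls mirror methods on Puppeteer's request objects, which become usable once request interception is enabled on a page.

```javascript
// Hypothetical helper: builds a request handler that blocks selected resource types.
function createResourceBlocker (initialTypes = []) {
  const blockedTypes = new Set(initialTypes)
  return {
    block: type => blockedTypes.add(type),
    unblock: type => blockedTypes.delete(type),
    // Intended for page.on('request', handler) after page.setRequestInterception(true).
    handler: request => {
      if (blockedTypes.has(request.resourceType())) {
        return request.abort()
      }
      return request.continue()
    }
  }
}

// Stand-in for a Puppeteer request so the sketch runs without a browser.
function fakeRequest (type) {
  return {
    resourceType: () => type,
    abort: () => 'aborted',
    continue: () => 'continued'
  }
}

const blocker = createResourceBlocker(['image', 'media'])
const outcomes = [
  blocker.handler(fakeRequest('image')),
  blocker.handler(fakeRequest('stylesheet'))
]
blocker.block('stylesheet') // start blocking a new type at runtime
outcomes.push(blocker.handler(fakeRequest('stylesheet')))
blocker.unblock('image') // stop blocking a type at runtime
outcomes.push(blocker.handler(fakeRequest('image')))
console.log(outcomes)
```

Keeping the blocked types in a `Set` that the handler reads on every request is one simple way to get the dynamic toggling the list describes.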
Contributing
PRs and new plugins are welcome! :tada: The plugin API for puppeteer-extra is clean and fun to use. Have a look at the PuppeteerExtraPlugin base class documentation to get going, and check out the existing plugins (a minimal example is the anonymize-ua plugin) for reference.
We use a monorepo powered by Lerna (and yarn workspaces), ava for testing, and the standard style for linting, and we rely heavily on JSDoc to auto-generate markdown documentation based on code. :-)
Kudos
Compatibility
puppeteer-extra and all plugins are tested continuously against Node v8, v9, v10 and Puppeteer v1.3 to v1.8 and @next, as well as any combination thereof.
Some plugins won't work in headless mode due to Chrome limitations (e.g. user preferences in the profile folder). Look into xvfb-run if you still require a headless experience in these circumstances.
API
PuppeteerExtra
Modular plugin framework to teach puppeteer new tricks.
This module acts as a drop-in replacement for puppeteer.
Allows PuppeteerExtraPlugins to register themselves and to extend puppeteer with additional functionality.
Type: function ()
Example:
const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')())
puppeteer.use(require('puppeteer-extra-plugin-font-size')({defaultFontSize: 18}))
;(async () => {
  const browser = await puppeteer.launch({headless: false})
  const page = await browser.newPage()
  await page.goto('http://example.com', {waitUntil: 'domcontentloaded'})
  await browser.close()
})()
use
Outside interface to register plugins.
Type: function (plugin): this
- plugin: PuppeteerExtraPlugin
Example:
const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')())
puppeteer.use(require('puppeteer-extra-plugin-user-preferences')())
const browser = await puppeteer.launch(...)
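Because the registration method is typed as returning `this`, registrations can in principle be chained. The sketch below shows that pattern with a toy registry. `PluginRegistry` and the plain-object plugins are hypothetical stand-ins used only to demonstrate the chaining idea, not the library's own classes or validation rules.

```javascript
// Toy registry illustrating a chainable use(); not the puppeteer-extra implementation.
class PluginRegistry {
  constructor () {
    this._plugins = []
  }

  use (plugin) {
    // Minimal sanity check chosen for this sketch.
    if (typeof plugin !== 'object' || plugin === null || !plugin.name) {
      throw new Error('A plugin must be an object with a name')
    }
    this._plugins.push(plugin)
    return this // returning `this` is what makes chaining possible
  }

  get pluginNames () {
    return this._plugins.map(p => p.name)
  }
}

const registry = new PluginRegistry()
  .use({ name: 'anonymize-ua' })
  .use({ name: 'user-preferences' })

console.log(registry.pluginNames)
```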
launch
Launch a new browser instance with given arguments.
Augments the original puppeteer.launch method with plugin lifecycle methods.
All registered plugins that have a beforeLaunch method will be called in sequence to potentially update the options Object before launching the browser.
Type: function (options): Puppeteer.Browser
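The sequencing described above can be pictured as each plugin receiving the options produced by the previous one. Here is a minimal sketch, assuming plain objects with an optional `beforeLaunch` method and synchronous hooks for brevity. `runBeforeLaunch` and the sample plugins are made up for this example and do not reflect the library's internals.

```javascript
// Hypothetical helper: threads launch options through each plugin's beforeLaunch hook.
function runBeforeLaunch (plugins, options = {}) {
  let current = options
  for (const plugin of plugins) {
    if (typeof plugin.beforeLaunch !== 'function') continue // the hook is optional
    const updated = plugin.beforeLaunch(current)
    // A hook may return new options or mutate the ones it was given.
    if (updated !== undefined) current = updated
  }
  return current
}

// Sample plugins invented for this sketch.
const demoPlugins = [
  {
    name: 'adds-arg',
    beforeLaunch: opts => ({ ...opts, args: [...(opts.args || []), '--no-sandbox'] })
  },
  { name: 'no-hook' },
  { name: 'forces-headful', beforeLaunch: opts => ({ ...opts, headless: false }) }
]

const finalOptions = runBeforeLaunch(demoPlugins, { headless: true })
console.log(finalOptions)
```

Since later hooks see the output of earlier ones, the order in which plugins are registered can affect the final options.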
connect
Attach Puppeteer to an existing Chromium instance.
Augments the original puppeteer.connect method with plugin lifecycle methods.
All registered plugins that have a beforeConnect method will be called in sequence to potentially update the options Object before connecting to the browser.
Type: function (options)
- options: {browserWSEndpoint: string, ignoreHTTPSErrors: boolean} (optional, default {})
plugins
Get all registered plugins.
Type: Array<PuppeteerExtraPlugin>
getPluginData
- See: puppeteer-extra-plugin/data
Collects the exposed data property of all registered plugins.
Will be reduced/flattened to a single array.
Can be accessed by plugins that listed the dataFromPlugins requirement.
Implemented mainly for plugins that need data from other plugins (e.g. user-preferences).
Type: function (name)
- name: string? Filter data by name property (optional, default null)
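The collect, flatten, and filter behaviour described above can be sketched in a few lines. This is an illustration under assumptions: plugins are modelled as plain objects whose `data` is either a single entry or an array of entries with a `name` field, and `collectPluginData` is a hypothetical standalone function rather than the library's method.

```javascript
// Hypothetical standalone version of the collect/flatten/filter idea.
function collectPluginData (plugins, name = null) {
  const data = plugins
    .filter(p => p.data !== undefined) // skip plugins that expose nothing
    .map(p => (Array.isArray(p.data) ? p.data : [p.data]))
    .reduce((acc, entries) => acc.concat(entries), []) // flatten to a single array
  return name ? data.filter(entry => entry.name === name) : data
}

// Sample plugins invented for this sketch.
const samplePlugins = [
  { name: 'a', data: { name: 'userPreferences', value: { foo: 1 } } },
  {
    name: 'b',
    data: [
      { name: 'userPreferences', value: { bar: 2 } },
      { name: 'other', value: 3 }
    ]
  },
  { name: 'c' }
]

console.log(collectPluginData(samplePlugins).length)
console.log(collectPluginData(samplePlugins, 'userPreferences').map(e => e.value))
```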
executablePath
Regular Puppeteer method that is being passed through.
Type: function (): string
defaultArgs
Regular Puppeteer method that is being passed through.
Type: function ()
createBrowserFetcher
Regular Puppeteer method that is being passed through.
Type: function (options): PuppeteerBrowserFetcher