What is puppeteer-extra?
puppeteer-extra is a modular plugin framework for the popular headless browser automation library Puppeteer. It allows users to easily extend Puppeteer's functionality with plugins, making it more versatile and powerful for various web scraping, automation, and testing tasks.
What are puppeteer-extra's main functionalities?
Stealth Plugin
The Stealth Plugin helps to avoid detection by various anti-bot measures on websites. It modifies the browser's behavior so that the automated session looks more like one driven by a regular user.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
(async () => {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({ path: 'example.png' });
await browser.close();
})();
Adblocker Plugin
The Adblocker Plugin blocks ads and trackers, making page loads faster and reducing the amount of data processed.
const puppeteer = require('puppeteer-extra');
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(AdblockerPlugin());
(async () => {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({ path: 'example.png' });
await browser.close();
})();
Recaptcha Plugin
The Recaptcha Plugin helps to solve reCAPTCHAs automatically using third-party services like 2Captcha, making it easier to automate interactions with websites that use CAPTCHA challenges.
const puppeteer = require('puppeteer-extra');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
puppeteer.use(RecaptchaPlugin({ provider: { id: '2captcha', token: 'YOUR_2CAPTCHA_API_KEY' } }));
(async () => {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://example.com');
const { captchas } = await page.solveRecaptchas();
await page.screenshot({ path: 'example.png' });
await browser.close();
})();
Other packages similar to puppeteer-extra
puppeteer
Puppeteer is the core library for headless browser automation. It provides a high-level API to control Chrome or Chromium over the DevTools Protocol. While it is powerful, it lacks the modular plugin system of puppeteer-extra, making it less flexible for certain tasks.
playwright
Playwright is a newer library from Microsoft that provides similar functionality to Puppeteer but supports multiple browsers (Chromium, Firefox, and WebKit). It offers a more robust and versatile API but does not have a plugin system like puppeteer-extra.
selenium-webdriver
Selenium WebDriver is a widely used tool for browser automation that supports multiple browsers and programming languages. It is more mature and has a larger community but is generally considered more complex and slower than Puppeteer and puppeteer-extra.
A light-weight wrapper around puppeteer that enables plugins through a clean interface.
Installation
yarn add puppeteer-extra
Puppeteer is a peer dependency of puppeteer-extra, which means you can install your own preferred version:
yarn add puppeteer
yarn add puppeteer@next
Quickstart
const puppeteer = require('puppeteer-extra')
// Register plugins through `.use()`
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')())
puppeteer.use(require('puppeteer-extra-plugin-font-size')({defaultFontSize: 18}))
// The leading semicolon keeps this IIFE from being parsed as a call on the previous line
;(async () => {
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage()
await page.goto('http://example.com', {waitUntil: 'domcontentloaded'})
await browser.close()
})()
Plugins
- puppeteer-extra-plugin-stealth: Applies various techniques to make detection of headless puppeteer harder.
- puppeteer-extra-plugin-flash: Allows flash content to run on all sites without user interaction.
- puppeteer-extra-plugin-anonymize-ua: Anonymizes the user-agent on all pages.
  - Supports dynamic replacing, so the browser version stays intact and recent.
Check out the packages folder for more plugins.
Contributing
PRs and new plugins are welcome! :tada: The plugin API for puppeteer-extra is clean and fun to use. Have a look at the PuppeteerExtraPlugin base class documentation to get going, and check out the existing plugins (a minimal example is the anonymize-ua plugin) for reference.
We use a monorepo powered by Lerna (and yarn workspaces), ava for testing, and the standard style for linting, and we rely heavily on JSDoc to auto-generate markdown documentation from the code. :-)
API
Modular plugin framework to teach puppeteer new tricks.
This module acts as a drop-in replacement for puppeteer.
Allows PuppeteerExtraPlugins to register themselves and to extend puppeteer with additional functionality.
Type: function ()
Example:
const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')())
puppeteer.use(require('puppeteer-extra-plugin-font-size')({defaultFontSize: 18}))
;(async () => {
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage()
await page.goto('http://example.com', {waitUntil: 'domcontentloaded'})
await browser.close()
})()
Outside interface to register plugins.
Type: function (plugin): this
- plugin PuppeteerExtraPlugin
Example:
const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')())
puppeteer.use(require('puppeteer-extra-plugin-user-preferences')())
const browser = await puppeteer.launch(...)
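Since `use` is typed as returning `this`, registrations can presumably be chained as well as written one per line. The sketch below illustrates that pattern with a hypothetical `PluginRegistry` class and plain-object plugins. It is a stand-in to show the idea, not puppeteer-extra's actual implementation, and the validation rule is an assumption.

```javascript
// Hypothetical stand-in for the plugin host; NOT puppeteer-extra's source code.
class PluginRegistry {
  constructor() {
    this._plugins = [];
  }

  // Register a plugin and return `this` so that calls can be chained.
  use(plugin) {
    // Assumed validation rule for this sketch: a plugin is an object with a name.
    if (typeof plugin !== 'object' || plugin === null || !plugin.name) {
      throw new TypeError('A plugin must be an object with a `name` property');
    }
    this._plugins.push(plugin);
    return this;
  }

  // Copy of everything registered so far, in registration order.
  get plugins() {
    return this._plugins.slice();
  }
}

const registry = new PluginRegistry();
registry.use({ name: 'anonymize-ua' }).use({ name: 'user-preferences' });

const registeredNames = registry.plugins.map((plugin) => plugin.name);
console.log(registeredNames);
```

Returning a copy from the `plugins` getter is a design choice of this sketch: callers can inspect the list without being able to reorder or remove registered plugins.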
Main launch method.
Augments the original puppeteer.launch method with plugin lifecycle methods.
All registered plugins that have a beforeLaunch method will be called in sequence to potentially update the options Object before launching puppeteer.
Type: function (options): Puppeteer.Browser
- options Object? Regular Puppeteer options (optional, default {})
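The beforeLaunch sequence described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's code: the helper `applyBeforeLaunch` and the three example plugins are hypothetical, and the sketch assumes a hook may either mutate the options object or return a replacement.

```javascript
// Hypothetical sketch of the lifecycle step; not puppeteer-extra's source code.
// Each plugin may expose an optional `beforeLaunch(options)` hook.
async function applyBeforeLaunch(plugins, options = {}) {
  let current = options;
  for (const plugin of plugins) {
    if (typeof plugin.beforeLaunch === 'function') {
      // Assumption: a hook may mutate the options in place or return a new object.
      const updated = await plugin.beforeLaunch(current);
      if (updated !== undefined) current = updated;
    }
  }
  return current;
}

// Illustrative plugins (names and behavior are made up for this example).
const addWindowSize = {
  name: 'window-size',
  beforeLaunch(options) {
    return { ...options, args: [...(options.args || []), '--window-size=1280,720'] };
  }
};
const noHooks = { name: 'no-hooks' }; // skipped: it has no beforeLaunch method
const forceHeadless = {
  name: 'force-headless',
  async beforeLaunch(options) {
    options.headless = true; // in-place mutation, nothing returned
  }
};

const launchOptionsPromise = applyBeforeLaunch(
  [addWindowSize, noHooks, forceHeadless],
  { headless: false }
);
launchOptionsPromise.then((finalOptions) => console.log(finalOptions));
```

Running the hooks one after another, rather than in parallel, means each plugin sees the changes made by the plugins registered before it, so registration order matters.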
Get all registered plugins.
Type: Array<PuppeteerExtraPlugin>
- See: puppeteer-extra-plugin/data
Collects the exposed data property of all registered plugins.
Will be reduced/flattened to a single array.
Can be accessed by plugins that listed the dataFromPlugins requirement.
Implemented mainly for plugins that need data from other plugins (e.g. user-preferences).
Type: function (name)
- name string? Filter data by name property (optional, default null)
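The collect, flatten, and filter behavior described for this method might look roughly like the sketch below. The function `collectPluginData`, the example plugins, and the shape of their `data` entries are hypothetical and chosen for illustration only; the real data format is defined by the puppeteer-extra-plugin base class.

```javascript
// Hypothetical sketch; not puppeteer-extra's source code.
function collectPluginData(plugins, name = null) {
  const all = plugins
    // Only plugins that actually expose data take part.
    .filter((plugin) => plugin.data !== undefined)
    // Assumption: a plugin exposes either one entry or an array of entries.
    .flatMap((plugin) => (Array.isArray(plugin.data) ? plugin.data : [plugin.data]));
  // With no name given, return everything; otherwise keep only matching entries.
  return name === null ? all : all.filter((entry) => entry.name === name);
}

// Made-up plugins for the example.
const examplePlugins = [
  { name: 'plugin-a', data: [{ name: 'userPreferences', value: { foo: 1 } }] },
  { name: 'plugin-b', data: { name: 'other', value: 42 } },
  { name: 'plugin-c' } // exposes no data
];

const everything = collectPluginData(examplePlugins);
const prefsOnly = collectPluginData(examplePlugins, 'userPreferences');
console.log(everything.length, prefsOnly);
```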
Regular Puppeteer method that is being passed through.
Type: function (options)
Regular Puppeteer method that is being passed through.
Type: function (): string
Regular Puppeteer method that is being passed through.
Type: function ()
Regular Puppeteer method that is being passed through.
Type: function (options): PuppeteerBrowserFetcher