What is puppeteer-extra-plugin?
The puppeteer-extra-plugin package is the base class that plugins for puppeteer-extra, a modular plugin framework for Puppeteer, are built on. Plugins that extend it add capabilities to Puppeteer such as stealth mode, ad blocking, and more.
What are puppeteer-extra-plugin's main functionalities?
Stealth Mode
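The Stealth Mode feature, provided by puppeteer-extra-plugin-stealth, applies a set of evasion techniques that make an automated browser harder to distinguish from a regular one, which helps avoid detection by anti-bot systems. This is useful for web scraping and automation tasks where detection can be an issue.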
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the plugin before launching the browser
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Perform actions as a stealthy browser
  await browser.close();
})();
Ad Blocking
The Ad Blocking feature allows Puppeteer to block ads on web pages, making the browsing experience cleaner and faster. This is particularly useful for scraping content without being interrupted by ads.
const puppeteer = require('puppeteer-extra');
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');

// Register the plugin before launching the browser
puppeteer.use(AdblockerPlugin());

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Ads will be blocked on the page
  await browser.close();
})();
User Data
The User Data feature allows Puppeteer to save and reuse user data, such as cookies and local storage. This is useful for maintaining sessions and state across different browsing sessions.
const puppeteer = require('puppeteer-extra');
const UserDataPlugin = require('puppeteer-extra-plugin-user-data');

puppeteer.use(UserDataPlugin({
  userDataDir: './user_data'
}));

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // User data will be saved in the specified directory
  await browser.close();
})();
Other packages similar to puppeteer-extra-plugin
puppeteer-cluster
puppeteer-cluster is a package that provides a simple and efficient way to manage multiple Puppeteer instances. It allows you to run multiple browser instances in parallel, making it ideal for large-scale web scraping and automation tasks. Compared to puppeteer-extra-plugin, puppeteer-cluster focuses more on parallelism and resource management.
puppeteer-core
puppeteer-core is a lightweight version of Puppeteer that does not include the bundled Chromium browser. It allows you to use Puppeteer with any existing browser installation. While puppeteer-core does not offer plugins like puppeteer-extra-plugin, it provides more flexibility in terms of browser choice and version management.
playwright
Playwright is a Node.js library developed by Microsoft for browser automation. It supports multiple browsers (Chromium, Firefox, and WebKit) and offers features like auto-waiting, network interception, and more. Playwright is similar to Puppeteer but provides broader browser support and additional features, making it a strong alternative to puppeteer-extra-plugin.
Installation
yarn add puppeteer-extra-plugin
Base class for puppeteer-extra plugins.
Provides convenience lifecycle methods to avoid boilerplate.
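As a rough illustration of what a plugin built on this base class can look like, here is a minimal sketch that overrides `name`, `defaults`, and the `onPageCreated` lifecycle hook. The plugin itself (`ViewportPlugin`, its `example-viewport` name, and its `width`/`height` options) is hypothetical and only serves as an example. The `try`/`catch` fallback class is likewise an assumption of this sketch, a simplified stand-in so the snippet can run without the package installed; it is not the library's implementation.

```javascript
// Load the real base class when the package is installed. Otherwise fall back
// to a tiny stand-in (an assumption of this sketch, NOT the library's code)
// so the example can still be run in isolation.
let PuppeteerExtraPlugin;
try {
  ({ PuppeteerExtraPlugin } = require('puppeteer-extra-plugin'));
} catch (err) {
  PuppeteerExtraPlugin = class {
    constructor(opts = {}) {
      // Simplified: merge user options over the plugin's defaults
      this._opts = { ...this.defaults, ...opts };
    }
    get defaults() {
      return {};
    }
    get opts() {
      return this._opts;
    }
  };
}

// Hypothetical plugin: applies a fixed viewport to every newly created page.
class ViewportPlugin extends PuppeteerExtraPlugin {
  constructor(opts = {}) {
    super(opts);
  }

  // Every plugin must provide a unique name
  get name() {
    return 'example-viewport';
  }

  // Default options, merged with whatever is passed to the constructor
  get defaults() {
    return { width: 1280, height: 720 };
  }

  // Lifecycle hook: invoked with each new Puppeteer page
  async onPageCreated(page) {
    await page.setViewport({
      width: this.opts.width,
      height: this.opts.height
    });
  }
}

// Exercise the hook with a stand-in page object, so no browser is needed.
const calls = [];
const fakePage = {
  setViewport: async (viewport) => {
    calls.push(viewport);
  }
};

const plugin = new ViewportPlugin({ width: 800 });
plugin.onPageCreated(fakePage).then(() => {
  // Logs the viewport values the hook passed to the page
  console.log(calls);
});
```

In a real project the instance would be registered with `puppeteer.use(new ViewportPlugin())` before calling `puppeteer.launch()`, and puppeteer-extra would then invoke the hook for each page it creates.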
API
I've refactored the code to TypeScript and the generated typedoc API documentation can be found here.
Unfortunately the generated documentation is currently not as nice as the former documentation.js one but I'm working on it. :-)
Changelog
v3.1.0
- Now written in TypeScript 🎉
- Breaking change: Now using a named export:
// Before
const PuppeteerExtraPlugin = require('puppeteer-extra-plugin')
// After
const { PuppeteerExtraPlugin } = require('puppeteer-extra-plugin')