What is puppeteer?
Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It is primarily used for automating web browser actions, such as taking screenshots, generating pre-rendered content, and automating form submissions, among other things.
What are puppeteer's main functionalities?
Web Scraping
Puppeteer can be used to scrape content from web pages by programmatically navigating to the page and extracting the required data.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
const data = await page.evaluate(() => document.querySelector('*').outerHTML);
console.log(data);
await browser.close();
})();
Automated Testing
Puppeteer can automate form submissions and simulate user actions for testing web applications.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com/login');
await page.type('#username', 'user');
await page.type('#password', 'pass');
await page.click('#submit');
// Check for successful login
await page.waitForSelector('#logout');
await browser.close();
})();
PDF Generation
Puppeteer can generate PDFs from web pages, which is useful for creating reports, invoices, and other printable documents.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com', {waitUntil: 'networkidle2'});
await page.pdf({path: 'example.pdf', format: 'A4'});
await browser.close();
})();
Screenshot Capture
Puppeteer can take screenshots of web pages, either of the full page or specific elements, which is useful for capturing the state of a page for documentation or testing.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({path: 'example.png'});
await browser.close();
})();
Other packages similar to puppeteer
playwright
Playwright is a Node library to automate the Chromium, WebKit, and Firefox browsers with a single API. It is similar to Puppeteer but adds support for multiple browser types and has additional features like network interception.
selenium-webdriver
Selenium WebDriver is one of the most well-known browser automation tools. It supports multiple browsers and languages, making it more versatile than Puppeteer, but it can be more complex to set up and slower in execution.
nightmare
Nightmare is a high-level browser automation library. It is simpler and has a more fluent API compared to Puppeteer, but it is less actively maintained and lacks some of the newer features that Puppeteer provides.
webdriverio
WebdriverIO is a custom implementation for selenium's W3C webdriver API. It is designed to be more accessible than the Selenium WebDriver and integrates well with modern web and mobile application testing practices.
Puppeteer
Puppeteer is a Node.js library which provides a high-level API to control
Chrome/Chromium over the
DevTools Protocol.
Puppeteer runs in
headless
mode by default, but can be configured to run in full ("headful")
Chrome/Chromium.
Example
import puppeteer from 'puppeteer';
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://developer.chrome.com/');
await page.setViewport({width: 1080, height: 1024});
await page.type('.devsite-search-field', 'automate beyond recorder');
const searchResultSelector = '.devsite-result-item-link';
await page.waitForSelector(searchResultSelector);
await page.click(searchResultSelector);
const textSelector = await page.waitForSelector(
'text/Customize and automate'
);
const fullTitle = await textSelector?.evaluate(el => el.textContent);
console.log('The title of this blog post is "%s".', fullTitle);
await browser.close();
})();