What is puppeteer?
Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It is primarily used for automating web browser actions, such as taking screenshots, generating pre-rendered content, and automating form submissions, among other things.
What are puppeteer's main functionalities?
Web Scraping
Puppeteer can be used to scrape content from web pages by programmatically navigating to the page and extracting the required data.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Serialize the whole document; documentElement is the root <html> element.
  const data = await page.evaluate(() => document.documentElement.outerHTML);
  console.log(data);
  await browser.close();
})();
Automated Testing
Puppeteer can automate form submissions and simulate user actions for testing web applications.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/login');
  await page.type('#username', 'user');
  await page.type('#password', 'pass');
  await page.click('#submit');
  // Check for successful login
  await page.waitForSelector('#logout');
  await browser.close();
})();
PDF Generation
Puppeteer can generate PDFs from web pages, which is useful for creating reports, invoices, and other printable documents.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'example.pdf', format: 'A4'});
  await browser.close();
})();
Screenshot Capture
Puppeteer can take screenshots of web pages, either of the full page or specific elements, which is useful for capturing the state of a page for documentation or testing.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
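The example above captures only the visible viewport. A minimal sketch of the full-page and single-element variants mentioned earlier might look like the following. The function names, output file names, and the `selector` argument are hypothetical; `fullPage` is an option of `page.screenshot`, and element handles returned by `page.$` have their own `screenshot` method. Puppeteer is loaded inside the function so that the option helper can be used without a browser installed.

```javascript
// Hypothetical helper: builds the options for a full-page screenshot.
function fullPageOptions(path) {
  return {path, fullPage: true};
}

// Hypothetical helper: saves a full-page screenshot and, if the selector
// matches, a screenshot of just that element.
async function captureScreenshots(url, selector) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when called
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);

    // Capture the entire scrollable page, not only the viewport.
    await page.screenshot(fullPageOptions('full-page.png'));

    // page.$ resolves to null when nothing matches the selector.
    const element = await page.$(selector);
    if (element) {
      await element.screenshot({path: 'element.png'});
    }
  } finally {
    // Close the browser even if navigation or a screenshot fails.
    await browser.close();
  }
}
```

Calling `captureScreenshots('https://example.com', 'h1')` would write both files to the current working directory, assuming the page contains an `h1` element.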
Other packages similar to puppeteer
playwright
Playwright is a Node library to automate the Chromium, WebKit, and Firefox browsers with a single API. It is similar to Puppeteer (which also supports network interception) but adds support for multiple browser engines and features such as automatic waiting for elements before acting on them.
selenium-webdriver
Selenium WebDriver is one of the most well-known browser automation tools. It supports multiple browsers and languages, making it more versatile than Puppeteer, but it can be more complex to set up and slower in execution.
nightmare
Nightmare is a high-level browser automation library. It is simpler and has a more fluent API compared to Puppeteer, but it is less actively maintained and lacks some of the newer features that Puppeteer provides.
webdriverio
WebdriverIO is a custom implementation of Selenium's W3C WebDriver API. It is designed to be more accessible than Selenium WebDriver and integrates well with modern web and mobile application testing practices.
Puppeteer
Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full (non-headless) Chrome/Chromium.
What can I do?
Most things that you can do manually in the browser can be done using Puppeteer!
Here are a few examples to get you started:
- Generate screenshots and PDFs of pages.
- Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e.
"SSR" (Server-Side Rendering)).
- Automate form submission, UI testing, keyboard input, etc.
- Create an automated testing environment using the latest JavaScript and
browser features.
- Capture a timeline trace of your site to help diagnose performance issues.
- Test Chrome Extensions.
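As an illustration of the timeline-trace item above, here is a minimal sketch that records a trace while a page loads. The function name and the default `trace.json` path are hypothetical; `page.tracing.start` and `page.tracing.stop` are part of Puppeteer's API. The resulting file can be loaded into the Performance panel of Chrome DevTools.

```javascript
// Hypothetical helper: records a timeline trace for one page load.
async function captureTrace(url, tracePath = 'trace.json') {
  const puppeteer = require('puppeteer'); // loaded lazily, only when called
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Start recording before navigation so the load itself is captured.
    await page.tracing.start({path: tracePath});
    await page.goto(url);
    // Stop recording; the trace is written to tracePath.
    await page.tracing.stop();
  } finally {
    await browser.close();
  }
  return tracePath;
}
```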
Getting Started
Installation
To use Puppeteer in your project, run:
npm i puppeteer
When you install Puppeteer, it automatically downloads a recent version of Chromium (~170MB macOS, ~282MB Linux, ~280MB Windows) that is guaranteed to work with Puppeteer. For a version of Puppeteer that does not download a browser on installation, see puppeteer-core.
Environment Variables
Puppeteer looks for certain environment variables for customizing behavior. If Puppeteer doesn't find them in the environment during the installation step, a lowercased variant of these variables will be used from the npm config.
- HTTP_PROXY, HTTPS_PROXY, NO_PROXY - define HTTP proxy settings that are used to download and run the browser.
- PUPPETEER_CACHE_DIR - defines the directory to be used by Puppeteer for caching. Defaults to os.homedir()/.cache/puppeteer.
- PUPPETEER_SKIP_CHROMIUM_DOWNLOAD - do not download bundled Chromium during the installation step.
- PUPPETEER_TMP_DIR - defines the directory to be used by Puppeteer for creating temporary files. Defaults to os.tmpdir().
- PUPPETEER_DOWNLOAD_HOST - specifies the URL prefix that is used to download Chromium. Note: this includes the protocol and might even include a path prefix. Defaults to https://storage.googleapis.com.
- PUPPETEER_DOWNLOAD_PATH - specifies the path for the downloads folder. Defaults to <cache>/chromium, where <cache> is Puppeteer's cache directory.
- PUPPETEER_BROWSER_REVISION - specifies a certain version of the browser you'd like Puppeteer to use. See puppeteer.launch on how the executable path is inferred.
- PUPPETEER_EXECUTABLE_PATH - specifies an executable path to be used in puppeteer.launch.
- PUPPETEER_PRODUCT - specifies which browser you'd like Puppeteer to use. Must be either chrome or firefox. This can also be used during installation to fetch the recommended browser binary. Setting product programmatically in puppeteer.launch supersedes this environment variable.
- PUPPETEER_EXPERIMENTAL_CHROMIUM_MAC_ARM - makes Puppeteer download the Chromium build for Apple M1. On Apple M1 devices Puppeteer by default downloads the version for Intel processors, which runs via Rosetta. It works without any problems; however, with this option you should get more efficient resource usage (CPU and RAM), which could lead to a faster execution time.
Environment variables except for PUPPETEER_CACHE_DIR are not used for puppeteer-core since core does not automatically handle browser downloading.
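To make the documented default for PUPPETEER_CACHE_DIR concrete, the sketch below resolves a cache directory the way the list above describes: use the environment variable if set, otherwise fall back to `.cache/puppeteer` under the home directory. `resolveCacheDir` is a hypothetical helper written for illustration, not Puppeteer's own code, and it does not model the npm config fallback.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper mirroring the documented default, not Puppeteer internals.
function resolveCacheDir(env = process.env) {
  // An explicit PUPPETEER_CACHE_DIR wins; otherwise use ~/.cache/puppeteer.
  return env.PUPPETEER_CACHE_DIR || path.join(os.homedir(), '.cache', 'puppeteer');
}

console.log(resolveCacheDir());
```

Passing the environment as a parameter keeps the function easy to check with a plain object instead of mutating `process.env`.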
puppeteer-core
Since v1.7.0, every release publishes two packages:
- puppeteer is a product for browser automation. When installed, it downloads a version of Chromium, which it then drives using puppeteer-core. Being an end-user product, puppeteer supports a bunch of convenient PUPPETEER_* env variables to tweak its behavior.
- puppeteer-core is a library to help drive anything that supports the DevTools protocol. puppeteer-core doesn't download Chromium when installed. Being a library, puppeteer-core is fully driven through its programmatic interface.
You should only use puppeteer-core if you are connecting to a remote browser or managing browsers yourself. If you are managing browsers yourself, you will need to call puppeteer.launch with an explicit executablePath or channel.
When using puppeteer-core, remember to change the import:
import puppeteer from 'puppeteer-core';
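The two puppeteer-core scenarios described above (connecting to a remote browser, or launching one you manage yourself) could be sketched as follows. `pickMode` and `openBrowser` are hypothetical helpers, and the endpoint and path values are placeholders supplied by the caller; `puppeteer.connect` with `browserWSEndpoint` and `puppeteer.launch` with `executablePath` are existing API calls. The sketch uses `require` so the module is loaded only when a browser is actually opened.

```javascript
// Hypothetical helper: decides how to obtain a browser from the given options.
function pickMode({executablePath, browserWSEndpoint} = {}) {
  if (browserWSEndpoint) return 'connect';
  if (executablePath) return 'launch';
  throw new Error('Provide either browserWSEndpoint or executablePath');
}

// Hypothetical helper: connects to a remote browser or launches a local one.
async function openBrowser(options) {
  const mode = pickMode(options); // validate before loading puppeteer-core
  const puppeteer = require('puppeteer-core');
  return mode === 'connect'
    ? puppeteer.connect({browserWSEndpoint: options.browserWSEndpoint})
    : puppeteer.launch({executablePath: options.executablePath});
}
```

For example, `openBrowser({executablePath: '/path/to/Chrome'})` launches a locally managed browser, while passing a `browserWSEndpoint` attaches to one that is already running.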
Usage
Puppeteer follows the latest maintenance LTS version of Node.
Puppeteer will be familiar to people using other browser testing frameworks. You launch/connect a browser, create some pages, and then manipulate them with Puppeteer's API.
For more in-depth usage, check our guides and examples.
Example
The following example searches developers.google.com/web for articles tagged "Headless Chrome" and scrapes results from the results page.
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://developers.google.com/web/');

  // Type into the search box.
  await page.type('.devsite-search-field', 'Headless Chrome');

  // Wait for the suggestion overlay and click "show all results".
  const allResultsSelector = '.devsite-suggest-all-results';
  await page.waitForSelector(allResultsSelector);
  await page.click(allResultsSelector);

  // Wait for the results page to load and display the results.
  const resultsSelector = '.gsc-results .gs-title';
  await page.waitForSelector(resultsSelector);

  // Extract the results from the page.
  const links = await page.evaluate(resultsSelector => {
    return [...document.querySelectorAll(resultsSelector)].map(anchor => {
      const title = anchor.textContent.split('|')[0].trim();
      return `${title} - ${anchor.href}`;
    });
  }, resultsSelector);

  // Print all the results.
  console.log(links.join('\n'));

  await browser.close();
})();
Default runtime settings
1. Uses Headless mode
Puppeteer launches Chromium in headless mode. To launch a full version of Chromium, set the headless option when launching a browser:
const browser = await puppeteer.launch({headless: false});
2. Runs a bundled version of Chromium
By default, Puppeteer downloads and uses a specific version of Chromium so its API is guaranteed to work out of the box. To use Puppeteer with a different version of Chrome or Chromium, pass in the executable's path when creating a Browser instance:
const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});
You can also use Puppeteer with Firefox Nightly (experimental support). See Puppeteer.launch for more information.
See this article for a description of the differences between Chromium and Chrome. This article describes some differences for Linux users.
3. Creates a fresh user profile
Puppeteer creates its own browser user profile which it cleans up on every
run.
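If you want state such as cookies to survive between runs instead of the default throwaway profile, one option is to point Puppeteer at a profile directory of your own. The sketch below assumes the `userDataDir` launch option; the helper names and the `./my-profile` directory are hypothetical.

```javascript
const path = require('path');

// Hypothetical helper: launch options that reuse a profile directory.
function persistentProfileOptions(profileDir) {
  // Resolve to an absolute path so the location does not depend on the cwd later.
  return {userDataDir: path.resolve(profileDir)};
}

// Hypothetical helper: launches a browser that keeps its profile between runs.
async function launchWithProfile(profileDir = './my-profile') {
  const puppeteer = require('puppeteer'); // loaded lazily, only when called
  return puppeteer.launch(persistentProfileOptions(profileDir));
}
```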
Using Docker
See our guide on using Docker.
Using Chrome Extensions
See our guide on using Chrome extensions.
Resources
Contributing
Check out our contributing guide to get an
overview of Puppeteer development.
FAQ
Our FAQ has migrated to our site.