
puppeteer

A high-level API to control headless Chrome over the DevTools Protocol

22.1.0
Source
npm

Version published: 9 months ago

Maintainers: 2

Created: 12 years ago

What is puppeteer?

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It is primarily used for automating web browser actions, such as taking screenshots, generating pre-rendered content, and automating form submissions, among other things.

What are puppeteer's main functionalities?

Web Scraping

Puppeteer can be used to scrape content from web pages by programmatically navigating to the page and extracting the required data.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Serialize the root <html> element, i.e. the whole rendered document.
  const data = await page.evaluate(() => document.documentElement.outerHTML);
  console.log(data);
  await browser.close();
})();
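
The example above prints the whole page's HTML. As a minimal sketch of extracting specific data instead, the helper below collects the text and target of every link on an already-open page. The function name `extractLinks` and the choice of the `a` selector are illustrative assumptions; `page.$$eval` is the Puppeteer call doing the work.

```javascript
// Hypothetical helper: `extractLinks` is an illustrative name, not part of Puppeteer.
// `page` is expected to be a Puppeteer Page that has already navigated somewhere.
async function extractLinks(page) {
  // $$eval runs the callback inside the browser with all elements matching the selector
  // and returns the (serializable) result to Node.
  return page.$$eval('a', anchors =>
    anchors.map(a => ({text: a.textContent.trim(), href: a.href}))
  );
}
```

It could be called right after `page.goto(...)` in the example above, for instance `console.log(await extractLinks(page));`.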

Automated Testing

Puppeteer can automate form submissions and simulate user actions for testing web applications.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/login');
  await page.type('#username', 'user');
  await page.type('#password', 'pass');
  await page.click('#submit');
  // Check for successful login
  await page.waitForSelector('#logout');
  await browser.close();
})();

PDF Generation

Puppeteer can generate PDFs from web pages, which is useful for creating reports, invoices, and other printable documents.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'example.pdf', format: 'A4'});
  await browser.close();
})();

Screenshot Capture

Puppeteer can take screenshots of web pages, either of the full page or specific elements, which is useful for capturing the state of a page for documentation or testing.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
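
The snippet above captures only the visible viewport. As a rough sketch of the full-page and single-element variants mentioned in the description, the helper below takes an already-open page. The name `captureScreenshots`, the output file names, and the default `h1` selector are assumptions made for illustration.

```javascript
// Hypothetical helper: the function name, file names, and default selector
// are illustrative choices, not part of Puppeteer's API.
async function captureScreenshots(page, selector = 'h1') {
  // fullPage: true captures the whole scrollable page, not just the viewport.
  await page.screenshot({path: 'full-page.png', fullPage: true});

  // page.$() resolves to an ElementHandle, or null if nothing matches.
  const element = await page.$(selector);
  if (element === null) {
    return false;
  }

  // ElementHandle.screenshot() captures only that element.
  await element.screenshot({path: 'element.png'});
  return true;
}
```

The return value only reports whether the element was found, so a caller can tell a missing selector apart from a successful capture.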

Other packages similar to puppeteer

playwright
selenium-webdriver
nightmare
webdriverio

Puppeteer

Guides | API | FAQ | Contributing | Troubleshooting

Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full ("headful") Chrome/Chromium.

What can I do?

Most things that you can do manually in the browser can be done using Puppeteer! Here are a few examples to get you started:

Generate screenshots and PDFs of pages.
Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
Automate form submission, UI testing, keyboard input, etc.
Create an automated testing environment using the latest JavaScript and browser features.
Capture a timeline trace of your site to help diagnose performance issues.
Test Chrome Extensions.
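
As one illustration of the list above, a timeline trace can be recorded around a navigation with `page.tracing`. This is only a sketch: the helper name `traceNavigation` and the default file name `trace.json` are assumptions, and `page` is expected to be an open Puppeteer page.

```javascript
// Hypothetical helper: `traceNavigation` and the default path are illustrative.
async function traceNavigation(page, url, tracePath = 'trace.json') {
  // Start recording; the trace is written to `tracePath` when tracing stops.
  await page.tracing.start({path: tracePath});
  try {
    await page.goto(url);
  } finally {
    // Always stop tracing so the file is flushed even if navigation fails.
    await page.tracing.stop();
  }
  return tracePath;
}
```

The resulting file can be loaded into the Performance panel of Chrome DevTools for inspection.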

Getting Started

Installation

To use Puppeteer in your project, run:

npm i puppeteer
# or using yarn
yarn add puppeteer
# or using pnpm
pnpm i puppeteer

When you install Puppeteer, it automatically downloads a recent version of Chrome for Testing (~170MB macOS, ~282MB Linux, ~280MB Windows) and a chrome-headless-shell binary (starting with Puppeteer v21.6.0) that is guaranteed to work with Puppeteer. The browser is downloaded to the $HOME/.cache/puppeteer folder by default (starting with Puppeteer v19.0.0). See configuration for configuration options and environment variables to control the download behavior.

If you deploy a project using Puppeteer to a hosting provider, such as Render or Heroku, you might need to reconfigure the location of the cache to be within your project folder (see an example below) because not all hosting providers include $HOME/.cache into the project's deployment.

For a version of Puppeteer without the browser installation, see puppeteer-core.

If used with TypeScript, the minimum supported TypeScript version is 4.7.4.

Configuration

Puppeteer uses several defaults that can be customized through configuration files.

For example, to change the default cache directory Puppeteer uses to install browsers, you can add a .puppeteerrc.cjs (or puppeteer.config.cjs) file at the root of your application with the following contents:

const {join} = require('path');

/**
 * @type {import("puppeteer").Configuration}
 */
module.exports = {
  // Changes the cache location for Puppeteer.
  cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
};

After adding the configuration file, you will need to remove and reinstall puppeteer for it to take effect.

See the configuration guide for more information.

`puppeteer-core`

For every release since v1.7.0 we publish two packages:

puppeteer is a product for browser automation. When installed, it downloads a version of Chrome, which it then drives using puppeteer-core. Being an end-user product, puppeteer automates several workflows using reasonable defaults that can be customized.

puppeteer-core is a library to help drive anything that supports the DevTools Protocol. Being a library, puppeteer-core is fully driven through its programmatic interface, which means that no defaults are assumed and puppeteer-core will not download Chrome when installed.

You should use puppeteer-core if you are connecting to a remote browser or managing browsers yourself. If you are managing browsers yourself, you will need to call puppeteer.launch with an explicit executablePath (or a channel, if the browser is installed in a standard location).

When using puppeteer-core, remember to change the import:

import puppeteer from 'puppeteer-core';
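
The three situations described above (remote browser, explicit executable, browser in a standard location) could be handled roughly as follows. This is a sketch under assumptions: `openBrowser` and its option names `wsEndpoint` and `executablePath` are made up for illustration, and `puppeteer` is the module imported from 'puppeteer-core'.

```javascript
// Hypothetical helper: `openBrowser` and its option names are illustrative.
// `puppeteer` is the module imported from 'puppeteer-core'.
async function openBrowser(puppeteer, {wsEndpoint, executablePath} = {}) {
  if (wsEndpoint) {
    // Attach to a browser that is already running, e.g. on another machine.
    return puppeteer.connect({browserWSEndpoint: wsEndpoint});
  }
  if (executablePath) {
    // Launch a browser binary that you manage yourself.
    return puppeteer.launch({executablePath});
  }
  // Otherwise rely on a stable Chrome installed in a standard location.
  return puppeteer.launch({channel: 'chrome'});
}
```

For example, `await openBrowser(puppeteer, {executablePath: '/path/to/Chrome'})` launches a specific binary, while calling it with no options falls back to the installed Chrome channel.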

Usage

Puppeteer follows the latest maintenance LTS version of Node.

Puppeteer will be familiar to people using other browser testing frameworks. You launch/connect a browser, create some pages, and then manipulate them with Puppeteer's API.

For more in-depth usage, check our guides and examples.

Example

The following example searches developer.chrome.com for blog posts with the text "automate beyond recorder", clicks on the first result, and prints the full title of the blog post.

import puppeteer from 'puppeteer';

(async () => {
  // Launch the browser and open a new blank page
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate the page to a URL
  await page.goto('https://developer.chrome.com/');

  // Set screen size
  await page.setViewport({width: 1080, height: 1024});

  // Type into search box
  await page.type('.devsite-search-field', 'automate beyond recorder');

  // Wait and click on first result
  const searchResultSelector = '.devsite-result-item-link';
  await page.waitForSelector(searchResultSelector);
  await page.click(searchResultSelector);

  // Locate the full title with a unique string
  const textSelector = await page.waitForSelector(
    'text/Customize and automate'
  );
  const fullTitle = await textSelector?.evaluate(el => el.textContent);

  // Print the full title
  console.log('The title of this blog post is "%s".', fullTitle);

  await browser.close();
})();

Default runtime settings

1. Uses Headless mode

By default, Puppeteer launches Chrome in headless mode.

const browser = await puppeteer.launch();
// Equivalent to:
// const browser = await puppeteer.launch({headless: true});

Before v22, Puppeteer launched the old headless mode by default. The old headless mode is now known as chrome-headless-shell and ships as a separate binary. chrome-headless-shell does not completely match the behavior of regular Chrome, but it is currently more performant for automation tasks where the complete Chrome feature set is not needed. If performance is more important for your use case, switch to chrome-headless-shell as follows:

const browser = await puppeteer.launch({headless: 'shell'});

To launch a "headful" version of Chrome, set the headless option to false when launching a browser:

const browser = await puppeteer.launch({headless: false});

2. Runs a bundled version of Chrome

By default, Puppeteer downloads and uses a specific version of Chrome so its API is guaranteed to work out of the box. To use Puppeteer with a different version of Chrome or Chromium, pass in the executable's path when creating a Browser instance:

const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});

You can also use Puppeteer with Firefox. See status of cross-browser support for more information.

See this article for a description of the differences between Chromium and Chrome. This article describes some differences for Linux users.

3. Creates a fresh user profile

Puppeteer creates its own browser user profile which it cleans up on every run.
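
If a run needs to reuse a profile instead, the `userDataDir` launch option can point the browser at a directory of your choosing. The sketch below assumes that this is what you want; the helper name `launchWithProfile` is hypothetical, and `puppeteer` is the imported Puppeteer module.

```javascript
// Hypothetical helper: `launchWithProfile` is an illustrative name.
// `puppeteer` is the imported Puppeteer module; `profileDir` is a directory you control.
async function launchWithProfile(puppeteer, profileDir) {
  if (!profileDir) {
    throw new Error('profileDir is required');
  }
  // userDataDir tells the browser which profile directory to use,
  // in place of the temporary profile Puppeteer would otherwise create.
  return puppeteer.launch({userDataDir: profileDir});
}
```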

Using Docker

See our Docker guide.

Using Chrome Extensions

See our Chrome extensions guide.

Resources

Contributing

Check out our contributing guide to get an overview of Puppeteer development.

FAQ

Our FAQ has migrated to our site.

Package last updated on 17 Feb 2024
