scrapy-impersonate

Scrapy download handler that can impersonate browser fingerprints

Version 1.6.0 (PyPI)

scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
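For readers unfamiliar with JA3, the following is a minimal sketch of how a JA3 fingerprint is usually derived: five fields of the TLS ClientHello are written in decimal, the values within a field are joined with hyphens, the fields are joined with commas, and the resulting string is hashed with MD5. The function names and the sample values are illustrative assumptions, not part of scrapy-impersonate or curl_cffi, and the sketch assumes GREASE values have already been removed from the input lists.

```python
import hashlib
from typing import Sequence


def ja3_string(
    tls_version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Build the JA3 text form: fields joined by ',', values joined by '-'."""
    fields = [[tls_version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(value) for value in field) for field in fields)


def ja3_hash(ja3: str) -> str:
    """Return the MD5 hex digest of a JA3 string (32 hex characters)."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Illustrative ClientHello values only; a real browser sends many more.
    text = ja3_string(769, [47, 53], [0, 10, 11], [23, 24, 25], [0])
    print(text)
    print(ja3_hash(text))
```

Because the hash depends on the exact order of ciphers and extensions, a client only reproduces a browser's JA3 value if it sends the same ClientHello, which is the part this package delegates to curl_cffi.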

Installation

pip install scrapy-impersonate

Activation

To use this package, replace the default http and https Download Handlers by updating the DOWNLOAD_HANDLERS setting:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

Set USER_AGENT = None so that curl_cffi automatically chooses the User-Agent that matches the impersonated browser:

USER_AGENT = None

Also, be sure to install the asyncio-based Twisted reactor for proper asynchronous execution:

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Usage

Set the impersonate Request.meta key to download a request using curl_cffi:

import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "USER_AGENT": None,
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_impersonate.RandomBrowserMiddleware": 1000,
        },
    }

    def start_requests(self):
        for _ in range(5):
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
            )

    def parse(self, response):
        # ja3_hash: 98cc085d47985d3cca9ec1415bbbf0d1 (chrome133a)
        # ja3_hash: 2d692a4485ca2f5f2b10ecb2d2909ad3 (firefox133)
        # ja3_hash: c11ab92a9db8107e2a0b0486f35b80b9 (chrome124)
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37 (safari15_5)
        # ja3_hash: cd08e31494f9531f560d64c695473da9 (chrome99_android)

        yield {"ja3_hash": response.json()["ja3_hash"]}
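The spider above enables RandomBrowserMiddleware and does not name a browser itself. To pick the browser explicitly, the impersonate key can be set per request. The sketch below builds the keyword arguments for scrapy.Request as plain dictionaries so the logic can be checked without running a crawl; `BROWSERS` and `impersonate_request_kwargs` are hypothetical names, not part of scrapy-impersonate.

```python
from typing import Any, Dict, Iterator, Sequence

# Hypothetical selection; each name is taken from the "Supported browsers" table.
BROWSERS: Sequence[str] = ("chrome124", "safari15_5", "firefox133")
URL = "https://tls.browserleaks.com/json"


def impersonate_request_kwargs(
    url: str, browsers: Sequence[str] = BROWSERS
) -> Iterator[Dict[str, Any]]:
    """Yield one set of scrapy.Request keyword arguments per browser."""
    for browser in browsers:
        yield {
            "url": url,
            # The same URL is requested repeatedly, so bypass the duplicate filter.
            "dont_filter": True,
            # The browser name is passed through the "impersonate" meta key.
            "meta": {"impersonate": browser},
        }


if __name__ == "__main__":
    for kwargs in impersonate_request_kwargs(URL):
        print(kwargs["meta"])
```

Inside a spider's start_requests, each dictionary could be expanded with `yield scrapy.Request(**kwargs)`.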

impersonate_args

You can pass any necessary arguments to curl_cffi through impersonate_args. For example:

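yield scrapy.Request(
    "https://tls.browserleaks.com/json",
    dont_filter=True,
    meta={
        # Any name from the "Supported browsers" table works here.
        "impersonate": "chrome110",
        # Extra keyword arguments forwarded to curl_cffi.
        "impersonate_args": {
            "verify": False,  # skip TLS certificate verification
            "timeout": 10,  # seconds
        },
    },
)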

Supported browsers

The following browsers can be impersonated:

| Browser | Version | Build | OS | Name |
| --- | --- | --- | --- | --- |
| Chrome | 99 | 99.0.4844.51 | Windows 10 | chrome99 |
| Chrome | 99 | 99.0.4844.73 | Android 12 | chrome99_android |
| Chrome | 100 | 100.0.4896.75 | Windows 10 | chrome100 |
| Chrome | 101 | 101.0.4951.67 | Windows 10 | chrome101 |
| Chrome | 104 | 104.0.5112.81 | Windows 10 | chrome104 |
| Chrome | 107 | 107.0.5304.107 | Windows 10 | chrome107 |
| Chrome | 110 | 110.0.5481.177 | Windows 10 | chrome110 |
| Chrome | 116 | 116.0.5845.180 | Windows 10 | chrome116 |
| Chrome | 119 | 119.0.6045.199 | macOS Sonoma | chrome119 |
| Chrome | 120 | 120.0.6099.109 | macOS Sonoma | chrome120 |
| Chrome | 123 | 123.0.6312.124 | macOS Sonoma | chrome123 |
| Chrome | 124 | 124.0.6367.60 | macOS Sonoma | chrome124 |
| Chrome | 131 | 131.0.6778.86 | macOS Sonoma | chrome131 |
| Chrome | 131 | 131.0.6778.81 | Android 14 | chrome131_android |
| Chrome | 133 | 133.0.6943.55 | macOS Sequoia | chrome133a |
| Edge | 99 | 99.0.1150.30 | Windows 10 | edge99 |
| Edge | 101 | 101.0.1210.47 | Windows 10 | edge101 |
| Safari | 15.3 | 16612.4.9.1.8 | MacOS Big Sur | safari15_3 |
| Safari | 15.5 | 17613.2.7.1.8 | MacOS Monterey | safari15_5 |
| Safari | 17.0 | unclear | MacOS Sonoma | safari17_0 |
| Safari | 17.2 | unclear | iOS 17.2 | safari17_2_ios |
| Safari | 18.0 | unclear | MacOS Sequoia | safari18_0 |
| Safari | 18.0 | unclear | iOS 18.0 | safari18_0_ios |
| Firefox | 133.0 | 133.0.3 | macOS Sonoma | firefox133 |
| Firefox | 135.0 | 135.0.1 | macOS Sonoma | firefox135 |
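A mistyped browser name is easy to miss, so it can help to check names against the Name column of the table above before building requests. The following is a minimal sketch; `SUPPORTED_BROWSERS` and `validate_browser` are hypothetical helpers, not part of scrapy-impersonate, and the set is copied by hand from the table, so it will go stale when new targets are added.

```python
import difflib

# Names copied from the "Name" column of the supported browsers table.
SUPPORTED_BROWSERS = frozenset({
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124", "chrome131", "chrome131_android", "chrome133a",
    "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
    "safari18_0", "safari18_0_ios",
    "firefox133", "firefox135",
})


def validate_browser(name: str) -> str:
    """Return name unchanged if it is listed, otherwise raise ValueError."""
    if name in SUPPORTED_BROWSERS:
        return name
    # Suggest similar listed names to make typos easier to spot.
    hints = difflib.get_close_matches(name, sorted(SUPPORTED_BROWSERS), n=3)
    suffix = f" Did you mean: {', '.join(hints)}?" if hints else ""
    raise ValueError(f"Unknown browser name {name!r}.{suffix}")


if __name__ == "__main__":
    print(validate_browser("chrome124"))
```

The validated name can then be used as the value of the impersonate meta key.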

Thanks

This project is inspired by the following projects:

  • curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
  • curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
  • scrapy-playwright - Playwright integration for Scrapy
