scrapy-impersonate
scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
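For context on what a JA3 fingerprint is, here is a minimal sketch of the published JA3 recipe. It is an illustration only, not code from scrapy-impersonate or curl_cffi. JA3 takes five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), joins list items with `-` and fields with `,`, drops GREASE values (RFC 8701), and takes the MD5 of the result. The function names and sample values are hypothetical.

```python
import hashlib


def is_grease(value: int) -> bool:
    # GREASE values (RFC 8701) follow the pattern 0x0A0A, 0x1A1A, ..., 0xFAFA
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 input: five comma-separated fields of dash-joined decimal values."""
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(*fields) -> str:
    # The JA3 fingerprint is the MD5 hex digest of the string built above
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Illustrative ClientHello values (not taken from any real browser)
print(ja3_string(771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0]))  # prints 771,4865-4866,0-23,29-23,0
print(ja3_hash(771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0]))
```

Because every ordering and value feeds the hash, a client only matches a browser's JA3 if its handshake reproduces that browser's ClientHello exactly, which is what curl_cffi is built to do.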
Installation
```shell
pip install scrapy-impersonate
```
Activation
Replace the default `http` and/or `https` download handlers through the `DOWNLOAD_HANDLERS` setting:

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
```
Also, be sure to install the asyncio-based Twisted reactor:

```python
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```
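Because handlers are configured per scheme, you can route only HTTPS traffic through curl_cffi and leave plain HTTP on Scrapy's built-in handler. The following is a minimal `settings.py` sketch, assuming a standard Scrapy project layout. Scrapy merges `DOWNLOAD_HANDLERS` over its base handlers, so omitting the `"http"` key keeps the default handler for that scheme.

```python
# settings.py (sketch): impersonate only HTTPS requests

# Only the "https" scheme is overridden; "http" keeps Scrapy's default handler
DOWNLOAD_HANDLERS = {
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# The asyncio-based reactor is still required for scrapy-impersonate
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```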
Basic usage
Set the `impersonate` Request.meta key to download a request using curl_cffi:
```python
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        # Route both schemes through scrapy-impersonate
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        # scrapy-impersonate needs the asyncio-based Twisted reactor
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        # Fetch the same URL once per browser; dont_filter bypasses the duplicate filter
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # The endpoint reports the TLS fingerprint it observed as JSON
        yield {"ja3_hash": response.json()["ja3_hash"]}
```
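Setting the meta key by hand on every request can get repetitive. One option, sketched here as an assumption rather than a feature of scrapy-impersonate, is a small downloader middleware that fills in `impersonate` for any request that does not already carry one. `RandomImpersonateMiddleware` and its `BROWSERS` pool are hypothetical names; the browser names come from the Supported browsers table.

```python
import random


class RandomImpersonateMiddleware:
    """Hypothetical downloader middleware: pick a browser for requests lacking one."""

    # Hypothetical default pool; any name from the Supported browsers table works
    BROWSERS = ["chrome110", "chrome116", "edge101", "safari15_5"]

    def __init__(self, browsers=None, rng=None):
        self.browsers = list(browsers or self.BROWSERS)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Keep an explicit per-request choice; otherwise pick one at random
        request.meta.setdefault("impersonate", self.rng.choice(self.browsers))
        # Returning None lets Scrapy continue processing the request normally
        return None
```

To enable it, register it in `DOWNLOADER_MIDDLEWARES`, for example `{"myproject.middlewares.RandomImpersonateMiddleware": 543}`, where `myproject` is a placeholder for your project package. Downloader middlewares run before the download handler, so the handler sees the key the middleware set.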
Supported browsers
The following browsers can be impersonated:

| Browser | Version | Build | OS | Name |
|---|---|---|---|---|
| Chrome | 99 | 99.0.4844.51 | Windows 10 | chrome99 |
| Chrome | 99 | 99.0.4844.73 | Android 12 | chrome99_android |
| Chrome | 100 | 100.0.4896.75 | Windows 10 | chrome100 |
| Chrome | 101 | 101.0.4951.67 | Windows 10 | chrome101 |
| Chrome | 104 | 104.0.5112.81 | Windows 10 | chrome104 |
| Chrome | 107 | 107.0.5304.107 | Windows 10 | chrome107 |
| Chrome | 110 | 110.0.5481.177 | Windows 10 | chrome110 |
| Chrome | 116 | 116.0.5845.180 | Windows 10 | chrome116 |
| Chrome | 119 | 119.0.6045.199 | macOS Sonoma | chrome119 |
| Chrome | 120 | 120.0.6099.109 | macOS Sonoma | chrome120 |
| Chrome | 123 | 123.0.6312.124 | macOS Sonoma | chrome123 |
| Chrome | 124 | 124.0.6367.60 | macOS Sonoma | chrome124 |
| Edge | 99 | 99.0.1150.30 | Windows 10 | edge99 |
| Edge | 101 | 101.0.1210.47 | Windows 10 | edge101 |
| Safari | 15.3 | 16612.4.9.1.8 | macOS Big Sur | safari15_3 |
| Safari | 15.5 | 17613.2.7.1.8 | macOS Monterey | safari15_5 |
| Safari | 17.0 | unclear | macOS Sonoma | safari17_0 |
| Safari | 17.2 | unclear | iOS 17.2 | safari17_2_ios |
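If browser names come from configuration, a typo is easy to miss until requests start failing. Here is a minimal sketch of a hypothetical helper, `impersonate_meta`, that checks a name against the table above before building the meta dict. `SUPPORTED` simply mirrors this table; the curl_cffi version you have installed may accept more or fewer targets, so treat the list as an assumption to keep in sync.

```python
# Names copied from the Supported browsers table (hypothetical helper, not part of scrapy-impersonate)
SUPPORTED = frozenset({
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124", "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
})


def impersonate_meta(browser: str) -> dict:
    """Return a Request.meta dict for `browser`, rejecting names not in the table."""
    if browser not in SUPPORTED:
        raise ValueError(f"unsupported impersonation target: {browser!r}")
    return {"impersonate": browser}


print(impersonate_meta("safari15_5"))  # prints {'impersonate': 'safari15_5'}
```

In a spider this would be used as `scrapy.Request(url, meta=impersonate_meta("chrome124"))`, so an unknown name fails fast when the request is built.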
Thanks
This project is inspired by the following projects:
- curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP/2 fingerprints.
- curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
- scrapy-playwright - Playwright integration for Scrapy