Security News
The Push to Ban Ransom Payments Is Gaining Momentum
Ransomware costs victims an estimated $30 billion per year and has gotten so out of control that global support for banning payments is gaining momentum.
Readme
Python binding for curl-impersonate via cffi.
Unlike other pure python http clients like httpx
or requests
, curl_cffi
can
impersonate browsers' TLS/JA3 and HTTP/2 fingerprints. If you are blocked by some
website for no obvious reason, you can give curl_cffi
a try.
Scrapfly is an enterprise-grade solution providing Web Scraping API that aims to simplify the scraping process by managing everything: real browser rendering, rotating proxies, and fingerprints (TLS, HTTP, browser) to bypass all major anti-bots. Scrapfly also unlocks the observability by providing an analytical dashboard and measuring the success rate/block rate in detail.
Scrapfly is a good solution if you are looking for a cloud-managed solution for curl_cffi
.
If you are managing TLS/HTTP fingerprint by yourself with curl_cffi
, they also maintain
this tool to convert curl
command into python curl_cffi code!
asyncio
with proxy rotation on each request.requests | aiohttp | httpx | pycurl | curl_cffi | |
---|---|---|---|---|---|
http2 | ❌ | ❌ | ✅ | ✅ | ✅ |
sync | ✅ | ❌ | ✅ | ✅ | ✅ |
async | ❌ | ✅ | ✅ | ❌ | ✅ |
websocket | ❌ | ✅ | ❌ | ❌ | ✅ |
fingerprints | ❌ | ❌ | ❌ | ❌ | ✅ |
speed | 🐇 | 🐇🐇 | 🐇 | 🐇🐇 | 🐇🐇 |
pip install curl_cffi --upgrade
This should work on Linux, macOS and Windows out of the box.
If it does not work on you platform, you may need to compile and install curl-impersonate
first and set some environment variables like LD_LIBRARY_PATH
.
To install beta releases:
pip install curl_cffi --upgrade --pre
To install unstable version from GitHub:
git clone https://github.com/yifeikong/curl_cffi/
cd curl_cffi
make preprocess
pip install .
curl_cffi
comes with a low-level curl
API and a high-level requests
-like API.
Use the latest impersonate versions, do NOT copy chrome110
here without changing.
from curl_cffi import requests
# Notice the impersonate parameter
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome110")
print(r.json())
# output: {..., "ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc", ...}
# the js3n fingerprint should be the same as target browser
# To keep using the latest browser version as `curl_cffi` updates,
# simply set impersonate="chrome" without specifying a version.
# Other similar values are: "safari" and "safari_ios"
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome")
# http/socks proxies are supported
proxies = {"https": "http://localhost:3128"}
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome110", proxies=proxies)
proxies = {"https": "socks://localhost:3128"}
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome110", proxies=proxies)
s = requests.Session()
# httpbin is a http test website, this endpoint makes the server set cookies
s.get("https://httpbin.org/cookies/set/foo/bar")
print(s.cookies)
# <Cookies[<Cookie foo=bar for httpbin.org />]>
# retrieve cookies again to verify
r = s.get("https://httpbin.org/cookies")
print(r.json())
# {'cookies': {'foo': 'bar'}}
Supported impersonate versions, as supported by my fork of curl-impersonate:
However, only Chrome-like browsers are supported. Firefox support is tracked in #59.
Notes:
0.6.0
.0.6.0
, previous http2 fingerprints were not correct.from curl_cffi.requests import AsyncSession
async with AsyncSession() as s:
r = await s.get("https://example.com")
More concurrency:
import asyncio
from curl_cffi.requests import AsyncSession
urls = [
"https://google.com/",
"https://facebook.com/",
"https://twitter.com/",
]
async with AsyncSession() as s:
tasks = []
for url in urls:
task = s.get(url)
tasks.append(task)
results = await asyncio.gather(*tasks)
from curl_cffi.requests import Session, WebSocket
def on_message(ws: WebSocket, message):
print(message)
with Session() as s:
ws = s.ws_connect(
"wss://api.gemini.com/v1/marketdata/BTCUSD",
on_message=on_message,
)
ws.run_forever()
For low-level APIs, Scrapy integration and other advanced topics, see the docs for more details.
Yescaptcha is a proxy service that bypasses Cloudflare and uses the API interface to obtain verified cookies (e.g. cf_clearance
). Click here to register: https://yescaptcha.com/i/stfnIO
ScrapeNinja is a web scraping API with two engines: fast, with high performance and TLS fingerprint; and slower with a real browser under the hood.
ScrapeNinja handles headless browsers, proxies, timeouts, retries, and helps with data extraction, so you can just get the data in JSON. Rotating proxies are available out of the box on all subscription plans.
FAQs
libcurl ffi bindings for Python, with impersonation support.
We found that curl-cffi demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Ransomware costs victims an estimated $30 billion per year and has gotten so out of control that global support for banning payments is gaining momentum.
Application Security
New SEC disclosure rules aim to enforce timely cyber incident reporting, but fear of job loss and inadequate resources lead to significant underreporting.
Security News
The Python Software Foundation has secured a 5-year sponsorship from Fastly that supports PSF's activities and events, most notably the security and reliability of the Python Package Index (PyPI).