Random User-Agent middleware picks up ``User-Agent`` strings based on `Python User Agents <https://github.com/selwin/python-user-agents>`__ and `MDN <https://developer.mozilla.org/en-US/docs/Web/HTTP/Browser_detection_using_the_user_agent>`__.
The simplest way is to install it via ``pip``::

    pip install scrapy-user-agents
Turn off the built-in ``UserAgentMiddleware`` and add ``RandomUserAgentMiddleware``.
In Scrapy >=1.0:

.. code:: python

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
    }
In Scrapy <1.0:

.. code:: python

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
        'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
    }
A default User-Agent file is included in this repository; it contains about 2,200 user agent strings collected from https://developers.whatismybrowser.com/ using https://github.com/hyan15/crawler-demo/tree/master/crawling-basic/common_user_agents. You can supply your own User-Agent file by setting ``RANDOM_UA_FILE``.
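For example, a ``settings.py`` entry could point at your own file (the path below is just a placeholder):

.. code:: python

    # settings.py -- use a custom User-Agent list instead of the bundled one;
    # the path is hypothetical, and the file is expected to hold user agent
    # strings in plain text (one per line, matching the bundled default file)
    RANDOM_UA_FILE = 'path/to/useragents.txt'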
There's a configuration parameter ``RANDOM_UA_TYPE`` in the format ``<device_type>.<browser_type>``; the default is ``desktop.chrome``. For the device_type part, only ``desktop``, ``mobile`` and ``tablet`` are supported. For the browser_type part, only ``chrome``, ``firefox``, ``safari`` and ``ie`` are supported. If you don't want to fix to only one browser type, you can use ``random`` to choose from all browser types.
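For illustration, based on the ``<device_type>.<browser_type>`` format described above, a ``settings.py`` entry might look like:

.. code:: python

    # settings.py -- examples following the <device_type>.<browser_type> format
    RANDOM_UA_TYPE = 'mobile.firefox'    # only mobile Firefox user agents
    # RANDOM_UA_TYPE = 'desktop.random'  # desktop user agents from any supported browser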
You can set ``RANDOM_UA_SAME_OS_FAMILY`` to ``True`` to only use user agents that belong to the same OS family, such as Windows, macOS and Linux, or Android and iOS. The default value is ``True``.
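If you prefer user agents drawn from any OS family, you can turn this off, for example:

.. code:: python

    # settings.py -- allow user agents from any OS family instead of
    # restricting choices to a single family (default is True)
    RANDOM_UA_SAME_OS_FAMILY = False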
scrapy-proxies
To use with middlewares of random proxy such as `scrapy-proxies <https://github.com/aivarsk/scrapy-proxies>`_, you need to:

* set ``RANDOM_UA_PER_PROXY`` to ``True`` to allow switching the User-Agent per proxy
* set the priority of ``RandomUserAgentMiddleware`` to be greater than that of ``scrapy-proxies``, so that the proxy is set before the User-Agent is handled (see the settings sketch below)
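A minimal, illustrative ``settings.py`` combining both middlewares might look like this; the proxy middleware path and the priority values are taken as an example from the scrapy-proxies README and can be adjusted to your setup:

.. code:: python

    # settings.py -- the proxy middleware runs first (lower priority number),
    # then RandomUserAgentMiddleware picks a User-Agent for that proxy
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'scrapy_proxies.RandomProxy': 100,
        'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
    }

    RANDOM_UA_PER_PROXY = True

    # scrapy-proxies needs its own settings as well (e.g. PROXY_LIST); see its README.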
There's a configuration parameter ``FAKEUSERAGENT_FALLBACK``, defaulting to ``None``. You can set it to a string value, for example ``Mozilla`` or ``Your favorite browser``; this configuration can completely disable any annoying exception.
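For example, to fall back to a fixed string rather than raising an exception (the exact fallback behavior is only loosely described above; this merely shows how the value is set):

.. code:: python

    # settings.py -- use a plain fallback User-Agent string instead of
    # letting an exception propagate (assumption based on the description above)
    FAKEUSERAGENT_FALLBACK = 'Mozilla'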