scrapy-fake-useragent
=====================
.. image:: https://travis-ci.org/alecxe/scrapy-fake-useragent.svg?branch=master
   :target: https://travis-ci.org/alecxe/scrapy-fake-useragent

.. image:: https://codecov.io/gh/alecxe/scrapy-fake-useragent/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/alecxe/scrapy-fake-useragent

.. image:: https://img.shields.io/pypi/pyversions/scrapy-fake-useragent.svg
   :target: https://pypi.python.org/pypi/scrapy-fake-useragent
   :alt: PyPI version

.. image:: https://badge.fury.io/py/scrapy-fake-useragent.svg
   :target: http://badge.fury.io/py/scrapy-fake-useragent
   :alt: PyPI version

.. image:: https://requires.io/github/alecxe/scrapy-fake-useragent/requirements.svg?branch=master
   :target: https://requires.io/github/alecxe/scrapy-fake-useragent/requirements/?branch=master
   :alt: Requirements Status

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://github.com/alecxe/scrapy-fake-useragent/blob/master/LICENSE.txt
   :alt: Package license
Random User-Agent middleware for the Scrapy scraping framework, based on
`fake-useragent <https://pypi.python.org/pypi/fake-useragent>`__, which picks User-Agent strings
based on `usage statistics <http://www.w3schools.com/browsers/browsers_stats.asp>`__
from a `real world database <http://useragentstring.com/>`__, but also has the option to configure a generator
of fake UA strings as a backup, powered by
`Faker <https://faker.readthedocs.io/en/stable/providers/faker.providers.user_agent.html>`__.

You can also extend the middleware's capabilities by adding your own providers.

Please see the CHANGELOG_.
The simplest way is to install it via ``pip``:

.. code:: bash

    pip install scrapy-fake-useragent
Turn off the built-in ``UserAgentMiddleware`` and ``RetryMiddleware`` and add
``RandomUserAgentMiddleware`` and ``RetryUserAgentMiddleware`` instead.
In Scrapy >=1.0:
.. code:: python
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
        'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
        'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
    }
In Scrapy <1.0:
.. code:: python
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
        'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': None,
        'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
        'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
    }
Recommended setting (1.3.0+):
.. code:: python
    FAKEUSERAGENT_PROVIDERS = [
        'scrapy_fake_useragent.providers.FakeUserAgentProvider',  # This is the first provider we'll try
        'scrapy_fake_useragent.providers.FakerProvider',  # If FakeUserAgentProvider fails, we'll use faker to generate a user-agent string for us
        'scrapy_fake_useragent.providers.FixedUserAgentProvider',  # Fall back to USER_AGENT value
    ]

    USER_AGENT = '<your user agent string which you will fall back to if all other providers fail>'
The package comes with a thin abstraction layer of User-Agent providers which, for backwards compatibility, defaults to:
.. code:: python
    FAKEUSERAGENT_PROVIDERS = [
        'scrapy_fake_useragent.providers.FakeUserAgentProvider'
    ]
The package also has ``FakerProvider`` (powered by the `Faker library <https://faker.readthedocs.io/>`__) and ``FixedUserAgentProvider`` implemented and available for use if needed.
Each provider is enabled individually and used in the order in which it is defined. If a provider fails to execute (for instance, `it can happen <https://github.com/hellysmile/fake-useragent/issues/99>`__ to ``fake-useragent`` because of its dependency on an online service), the next one will be used.
An example of what the ``FAKEUSERAGENT_PROVIDERS`` setting may look like in your case:
.. code:: python
    FAKEUSERAGENT_PROVIDERS = [
        'scrapy_fake_useragent.providers.FakeUserAgentProvider',
        'scrapy_fake_useragent.providers.FakerProvider',
        'scrapy_fake_useragent.providers.FixedUserAgentProvider',
        'mypackage.providers.CustomProvider'
    ]
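The ``mypackage.providers.CustomProvider`` entry above is a placeholder for your own provider. As a minimal sketch (assuming, as the built-in providers suggest, that a provider is instantiated with the crawler settings and exposes a ``get_random_ua()`` method; verify against the version you have installed):

.. code:: python

    # mypackage/providers.py -- hypothetical custom provider sketch.
    import random


    class CustomProvider:
        def __init__(self, settings):
            # CUSTOM_USER_AGENT_POOL is a hypothetical setting holding
            # a list of UA strings to choose from.
            self.user_agents = settings.getlist('CUSTOM_USER_AGENT_POOL')

        def get_random_ua(self):
            # Return a random UA string; returning a falsy value lets the
            # middleware move on to the next configured provider.
            return random.choice(self.user_agents) if self.user_agents else None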
Parameter: ``FAKE_USERAGENT_RANDOM_UA_TYPE``, defaulting to ``random``. Other options, for example:

- ``firefox`` to mimic only Firefox browsers
- ``desktop`` or ``mobile`` to send desktop or mobile User-Agent strings respectively (see the example below)
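For instance, to send only mobile User-Agent strings, the setting goes in your project's ``settings.py``:

.. code:: python

    FAKE_USERAGENT_RANDOM_UA_TYPE = 'mobile'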
You can also set the ``FAKEUSERAGENT_FALLBACK`` option, which is a ``fake-useragent``-specific fallback. For example:
.. code:: python
    FAKEUSERAGENT_FALLBACK = 'Mozilla/5.0 (Android; Mobile; rv:40.0)'
If the selected ``FAKE_USERAGENT_RANDOM_UA_TYPE`` fails to retrieve a UA, the middleware will use the value set in ``FAKEUSERAGENT_FALLBACK`` instead.
Parameter: ``FAKER_RANDOM_UA_TYPE``, defaulting to ``user_agent``, which selects completely random User-Agent values. Other options, for example: ``chrome``, ``firefox``.
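For example, to have ``FakerProvider`` generate only Chrome-style strings (one of the option values listed above):

.. code:: python

    FAKER_RANDOM_UA_TYPE = 'chrome'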
It also comes with a fixed provider, ``FixedUserAgentProvider``, which supplies only one user agent, reusing Scrapy's default ``USER_AGENT`` setting value.
scrapy-proxies
--------------
To use with random-proxy middlewares such as `scrapy-proxies <https://github.com/aivarsk/scrapy-proxies>`_, you need to:

- set ``RANDOM_UA_PER_PROXY`` to ``True`` to allow switching the User-Agent per proxy
- set the priority of ``RandomUserAgentMiddleware`` to be greater than that of ``scrapy-proxies``, so that the proxy is set before the User-Agent is handled (see the sketch after this list)
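A minimal sketch of how this might look in ``settings.py``, assuming the ``scrapy_proxies.RandomProxy`` middleware path from the scrapy-proxies README (the priority numbers are illustrative; what matters is that the proxy middleware's number is lower, so it runs first):

.. code:: python

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_proxies.RandomProxy': 100,  # sets request.meta['proxy'] first
        'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,  # then picks a UA for that proxy
    }

    RANDOM_UA_PER_PROXY = True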
The package is under the MIT license. Please see LICENSE_.
.. |GitHub version| image:: https://badge.fury.io/gh/alecxe%2Fscrapy-fake-useragent.svg
   :target: http://badge.fury.io/gh/alecxe%2Fscrapy-fake-useragent

.. |Requirements Status| image:: https://requires.io/github/alecxe/scrapy-fake-useragent/requirements.svg?branch=master
   :target: https://requires.io/github/alecxe/scrapy-fake-useragent/requirements/?branch=master

.. _LICENSE: https://github.com/alecxe/scrapy-fake-useragent/blob/master/LICENSE.txt
.. _CHANGELOG: https://github.com/alecxe/scrapy-fake-useragent/blob/master/CHANGELOG.rst