ScrapingBee is a web scraping API that handles headless browsers and rotates proxies for you. The Python SDK makes it easier to interact with ScrapingBee's API.
You can install the ScrapingBee Python SDK with pip.
pip install scrapingbee
The ScrapingBee Python SDK is a wrapper around the requests library. ScrapingBee supports GET and POST requests.
Sign up for ScrapingBee to get your API key and some free credits to get started.
>>> from scrapingbee import ScrapingBeeClient
>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')
>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Block ads on the page you want to scrape
        'block_ads': False,
        # Block images and CSS on the page you want to scrape
        'block_resources': True,
        # Premium proxy geolocation
        'country_code': '',
        # Control the device the request will be sent from
        'device': 'desktop',
        # Use some data extraction rules
        'extract_rules': {'title': 'h1'},
        # Wrap response in JSON
        'json_response': False,
        # Interact with the webpage you want to scrape
        'js_scenario': {
            "instructions": [
                {"wait_for": "#slow_button"},
                {"click": "#slow_button"},
                {"scroll_x": 1000},
                {"wait": 1000},
                {"scroll_x": 1000},
                {"wait": 1000},
            ]
        },
        # Use premium proxies to bypass difficult-to-scrape websites (10-25 credits/request)
        'premium_proxy': False,
        # Execute JavaScript code with a headless browser (5 credits/request)
        'render_js': True,
        # Return the original HTML before the JavaScript rendering
        'return_page_source': False,
        # Return page screenshot as a PNG image
        'screenshot': False,
        # Take a full page screenshot without the window limitation
        'screenshot_full_page': False,
        # Transparently return the same HTTP code as the page requested
        'transparent_status_code': False,
        # Wait, in milliseconds, before returning the response
        'wait': 0,
        # Wait for a CSS selector before returning the response, e.g. ".title"
        'wait_for': '',
        # Set the browser window width in pixels
        'window_width': 1920,
        # Set the browser window height in pixels
        'window_height': 1080
    },
    headers={
        # Forward custom headers to the target website
        "key": "value"
    },
    cookies={
        # Forward custom cookies to the target website
        "name": "value"
    }
)
>>> response.text
'<!DOCTYPE html><html lang="en"><head>...'
ScrapingBee accepts parameters to render JavaScript, execute custom JavaScript, use a premium proxy from a specific geolocation, and more.
You can find all the supported parameters on ScrapingBee's documentation.
You can send custom cookies and headers like you would normally do with the requests library.
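The overview mentions that POST requests are supported, but only GET is shown. The following is a minimal sketch of the call shape, assuming the client exposes a `post` method that accepts `data`, `headers`, and `cookies` keyword arguments the way the requests library does. The helper `post_form` and the stand-in `RecordingClient` are hypothetical names used for illustration only; check the SDK documentation for the exact signature.

```python
def post_form(client, url, form_data, headers=None, cookies=None):
    """Send form data with a POST request through a ScrapingBee-style client.

    Assumption: `client.post` takes `data`, `headers` and `cookies`
    keyword arguments, mirroring the requests library.
    """
    return client.post(url, data=form_data, headers=headers, cookies=cookies)


class RecordingClient:
    """Hypothetical stand-in that records calls instead of using the network."""

    def __init__(self):
        self.calls = []

    def post(self, url, **kwargs):
        self.calls.append((url, kwargs))
        return {"url": url, **kwargs}


# Show the call shape without an API key or network access.
demo_client = RecordingClient()
result = post_form(demo_client, "https://httpbin.org/post", {"key": "value"})
```

With the real SDK, you would pass a `ScrapingBeeClient` instance in place of `RecordingClient`.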
Here is a short example of how to retrieve and store a screenshot of the ScrapingBee blog at mobile resolution.
>>> from scrapingbee import ScrapingBeeClient
>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')
>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Take a screenshot
        'screenshot': True,
        # Specify that we need the full height
        'screenshot_full_page': True,
        # Specify a mobile width in pixels
        'window_width': 375
    }
)
>>> if response.ok:
        with open("./scrapingbee_mobile.png", "wb") as f:
            f.write(response.content)
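The example above writes whatever bytes come back as long as the status is OK. As an optional defensive step, you could confirm the body really is a PNG before saving it, for instance in case the response is an error page or a JSON wrapper. This is a sketch using only the standard library; `save_png` is a hypothetical helper, not part of the SDK, and the demo uses fake bytes rather than a real API response.

```python
import os
import tempfile

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(content: bytes, path: str) -> bool:
    """Write `content` to `path` only if it starts with the PNG signature.

    Returns True when the file was written, False otherwise.
    """
    if not content.startswith(PNG_SIGNATURE):
        return False
    with open(path, "wb") as f:
        f.write(content)
    return True


# Demo with fake bytes: no network call is made here.
demo_path = os.path.join(tempfile.mkdtemp(), "scrapingbee_mobile.png")
saved_html = save_png(b"<html>error</html>", demo_path)  # rejected, nothing written
saved_png = save_png(PNG_SIGNATURE + b"fake-image-data", demo_path)  # written
```

In the screenshot example, this would be called as `save_png(response.content, "./scrapingbee_mobile.png")`.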
Scrapy is the most popular Python web scraping framework. You can easily integrate ScrapingBee's API with the Scrapy middleware.
The client includes a retry mechanism for 5XX responses.
>>> from scrapingbee import ScrapingBeeClient
>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')
>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        'render_js': True,
    },
    retries=5
)
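To make the idea of retrying on 5XX responses concrete, here is a generic sketch of such a loop. It is not the SDK's implementation: `get_with_retries`, `FakeResponse`, and the `delay_seconds` parameter are hypothetical, and treating `retries` as the total number of attempts is an assumption made for this illustration.

```python
import time


def get_with_retries(fetch, retries=5, delay_seconds=0.0):
    """Call `fetch()` until it returns a non-5XX status or attempts run out.

    Assumption: `retries` is the total number of attempts (at least one).
    """
    attempts = max(1, retries)
    response = None
    for attempt in range(attempts):
        response = fetch()
        if response.status_code < 500:
            break  # success or a client error: retrying would not help
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # pause before the next attempt
    return response


class FakeResponse:
    """Hypothetical stand-in for an HTTP response, used only for the demo."""

    def __init__(self, status_code):
        self.status_code = status_code


# Two server errors followed by a success.
statuses = iter([503, 502, 200])
final = get_with_retries(lambda: FakeResponse(next(statuses)), retries=5)
```

With the SDK itself, passing `retries=5` to `client.get` as shown above is all that is needed; the sketch only illustrates the general pattern.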