langchain-zenrows

A LangChain integration tool that provides reliable web scraping capabilities at any scale using ZenRows' Universal Scraper API

Version: 0.1.0
Registry: PyPI (pip)
Maintainers: 1

The langchain-zenrows integration tool enables LangChain agents to scrape and access web content at any scale using ZenRows' enterprise-grade infrastructure.

Whether you need to scrape JavaScript-heavy single-page applications, bypass anti-bot systems, access geo-restricted content, or extract structured data at scale, this integration provides the tools and reliability needed for modern AI applications.

Table of Contents

  • Installation
  • Usage
  • API Reference
  • Features
  • License

Installation

pip install langchain-zenrows

Usage

To use the ZenRows Universal Scraper with LangChain, you'll need a ZenRows API key. You can sign up for free at ZenRows.

For more comprehensive examples and use cases, see the examples/ folder.

Basic Usage

import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

# Initialize the tool
scraper = ZenRowsUniversalScraper()

# Scrape a simple webpage
result = scraper.invoke({"url": "https://httpbin.io/html"})
print(result)

Advanced Usage with Parameters

import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Scrape with JavaScript rendering and premium proxies
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "js_render": True,
    "premium_proxy": True,
    "proxy_country": "us",
    "response_type": "markdown",
    "wait": 2000  # Wait 2 seconds after page load
})

print(result)

See the API Reference section below for the full list of parameters you can use to customize scraping requests.

Using with LangChain Agents

from langchain_zenrows import ZenRowsUniversalScraper
from langchain_openai import ChatOpenAI  # or your preferred LLM
from langgraph.prebuilt import create_react_agent
import os

# Set your ZenRows and OpenAI API keys
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"
os.environ["OPENAI_API_KEY"] = "<YOUR_OPEN_AI_API_KEY>"


# Initialize components
llm = ChatOpenAI(model="gpt-4o-mini")
zenrows_tool = ZenRowsUniversalScraper()

# Create agent
agent = create_react_agent(llm, [zenrows_tool])

# Use the agent
result = agent.invoke(
    {
        "messages": "Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time."
    }
)

print("Agent Response:")
for message in result["messages"]:
    print(message.content)
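
The loop above prints every message in the conversation, which can include the raw tool output. If only the agent's reply is needed, a small helper along these lines may be enough. It is a sketch that assumes the final reply is the last entry in result["messages"] and that each message exposes a content attribute, as in the loop above. The name final_answer is hypothetical and not part of langchain-zenrows.

```python
from typing import Any, Sequence


def final_answer(messages: Sequence[Any]) -> str:
    """Return the content of the last message, or "" if there are none.

    Hypothetical helper: assumes the agent appends its final reply last.
    """
    if not messages:
        return ""
    content = messages[-1].content
    # Some chat models return content as a list of blocks; fall back to str().
    return content if isinstance(content, str) else str(content)


# Usage with the agent result from the example above:
# print(final_answer(result["messages"]))
```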

CSS Extraction

Extract specific data using CSS selectors:

import json
import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Extract specific elements
css_selector = json.dumps({
    "title": "h1",
    "paragraphs": "p"
})

result = scraper.invoke({
    "url": "https://httpbin.io/html",
    "css_extractor": css_selector
})
print(result)
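
To work with the extracted fields downstream, the result usually needs to be parsed. The sketch below assumes the tool returns the extraction as a JSON string whose values are either a single string or a list of strings. That shape is an assumption, so check it against real output first. The helper names build_css_extractor and parse_extraction are hypothetical, and the sample string is illustrative, not real scraper output.

```python
import json
from typing import Dict, List


def build_css_extractor(selectors: Dict[str, str]) -> str:
    """Serialize a {field name: CSS selector} mapping for css_extractor."""
    return json.dumps(selectors)


def parse_extraction(raw: str) -> Dict[str, List[str]]:
    """Parse an extraction result into {field name: list of matches}.

    Assumption: the result is a JSON object whose values are either a
    single string or a list of strings.
    """
    data = json.loads(raw)
    return {
        key: value if isinstance(value, list) else [value]
        for key, value in data.items()
    }


# Illustrative input only, not real scraper output.
sample = '{"title": "Example heading", "paragraphs": ["First.", "Second."]}'
parsed = parse_extraction(sample)
print(parsed["title"])            # ['Example heading']
print(len(parsed["paragraphs"]))  # 2
```

Normalizing every value to a list means the calling code does not have to special-case selectors that match only one element.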

Premium Proxy with Geo-targeting

Access geo-restricted content:

import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Check your IP location
result = scraper.invoke({
    "url": "https://httpbin.io/ip",
    "premium_proxy": True,
    "proxy_country": "us"
})
print(result)  # Shows the US IP being used

API Reference

ZenRowsUniversalScraper

Main tool class for web scraping with ZenRows.

Parameters:

  • zenrows_api_key (str, optional): Your ZenRows API key. If not provided, the tool reads it from the ZENROWS_API_KEY environment variable.
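
As an alternative to setting the environment variable, the key can be passed to the constructor through zenrows_api_key. The sketch below mirrors the lookup order described above (explicit argument first, then ZENROWS_API_KEY). The helpers resolve_api_key and make_scraper are hypothetical, and raising ValueError when no key is found is this sketch's choice, not documented behavior of the tool.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the API key: explicit argument first, then ZENROWS_API_KEY."""
    key = explicit or env.get("ZENROWS_API_KEY")
    if not key:
        # Raising here is this sketch's choice, not documented tool behavior.
        raise ValueError("No ZenRows API key provided")
    return key


def make_scraper(api_key: Optional[str] = None):
    """Create the tool with an explicit key instead of relying on the environment."""
    # Imported lazily so resolve_api_key can be used without the package installed.
    from langchain_zenrows import ZenRowsUniversalScraper

    return ZenRowsUniversalScraper(zenrows_api_key=resolve_api_key(api_key))
```

Passing the key explicitly can be convenient when it comes from a secrets manager rather than the process environment.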

Input Schema:

For complete parameter documentation and details, see the official ZenRows API Reference.

Parameter | Type | Description
url | str | Required. The URL to scrape
js_render | bool | Enable JavaScript rendering with a headless browser. Essential for modern web apps, SPAs, and sites with dynamic content (default: False)
js_instructions | str | Execute custom JavaScript on the page to interact with elements, scroll, click buttons, or manipulate content
premium_proxy | bool | Use residential IPs to bypass anti-bot protection. Essential for accessing protected sites (default: False)
proxy_country | str | Set the country of the IP used for the request. Use for accessing geo-restricted content. Two-letter country code
session_id | int | Maintain the same IP for multiple requests for up to 10 minutes. Essential for multi-step processes
custom_headers | dict | Include custom headers in your request to mimic browser behavior
wait_for | str | Wait for a specific CSS Selector to appear in the DOM before returning content
wait | int | Wait a fixed amount of milliseconds after page load
block_resources | str | Block specific resources (images, fonts, etc.) from loading to speed up scraping
response_type | str | Convert HTML to other formats. Options: "markdown", "plaintext", "pdf"
css_extractor | str | Extract specific elements using CSS selectors (JSON format)
autoparse | bool | Automatically extract structured data from HTML (default: False)
screenshot | str | Capture an above-the-fold screenshot of the page (default: "false")
screenshot_fullpage | str | Capture a full-page screenshot (default: "false")
screenshot_selector | str | Capture a screenshot of a specific element using CSS Selector
screenshot_format | str | Choose between "png" (default) and "jpeg" formats for screenshots
screenshot_quality | int | For JPEG format, set quality from 1 to 100. Lower values reduce file size but decrease quality
original_status | bool | Return the original HTTP status code from the target page (default: False)
allowed_status_codes | str | Returns the content even if the target page fails with specified status codes. Useful for debugging or when you need content from error pages
json_response | bool | Capture network requests in JSON format, including XHR or Fetch data. Ideal for intercepting API calls made by the web page (default: False)
outputs | str | Specify which data types to extract from the scraped HTML. Accepted values: emails, phone_numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon
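
With this many options, a typo in a parameter name is easy to miss. The following sketch builds the input dictionary for scraper.invoke and rejects any name that is not among the parameters listed above. The helper build_payload and the constant DOCUMENTED_PARAMS are hypothetical and not part of langchain-zenrows; the tool may well perform its own validation.

```python
from typing import Any, Dict

# Parameter names copied from the input schema listed above.
DOCUMENTED_PARAMS = frozenset({
    "url", "js_render", "js_instructions", "premium_proxy", "proxy_country",
    "session_id", "custom_headers", "wait_for", "wait", "block_resources",
    "response_type", "css_extractor", "autoparse", "screenshot",
    "screenshot_fullpage", "screenshot_selector", "screenshot_format",
    "screenshot_quality", "original_status", "allowed_status_codes",
    "json_response", "outputs",
})


def build_payload(url: str, **options: Any) -> Dict[str, Any]:
    """Build an invoke() input, rejecting names missing from the schema."""
    unknown = set(options) - DOCUMENTED_PARAMS
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    payload: Dict[str, Any] = {"url": url}
    # Drop options left as None so only explicit settings are sent.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_payload(
    "https://httpbin.io/html",
    js_render=True,
    wait=2000,
    proxy_country=None,  # omitted from the payload
)
print(payload)
# result = scraper.invoke(payload)
```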

Features

  • JavaScript Rendering: Scrape modern SPAs and dynamic content
  • Anti-Bot Bypass: Bypass sophisticated bot detection systems
  • Geo-Targeting: Access region-specific content from 190+ countries
  • Multiple Output Formats: HTML, Markdown, Plaintext, PDF, Screenshots
  • CSS Extraction: Target specific data with CSS selectors
  • Structured Data Extraction: Automatically extract emails, phone numbers, links, and other data types
  • Session Management: Maintain consistent sessions across requests
  • Wait Conditions: Smart waiting for dynamic content
  • Premium Proxies: 55M+ residential IPs for maximum success rates
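
As an illustration of session management, the sketch below prepares several requests that share one session_id, which per the API Reference keeps the same IP for up to 10 minutes. The helper session_payloads is hypothetical, and the value 1234 is an arbitrary integer chosen for illustration.

```python
from typing import Any, Dict, Iterable, List


def session_payloads(urls: Iterable[str], session_id: int) -> List[Dict[str, Any]]:
    """Build one invoke() input per URL, all sharing the same session_id."""
    return [{"url": url, "session_id": session_id} for url in urls]


payloads = session_payloads(
    ["https://httpbin.io/ip", "https://httpbin.io/html"],
    session_id=1234,  # arbitrary value for illustration
)
for payload in payloads:
    print(payload)

# With a configured scraper, the requests would then be sent in order:
# results = [scraper.invoke(p) for p in payloads]
```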

License

langchain-zenrows is distributed under the terms of the MIT license.
