webcrawlerapi-langchain

LangChain integration for WebCrawlerAPI

Version 0.1.1 · PyPI · 1 maintainer

WebCrawlerAPI LangChain Integration

WebCrawlerAPI is a website-to-LLM data API: it converts websites and web pages into markdown or cleaned content.

No subscription required.

This package provides a LangChain integration for WebCrawlerAPI, letting you plug its web-crawling capabilities into a LangChain document-processing pipeline.

Installation

Get your API key first, then install the package:

pip install webcrawlerapi-langchain

Usage

Basic Loading

from webcrawlerapi_langchain import WebCrawlerAPILoader

# Initialize the loader
loader = WebCrawlerAPILoader(
    url="https://example.com",
    api_key="your-api-key",
    scrape_type="markdown",
    items_limit=10
)

# Load documents
documents = loader.load()

# Use documents in your LangChain pipeline
for doc in documents:
    print(doc.page_content[:100])
    print(doc.metadata)

Async Loading

# Async loading
documents = await loader.aload()
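Because aload() is a coroutine, it must be awaited inside a running event loop; from a plain script you can drive it with asyncio.run. A minimal sketch of that pattern, where DummyLoader is a made-up stand-in with the same aload() shape as WebCrawlerAPILoader so the snippet runs without an API key:

```python
import asyncio

class DummyLoader:
    """Stand-in for WebCrawlerAPILoader; its aload() is async like the real one."""
    async def aload(self):
        # The real loader returns a list of LangChain Documents.
        return ["doc-1", "doc-2"]

async def main():
    loader = DummyLoader()  # in real code: WebCrawlerAPILoader(url=..., api_key=...)
    return await loader.aload()

documents = asyncio.run(main())
print(len(documents))  # 2
```

In an environment that already runs an event loop (e.g. a Jupyter notebook), use `await loader.aload()` directly instead of asyncio.run.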

Lazy Loading

# Lazy loading
for doc in loader.lazy_load():
    print(doc.page_content[:100])

Async Lazy Loading

# Async lazy loading
async for doc in loader.alazy_load():
    print(doc.page_content[:100])

Configuration

The loader accepts the following parameters:

  • url: The URL to crawl
  • api_key: Your WebCrawlerAPI API key
  • scrape_type: Type of scraping (html, cleaned, markdown)
  • items_limit: Maximum number of pages to crawl
  • whitelist_regexp: Regex pattern for URL whitelist
  • blacklist_regexp: Regex pattern for URL blacklist
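The interplay of the two regex parameters can be pictured with plain Python: a URL is crawled if it matches the whitelist pattern (when one is set) and does not match the blacklist pattern (when one is set). This is an illustrative sketch only; the patterns and URLs are made up, and the actual filtering happens inside WebCrawlerAPI:

```python
import re

def allowed(url, whitelist_regexp=None, blacklist_regexp=None):
    """Illustrative URL filter: pass the whitelist (if given),
    then reject anything matching the blacklist (if given)."""
    if whitelist_regexp and not re.search(whitelist_regexp, url):
        return False
    if blacklist_regexp and re.search(blacklist_regexp, url):
        return False
    return True

# Hypothetical patterns: crawl only /blog/ pages, skip tag archives
print(allowed("https://example.com/blog/post-1", r"/blog/", r"/tag/"))  # True
print(allowed("https://example.com/tag/python", r"/blog/", r"/tag/"))   # False
```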

If you need help with the integration, feel free to contact us.

License

MIT License
