# 🌐 ScrapeGraph Python SDK

Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.
## 📦 Installation

```bash
pip install scrapegraph-py
```
## 🚀 Features
- 🤖 AI-powered web scraping and search
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging (see the sketch after this list)
- ⚡ Automatic retries
- 🔐 Secure authentication
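
Detailed logging can be surfaced through Python's standard `logging` module. The snippet below is a minimal sketch; the `"scrapegraph_py"` logger name is an assumption based on the package name.

```python
import logging

# Assumption: the SDK emits its logs under a logger named after the package.
logging.basicConfig(level=logging.INFO)
logging.getLogger("scrapegraph_py").setLevel(logging.DEBUG)
```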
## 🎯 Quick Start

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```

> [!NOTE]
> You can set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
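
For example, the key can be set once in the process environment and picked up when the client is created:

```python
import os

from scrapegraph_py import Client

# Provide the API key via the environment instead of passing it explicitly
os.environ["SGAI_API_KEY"] = "your-api-key-here"

client = Client()  # reads SGAI_API_KEY from the environment
```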
## 📚 Available Endpoints

### 🤖 SmartScraper

Extract structured data from any webpage or raw HTML content using AI.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Scrape a live webpage by URL
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

# Or extract from HTML content you already have
html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
    </body>
</html>
"""

response = client.smartscraper(
    website_html=html_content,
    user_prompt="Extract the company description"
)

print(response)
```
#### Output Schema (Optional)

```python
from pydantic import BaseModel, Field

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)
```
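
The response can then be loaded back into the schema for type-safe access. The sketch below assumes the extracted fields arrive under the `result` key, mirroring the SearchScraper example further down.

```python
# Assumption: the extracted fields are returned under the "result" key.
data = WebsiteData(**response["result"])
print(data.title)
print(data.description)
```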
### 🍪 Cookies Support

Use cookies for authentication and session management:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Cookies are passed as a simple name/value mapping
cookies = {
    "session_id": "abc123def456",
    "auth_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
    "user_preferences": "dark_mode,usd"
}

response = client.smartscraper(
    website_url="https://example.com/dashboard",
    user_prompt="Extract user profile information",
    cookies=cookies
)
```
**Common Use Cases:**
- E-commerce sites: User authentication, shopping cart persistence
- Social media: Session management, user preferences
- Banking/Financial: Secure authentication, transaction history
- News sites: User preferences, subscription content
- API endpoints: Authentication tokens, API keys
### 🔄 Advanced Features

**Infinite Scrolling:**

```python
response = client.smartscraper(
    website_url="https://example.com/feed",
    user_prompt="Extract all posts from the feed",
    cookies=cookies,
    number_of_scrolls=10  # scroll the page 10 times before extracting
)
```

**Pagination:**

```python
response = client.smartscraper(
    website_url="https://example.com/products",
    user_prompt="Extract all product information",
    cookies=cookies,
    total_pages=5  # crawl through 5 pages of results
)
```

**Combined with Cookies:**

```python
response = client.smartscraper(
    website_url="https://example.com/dashboard",
    user_prompt="Extract user data from all pages",
    cookies=cookies,
    number_of_scrolls=5,
    total_pages=3
)
```
### 🔍 SearchScraper

Perform AI-powered web searches with structured results and reference URLs.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?"
)

print(f"Answer: {response['result']}")
print(f"Sources: {response['reference_urls']}")
```
#### Output Schema (Optional)

```python
from pydantic import BaseModel, Field

from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class PythonVersionInfo(BaseModel):
    version: str = Field(description="The latest Python version number")
    release_date: str = Field(description="When this version was released")
    major_features: list[str] = Field(description="List of main features")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?",
    output_schema=PythonVersionInfo
)
```
### 📝 Markdownify

Convert any webpage into clean, formatted markdown.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
```
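
A common follow-up is to persist the markdown to disk. The sketch below assumes the converted text is returned under the `result` key of the response.

```python
# Assumption: the converted markdown is available under the "result" key.
with open("example.md", "w", encoding="utf-8") as f:
    f.write(response["result"])
```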
## ⚡ Async Support

All endpoints support async operations:

```python
import asyncio

from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```
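
The async client also makes it straightforward to fan out several requests at once. Here is a minimal sketch using `asyncio.gather`; the URLs are illustrative.

```python
import asyncio

from scrapegraph_py import AsyncClient

async def scrape_all(urls):
    async with AsyncClient() as client:
        tasks = [
            client.smartscraper(
                website_url=url,
                user_prompt="Extract the main heading"
            )
            for url in urls
        ]
        # Run all requests concurrently and collect responses in order
        return await asyncio.gather(*tasks)

results = asyncio.run(scrape_all([
    "https://example.com",
    "https://example.org",
]))
print(results)
```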
## 📖 Documentation

For detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com).
## 🛠️ Development
For information about setting up the development environment and contributing to the project, see our Contributing Guide.
## 💬 Support & Feedback

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🔗 Links

Made with ❤️ by ScrapeGraph AI