🌐 ScrapeGraph Python SDK
Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.
📦 Installation
pip install scrapegraph-py
🚀 Features
- 🤖 AI-powered web scraping
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging
- ⚡ Automatic retries
- 🔐 Secure authentication
🎯 Quick Start
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
[!NOTE]
You can set the SGAI_API_KEY environment variable and initialize the client without parameters: client = Client()
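The lookup described in the note can be sketched at the application level as follows. `resolve_api_key` is a hypothetical helper, not part of the SDK. It only illustrates one reasonable precedence (an explicit key first, then the SGAI_API_KEY environment variable) and fails early when neither is present.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to SGAI_API_KEY."""
    key = explicit_key or os.environ.get("SGAI_API_KEY")
    if not key:
        # Fail before any request is attempted, with a message that names the fix.
        raise RuntimeError("No API key found: pass one explicitly or set SGAI_API_KEY")
    return key
```

The returned string could then be passed as `Client(api_key=resolve_api_key())`.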
📚 Available Endpoints
🔍 SmartScraper
Scrapes any webpage using AI to extract specific information.
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)
print(response)
Output Schema (Optional)
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Describe the fields the response should contain.
class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)
📝 Markdownify
Converts any webpage into clean, formatted markdown.
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
response = client.markdownify(
    website_url="https://example.com"
)
print(response)
💻 LocalScraper
Extracts information from HTML content using AI.
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
html_content = """
<html>
<body>
<h1>Company Name</h1>
<p>We are a technology company focused on AI solutions.</p>
<div class="contact">
<p>Email: contact@example.com</p>
</div>
</body>
</html>
"""
response = client.localscraper(
    user_prompt="Extract the company description",
    website_html=html_content
)
print(response)
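Because LocalScraper takes the HTML as a string, pages saved on disk need to be read into memory first. The following is a minimal sketch of that step using only the standard library. `load_html` and the temporary `page.html` file are illustrative names, not part of the SDK, and UTF-8 is an assumed encoding.

```python
import tempfile
from pathlib import Path


def load_html(path: Path) -> str:
    """Illustrative helper: read an HTML file as text (UTF-8 assumed)."""
    return path.read_text(encoding="utf-8")


# Write a small sample page to a temporary file so the sketch is self-contained.
sample = "<html><body><h1>Company Name</h1></body></html>"
with tempfile.TemporaryDirectory() as tmp_dir:
    page_path = Path(tmp_dir) / "page.html"
    page_path.write_text(sample, encoding="utf-8")
    html_content = load_html(page_path)

print(len(html_content))
```

The resulting `html_content` string is what would be passed as `website_html`.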
⚡ Async Support
All endpoints support async operations:
import asyncio
from scrapegraph_py import AsyncClient
async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
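One common reason to use an async client is to issue several requests concurrently. The sketch below shows that pattern with `asyncio.gather`. To stay runnable without an API key or network access, it uses `fake_smartscraper`, a stand-in coroutine that is not part of the SDK. Its return value is invented for the demo and does not represent the real response format.

```python
import asyncio


async def fake_smartscraper(website_url: str, user_prompt: str) -> dict:
    """Stand-in for an awaited client call; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return {"website_url": website_url, "user_prompt": user_prompt}


async def scrape_all(urls: list[str], prompt: str) -> list[dict]:
    # gather() runs the coroutines concurrently and returns results in input order.
    tasks = [fake_smartscraper(url, prompt) for url in urls]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    scrape_all(["https://example.com", "https://example.org"], "Extract the main content")
)
for item in results:
    print(item["website_url"])
```

In real use, the stand-in would be replaced by `client.smartscraper(...)` awaited inside an `async with AsyncClient() as client:` block, as in the example above.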
📖 Documentation
For detailed documentation, visit docs.scrapegraphai.com
🛠️ Development
For information about setting up the development environment and contributing to the project, see our Contributing Guide.
💬 Support & Feedback
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔗 Links
Made with ❤️ by ScrapeGraph AI