
cloudproxy

The purpose of CloudProxy is to hide your scraper's IP behind the cloud. It allows you to spin up a pool of proxies using popular cloud providers with just an API token. No configuration needed.
CloudProxy exposes an API and a modern UI for managing your proxy infrastructure.
Please always scrape nicely and respectfully.
To get a local copy up and running, follow these simple steps.
CloudProxy is designed to run as a Docker container and is distributed as a Docker image for easy deployment:
# Quick start with DigitalOcean
docker run -d \
-e PROXY_USERNAME='your_username' \
-e PROXY_PASSWORD='your_password' \
-e DIGITALOCEAN_ENABLED=True \
-e DIGITALOCEAN_ACCESS_TOKEN='your_token' \
-p 8000:8000 \
laffin/cloudproxy:latest
# Using environment file (recommended for production)
docker run -d \
--env-file .env \
-p 8000:8000 \
laffin/cloudproxy:0.6.0-beta # Use specific version tag
Docker Compose Example:
version: '3.8'
services:
cloudproxy:
image: laffin/cloudproxy:latest
ports:
- "8000:8000"
env_file:
- .env
restart: unless-stopped
It is recommended to use a Docker image tagged to a specific version (e.g., laffin/cloudproxy:0.6.0-beta). See releases for the latest version.
CloudProxy requires authentication configuration for the proxy servers:
PROXY_USERNAME, PROXY_PASSWORD - Basic authentication credentials (alphanumeric characters only)
ONLY_HOST_IP - Set to True to restrict access to the host server IP only
AGE_LIMIT - Proxy age limit in seconds (0 = disabled, default: disabled)
See individual provider documentation for provider-specific environment variables.
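For example, a minimal .env file for the Docker setup above might look like the following sketch (DigitalOcean is used as the provider here; the credential values are placeholders):
# Proxy authentication (alphanumeric characters only)
PROXY_USERNAME=proxyuser
PROXY_PASSWORD=proxypass123
# Optional: only allow the host server IP to use the proxies
ONLY_HOST_IP=True
# Optional: recycle proxies older than one hour (0 disables age-based recycling)
AGE_LIMIT=3600
# Provider configuration
DIGITALOCEAN_ENABLED=True
DIGITALOCEAN_ACCESS_TOKEN=your_token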
For development or integrating CloudProxy into existing Python applications:
# Install from PyPI
pip install cloudproxy
# Or install from source for development
git clone https://github.com/claffin/cloudproxy.git
cd cloudproxy
pip install -e .
See the Python Package Usage Guide for development and integration examples.
CloudProxy includes a comprehensive test suite to ensure reliability:
# Run all tests
pytest -v
# Run specific test file
pytest tests/test_specific.py -v
# Run with coverage
pytest --cov=cloudproxy tests/
Warning: These tests create real cloud instances and will incur costs!
# Run full end-to-end test
./test_cloudproxy.sh
# Run without cleanup (for debugging)
./test_cloudproxy.sh --no-cleanup --skip-connection-test
# Test specific providers
./test_cloudproxy.sh --provider digitalocean
# Clone the repository
git clone https://github.com/claffin/cloudproxy.git
cd cloudproxy
# Install in development mode
pip install -e .
# Install development dependencies
pip install -r requirements.txt
# Run the application locally
python -m cloudproxy
# Navigate to UI directory
cd cloudproxy-ui
# Install dependencies
npm install
# Run development server (hot reload enabled)
npm run serve
# Build for production
npm run build
To add a new provider, create a directory under cloudproxy/providers/ containing main.py with the provider orchestration logic and functions.py with cloud API interactions, then update providers/config.py and add tests in tests/test_provider_name.py.
CloudProxy provides multiple interfaces for managing your proxy infrastructure:
Access the web UI at http://localhost:8000/ui to manage your proxy infrastructure from the browser.
Access the interactive API documentation at http://localhost:8000/docs to explore and test the available endpoints.
CloudProxy exposes a RESTful API on localhost:8000. Your application can use the API to retrieve and manage proxy servers. All responses include metadata with request ID and timestamp for tracking.
Example of retrieving and using a random proxy:
import requests

def get_random_proxy():
    # Fetch a random proxy URL from the CloudProxy API
    response = requests.get("http://localhost:8000/random").json()
    return response["proxy"]["url"]

proxies = {
    "http": get_random_proxy(),
    "https": get_random_proxy()
}

my_request = requests.get("https://api.ipify.org", proxies=proxies)
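Because each call to the /random endpoint can return a different proxy, a convenient pattern is to retry a failed request through a fresh proxy. A minimal sketch (the retry count and timeout below are arbitrary illustration values, not CloudProxy defaults):

import requests

def fetch_with_rotation(url, retries=3, timeout=10):
    # Try the request through up to `retries` different random proxies
    last_error = None
    for _ in range(retries):
        proxy_url = requests.get("http://localhost:8000/random").json()["proxy"]["url"]
        proxies = {"http": proxy_url, "https": proxy_url}
        try:
            return requests.get(url, proxies=proxies, timeout=timeout)
        except requests.RequestException as exc:
            # The proxy may be blocked or still warming up; try another one
            last_error = exc
    raise last_error

print(fetch_with_rotation("https://api.ipify.org").text)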
For more detailed examples of using CloudProxy as a Python package, see the Python Package Usage Guide.
CloudProxy supports rolling deployments to ensure zero-downtime proxy recycling. This feature maintains a minimum number of healthy proxies during age-based recycling operations.
Enable rolling deployments with these environment variables:
# Enable rolling deployments
ROLLING_DEPLOYMENT=True
# Minimum proxies to keep available during recycling
ROLLING_MIN_AVAILABLE=3
# Maximum proxies to recycle simultaneously
ROLLING_BATCH_SIZE=2
When proxies reach their age limit, they are recycled in batches of up to ROLLING_BATCH_SIZE while at least ROLLING_MIN_AVAILABLE healthy proxies are kept in service.
Check rolling deployment status via the API:
# Get overall status
curl http://localhost:8000/rolling
# Get provider-specific status
curl http://localhost:8000/rolling/digitalocean
For detailed documentation, see the Rolling Deployments Guide.
CloudProxy now supports multiple accounts per provider, allowing you to run several independently configured proxy pools side by side.
Each provider can have multiple "instances", which represent different accounts or configurations. Each instance has its own credentials, region, and scaling settings.
To configure multiple instances, use environment variables with the instance name in the format:
PROVIDERNAME_INSTANCENAME_VARIABLE
For example, to configure two DigitalOcean accounts:
# Default DigitalOcean account
DIGITALOCEAN_ENABLED=True
DIGITALOCEAN_ACCESS_TOKEN=your_first_token
DIGITALOCEAN_DEFAULT_REGION=lon1
DIGITALOCEAN_DEFAULT_MIN_SCALING=2
# Second DigitalOcean account
DIGITALOCEAN_SECONDACCOUNT_ENABLED=True
DIGITALOCEAN_SECONDACCOUNT_ACCESS_TOKEN=your_second_token
DIGITALOCEAN_SECONDACCOUNT_REGION=nyc1
DIGITALOCEAN_SECONDACCOUNT_MIN_SCALING=3
CloudProxy exposes a comprehensive REST API for managing your proxy infrastructure. Here are some common examples:
# Get a random proxy
curl -X 'GET' 'http://localhost:8000/random' -H 'accept: application/json'
# List all available proxies
curl -X 'GET' 'http://localhost:8000/' -H 'accept: application/json'
# CloudProxy will maintain exactly 5 proxies (DigitalOcean)
curl -X 'PATCH' 'http://localhost:8000/providers/digitalocean' \
-H 'Content-Type: application/json' \
-d '{"min_scaling": 5, "max_scaling": 5}'
# Or for Vultr
curl -X 'PATCH' 'http://localhost:8000/providers/vultr' \
-H 'Content-Type: application/json' \
-d '{"min_scaling": 3, "max_scaling": 3}'
import requests
# Get a random proxy
response = requests.get("http://localhost:8000/random").json()
proxy_url = response["proxy"]["url"]
# Use the proxy
proxies = {"http": proxy_url, "https": proxy_url}
result = requests.get("https://api.ipify.org", proxies=proxies)
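The scaling update shown with curl above can also be made from Python; a small sketch using the same PATCH endpoint and payload:

import requests

# Pin the DigitalOcean pool to exactly 5 proxies (mirrors the curl example above)
resp = requests.patch(
    "http://localhost:8000/providers/digitalocean",
    json={"min_scaling": 5, "max_scaling": 5},
)
print(resp.json())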
For comprehensive API documentation with all endpoints, request/response schemas, and advanced usage examples, see the API Examples Documentation.
CloudProxy runs a scheduled check every 30 seconds to maintain the target number of proxies specified by MIN_SCALING. If the current count differs from the target, it creates or removes proxies as needed. New proxies appear in the list of available IPs once they are deployed and ready to be used.
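Because deployment takes a little while, a client can poll the API until the pool is large enough before starting work. A minimal sketch, assuming the root listing endpoint exposes the ready proxies under a "proxies" key (adjust to the actual response schema):

import time
import requests

def wait_for_proxies(minimum=2, poll_interval=30, timeout=600):
    # Poll the CloudProxy API until at least `minimum` proxies are ready
    deadline = time.time() + timeout
    while time.time() < deadline:
        data = requests.get("http://localhost:8000/").json()
        available = data.get("proxies", [])  # assumed key, see lead-in above
        if len(available) >= minimum:
            return available
        time.sleep(poll_interval)  # matches CloudProxy's 30-second maintenance cycle
    raise TimeoutError("CloudProxy did not reach the requested pool size in time")

proxies = wait_for_proxies(minimum=2)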
The project is in early alpha with limited features. For proposed future enhancements and known issues, see the open issues.
This method of scraping via cloud providers has limitations: many websites have anti-bot protections and blacklists in place that can limit the effectiveness of CloudProxy. Many sites block datacenter IPs outright, and the IPs you receive may already be tarnished due to IP recycling. Rotating the CloudProxy proxies regularly may improve results. The best solution for scraping is a proxy service providing residential IPs, which are less likely to be blocked but are much more expensive. CloudProxy is a much cheaper alternative for scraping sites that neither block datacenter IPs nor have advanced anti-bot protection. This is a point frequently raised when people share this project, which is why it is included in the README.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
1. Fork the project
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a pull request
My target is to review all PRs within a week of being submitted, though sometimes it may be sooner or later.
Distributed under the MIT License. See LICENSE for more information.
Christian Laffin - @christianlaffin - christian.laffin@gmail.com
Project Link: https://github.com/claffin/cloudproxy
If proxies are not appearing or not working, verify that:
The provider is enabled (e.g. DIGITALOCEAN_ENABLED=True)
PROXY_USERNAME and PROXY_PASSWORD are set
ONLY_HOST_IP is not unexpectedly restricting access
# Check container logs
docker logs <container-id>
# Verify environment variables are passed correctly
docker exec <container-id> env | grep PROXY
# Test connectivity to the API
curl http://localhost:8000/providers
Check the AGE_LIMIT setting - proxies older than this will be automatically replaced.
If the UI does not load, make sure you are accessing /ui, not just the root URL, and that the UI has been built (npm run build in cloudproxy-ui/).
For more detailed debugging, check the application logs in the logs/ directory.