df_to_rs
df_to_rs is a Python package that provides efficient methods to upload, upsert, and manage Pandas DataFrames in Amazon Redshift, using S3 as an intermediary.
Key Features
- Direct DataFrame to Redshift upload
- Upsert functionality (update + insert)
- Delete and insert operations
- Large dataset handling with chunking
- Support for JSON/dict/list columns (Redshift SUPER)
- AWS IAM Role support for secure authentication
- Automatic cleanup of temporary S3 files
Installation
pip install df_to_rs
Usage
1. Initialize with AWS Credentials
from df_to_rs import df_to_rs
import psycopg2
redshift_conn = psycopg2.connect(
    dbname='your_db',
    host='your-cluster.region.redshift.amazonaws.com',
    port=5439,  # Redshift's default port
    user='your_user',
    password='your_password'
)
redshift_conn.set_session(autocommit=True)
uploader = df_to_rs(
    region_name='ap-south-1',
    s3_bucket='your-s3-bucket',
    aws_access_key_id='your-access-key-id',
    aws_secret_access_key='your-secret-access-key',
    redshift_c=redshift_conn
)
2. Initialize using EC2 Instance Role (Recommended)
uploader = df_to_rs(
    region_name='ap-south-1',
    s3_bucket='your-s3-bucket',
    redshift_c=redshift_conn
)
3. Basic Upload
Upload a DataFrame to a Redshift table:
uploader.upload_to_redshift(
    df=your_dataframe,
    dest='schema.table_name'
)
4. Upsert Operation
Update existing records and insert new ones based on key columns:
uploader.upsert_to_redshift(
    df=your_dataframe,
    dest_table='schema.table_name',
    upsert_columns=['id', 'unique_key'],
    clear_dest_table=False
)
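Conceptually, an upsert like this follows the standard staging-table pattern: new rows are staged, matching rows are deleted from the destination, and the staged rows are inserted. The snippet below sketches that pattern with plain psycopg2 and SQL, reusing the table and key columns from the example above; it is illustrative only, not the package's exact implementation (df_to_rs loads the staging data for you via an S3 COPY).

# Illustrative staging-table upsert pattern (not the package's exact SQL).
with redshift_conn.cursor() as cur:
    # 1. Stage the incoming rows (df_to_rs does this via an S3 COPY).
    cur.execute("CREATE TEMP TABLE stage (LIKE schema.table_name);")
    # ... load your_dataframe into "stage" here ...
    # 2. Delete destination rows that match on the upsert key columns.
    cur.execute("""
        DELETE FROM schema.table_name
        USING stage
        WHERE schema.table_name.id = stage.id
          AND schema.table_name.unique_key = stage.unique_key;
    """)
    # 3. Insert all staged rows and drop the staging table.
    cur.execute("INSERT INTO schema.table_name SELECT * FROM stage;")
    cur.execute("DROP TABLE stage;")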
5. Delete and Insert
Delete records matching a condition and insert new data:
uploader.delete_and_insert_to_redshift(
    df=your_dataframe,
    dest_table='schema.table_name',
    filter_cond="date >= CURRENT_DATE - 7"
)
Special Data Types
JSON/Dictionary Columns
The package automatically handles JSON/dict/list columns for the Redshift SUPER type:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'json_data': [{'key': 'value'}, {'other': 'data'}]
})
uploader.upload_to_redshift(df, 'schema.table_name')
Large Dataset Handling
The package automatically handles large datasets by (see the sketch after this list):
- Chunking data into 1 million row segments
- Streaming to S3 in memory
- Automatic cleanup of temporary files
- Progress tracking with timestamps
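The following is a minimal sketch of that flow, assuming a CSV-over-S3 path; the function name, chunk-size parameter, and key prefix are illustrative, not part of the df_to_rs API.

import io
import boto3
import pandas as pd

def stream_in_chunks(df, bucket, prefix, chunk_rows=1_000_000):
    """Split a DataFrame into row chunks and stream each chunk to S3 as an in-memory CSV."""
    s3 = boto3.client('s3')
    keys = []
    for i, start in enumerate(range(0, len(df), chunk_rows)):
        chunk = df.iloc[start:start + chunk_rows]
        buf = io.StringIO()
        chunk.to_csv(buf, index=False)  # serialized in memory, no temp files on disk
        key = f"{prefix}/part_{i:05d}.csv"
        s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue().encode('utf-8'))
        keys.append(key)
        print(f"{pd.Timestamp.now()}: uploaded {key} ({len(chunk)} rows)")
    return keys  # the caller would COPY these into Redshift, then delete them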
Error Handling
- Automatic transaction rollback on errors
- S3 temporary file cleanup
- Detailed error messages and timestamps
- Safe staging table management for upserts
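The safeguards listed above roughly correspond to the pattern sketched below; the helper function and its arguments are hypothetical, shown only to illustrate how rollback and S3 cleanup fit together.

import boto3

def copy_with_cleanup(redshift_conn, copy_sql, bucket, temp_keys):
    """Run a COPY statement, roll back on failure, and always remove the temporary S3 objects."""
    s3 = boto3.client('s3')
    try:
        with redshift_conn.cursor() as cur:
            cur.execute(copy_sql)
    except Exception as exc:
        redshift_conn.rollback()  # undo any open transaction (relevant when autocommit is off)
        print(f"Load failed and was rolled back: {exc}")
        raise
    finally:
        if temp_keys:  # temporary files are deleted whether the load succeeded or not
            s3.delete_objects(
                Bucket=bucket,
                Delete={'Objects': [{'Key': k} for k in temp_keys]},
            )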
AWS IAM Role Requirements
When using instance roles, ensure your role has these permissions (an example policy follows the list):
- S3: PutObject, GetObject, DeleteObject on the specified bucket
- Redshift: COPY command permissions
- IAM: AssumeRole permissions if needed
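As a concrete reference for the S3 portion, an inline policy along these lines can be attached to the instance role with boto3; the role and policy names below are placeholders, and the bucket ARN should be scoped to your own bucket.

import json
import boto3

# Hypothetical role and policy names; adjust to your environment.
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
        "Resource": "arn:aws:s3:::your-s3-bucket/*",
    }],
}

boto3.client('iam').put_role_policy(
    RoleName='your-ec2-instance-role',
    PolicyName='df-to-rs-s3-access',
    PolicyDocument=json.dumps(s3_policy),
)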
Best Practices
- Use instance roles instead of access keys when possible
- Set appropriate column types in Redshift, especially for SUPER columns
- Create tables with appropriate sort and dist keys before uploading (see the example DDL below)
- Monitor the Redshift query logs for performance optimization
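As one way to follow the table-design advice above, the destination table for the earlier JSON example could be created with an explicit SUPER column and dist/sort keys. The DDL below is a hand-written sketch, not something the package generates.

# Hypothetical DDL for the JSON example above; pick keys that match your query patterns.
with redshift_conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS schema.table_name (
            id        BIGINT,
            json_data SUPER
        )
        DISTKEY (id)
        SORTKEY (id);
    """)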
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
All notable changes to df_to_rs will be documented in this file.
[0.1.24] - 2025-01-26
Added
[0.1.23] - 2025-01-26
Added
- Support for instance role-based authentication in AWS
- Handling of JSON/dict/list objects for Redshift SUPER columns
- Proper cleanup of S3 temporary files
Changed
- Made AWS credentials optional in constructor
- Optimized DataFrame processing with unified applymap operations
- Improved string column handling for better type safety
Fixed
- S3 resource cleanup in error scenarios
- Transaction handling in delete_and_insert_to_redshift