DataRecipe
Table of Contents
- Overview
- Functions
- Contact Information
Overview
This toolkit provides a variety of Python functions to facilitate common data manipulation, data import/export, and database operations.
Functions
General Features
send_email
Sends an email using SMTP with SSL/TLS options, supporting attachments if provided.
- Parameters:
subject
: Email subject as a string.body
: Main content of the email.send_email_address
: Sender's email address.send_email_password
: Sender's email password for SMTP authentication.receive_email_address
: Recipient's email address.attachment_path
: Directory path where attachments are stored (optional).attachment_list
: List of filenames to be attached (optional).smtp_address
: SMTP server address (default: 'smtp.feishu.cn').smtp_port
: SMTP server port (default: 465).
Example with Attachments:
send_email(
"Meeting Documents",
"Please see attached documents for the upcoming meeting.",
"sender@example.com",
"password123",
"receiver@example.com",
attachment_path="/path/to/documents",
attachment_list=["agenda.pdf", "minutes.docx"]
)
Data Validation and Cleaning
check_empty
Checks for empty entries in specified DataFrame columns.
- Parameters:
df
: DataFrame to check.columns
: Columns to check for missing values.output_cols
: Columns to include in the output.
Example:
empty_data = check_empty(df, columns=["name", "email"])
clean_dataframe
Cleans DataFrame by replacing infinite values with NaN.
Example:
clean_dataframe(df)
Data Import/Export
local_to_df
Converts files from a local directory to a pandas DataFrame.
- Parameters:
path
: Directory path to search for files.partial_file_name
: File name pattern to match.skip_rows
: Number of rows to skip at the start of each file.keep_file_name
: If True, adds a column with the file name.sheet_num
: For Excel files, specifies the sheet number to read.encoding
: Character encoding of the files.
Example with CSV files:
df = local_to_df("./data", "sample", keep_file_name=True)
Example with Excel files:
df = local_to_df("./data", "report", sheet_num=2, encoding='utf-8')
df_to_xlsx
Saves a DataFrame to an Excel file.
- Parameters:
df
: DataFrame to save.directory_path
: Path to directory where the file will be saved.file_name
: Name of the output file.
Example:
df_to_xlsx(df, "./output", "output_data")
df_to_csv
Saves a DataFrame to a CSV file.
- Parameters:
df
: DataFrame to save.directory_path
: Path to directory where the file will be saved.file_name
: Name of the output file.
Example:
df_to_csv(df, "./output", "output_data")
Database Operations
update
Updates records in a database table based on conditions.
- Parameters:
raw_df
: DataFrame containing new data to update.database
: Database name.table
: Table name.yaml_file_name
: YAML file name with DB configuration.clause
: SQL clause for record deletion.date_col
: Column name containing date data.custom_path
: Path to directory containing the YAML file.
Example:
update(df, "test_db", "user_data", clause="user_id > 10")
sql_query
Executes a SELECT SQL query and returns a DataFrame.
- Parameters:
database
: Database name.sql
: SQL SELECT statement.yaml_file_name
: YAML file name with DB configuration.custom_path
: Optional path to directory containing the YAML file.
Example:
result_df = sql_query("test_db", "SELECT * FROM users")
Contact Information
For any questions or suggestions regarding the toolkit, please contact us at: