
Product
Introducing Rust Support in Socket
Socket now supports Rust and Cargo, offering package search for all users and experimental SBOM generation for enterprise projects.
DataForgeToolkit is a library for converting CSV files into other formats (such as DataFrames or CSV/Excel files) based on a JSON mapping.
DataForgeToolkit: Flexible Data Mapping for CSV/XLSX Files
DataForgeToolkit is a Python library that converts CSV or Excel files into customized DataFrames based on user-defined JSON mapping configurations. Whether you're working with financial reports, customer datasets, or any other structured data, it helps you turn raw data into analysis-ready tables.
Features:
Versatile File Support: Process both CSV and Excel files, the formats most commonly encountered in data analysis tasks.
Customizable Mapping: Define transformation mappings using a JSON file, allowing for precise specification of column names, data cleaning, and value substitutions tailored to your specific data requirements.
Efficient Data Processing: Automate data preprocessing tasks such as handling missing values, standardizing column names, and applying complex value mappings with ease.
Install the Package:
pip install dataforgetoolkit
Define Transformation Mapping:
Create a JSON file specifying the transformation mappings for your data. Define column mappings, specify new column names, and define value substitutions as needed.
Use the Toolkit:
Import datamapper from dataforgetoolkit in your Python script and call its map function on your report file:
from dataforgetoolkit import datamapper
# First argument: path to the report file (.csv or .xlsx); second argument: path to the JSON mapping file
datamapper.map('path/to/report.csv', 'path/to/mapping.json')
Access Mapped Data:
Access the transformed data as a DataFrame for further analysis or export to other formats.
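The mapping file from the "Define Transformation Mapping" step can also be generated from Python instead of written by hand. The sketch below uses only the standard json module; the helper names mapping_entry, mapping_document and write_mapping are hypothetical and not part of dataforgetoolkit, and the structure simply mirrors the example mapping shown later in this README.

```python
import json
from pathlib import Path


def mapping_entry(column, new_name, value_mappings=None):
    """Build one item of the "transformation_mapping" list (hypothetical helper)."""
    return {
        "column": column,                        # column name in the source report
        "new_name": new_name,                    # column name in the mapped output
        "value_mappings": value_mappings or [],  # list of {key: replacement} dicts
    }


def mapping_document(entries):
    """Serialize entries under the top-level "transformation_mapping" key."""
    return json.dumps({"transformation_mapping": entries}, indent=2)


def write_mapping(path, entries):
    """Write the mapping JSON to `path` (e.g. for use as datamapper.map's second argument)."""
    Path(path).write_text(mapping_document(entries), encoding="utf-8")
```

For example, write_mapping("mapping.json", [mapping_entry("Gender", "Sex", [{"MALE": "M", "FEMALE": "F"}])]) produces a file with a single Gender-to-Sex entry.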
Reserved keywords and key prefixes used in value_mappings:
DEFAULT_VALUE = "*"
FILTER_VALUE = "FILTER"
REPLACE_VALUE = "REPLACE_"
CONCAT_VALUE = "CONCAT"
UPPERCASE_VALUE = "UPPERCASE"
LOWERCASE_VALUE = "LOWERCASE"
REGEX_VALUE = "REGEX_"
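The README does not spell out what each keyword does, so the following is only a reading inferred from the example mapping: "*" overwrites every value, FILTER selects rows, REPLACE_<text> replaces a literal substring, REGEX_<pattern> performs a regular-expression substitution, and any other key is an exact-value substitution. The hypothetical function apply_value_mapping below shows one way to interpret a single value_mappings dictionary for one cell. It is not the library's implementation, and it leaves out CONCAT, UPPERCASE and LOWERCASE because their usage is not documented.

```python
import re

# Keyword constants, copied from the list above.
DEFAULT_VALUE = "*"
FILTER_VALUE = "FILTER"
REPLACE_VALUE = "REPLACE_"
REGEX_VALUE = "REGEX_"


def apply_value_mapping(value: str, mapping: dict) -> str:
    """Apply one value_mappings dict to a single cell (assumed semantics).

    Keys are processed in insertion order. Exact-value keys are compared with
    the original cell value, so those substitutions do not chain.
    """
    result = value
    for key, target in mapping.items():
        if key == DEFAULT_VALUE:
            result = target                           # overwrite every value
        elif key == FILTER_VALUE:
            continue                                  # row-level rule, not a cell edit
        elif key.startswith(REPLACE_VALUE):
            old = key[len(REPLACE_VALUE):]
            result = result.replace(old, target)      # literal, case-sensitive replace
        elif key.startswith(REGEX_VALUE):
            pattern = key[len(REGEX_VALUE):]
            result = re.sub(pattern, target, result)  # regex substitution
        elif value == key:
            result = target                           # exact match, e.g. MALE -> M
    return result
```

Under these assumptions, apply_value_mapping("hello 2024", {"REPLACE_hello": "hi", "REGEX_[0-9]+": "NUMBER"}) returns "hi NUMBER": the literal replacement runs first, then the regex.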
Transformation mappings are specified using a JSON file. Example:
{
  "transformation_mapping": [
    {
      "column": "Name",
      "new_name": "Student Name",
      "value_mappings": [ { "*": "Amit Singh" } ]
    },
    {
      "column": "Age_Column",
      "new_name": "Age",
      "value_mappings": [ { "FILTER": "30" } ]
    },
    {
      "column": "Location",
      "new_name": "Country",
      "value_mappings": [ { "REPLACE_usa": "United States of America" } ]
    },
    {
      "column": "Gender",
      "new_name": "Sex",
      "value_mappings": [ { "MALE": "M", "FEMALE": "F" } ]
    },
    {
      "column": "Zipcode_Column",
      "new_name": "Processed_Text_regex",
      "value_mappings": [ { "REPLACE_hello": "hi", "REGEX_[0-9]+": "NUMBER" } ]
    }
  ]
}
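At the row and column level, the example mapping reads as follows: each entry picks a source column and renames it to new_name, and a FILTER value keeps only matching rows. The sketch below applies just that part to CSV text with the standard csv module. It rests on assumptions (unmapped columns are dropped, FILTER means exact string equality, other value_mappings keys are skipped) and does not show how datamapper.map is actually implemented; the function name select_and_rename is hypothetical.

```python
import csv
import io
import json


def select_and_rename(csv_text: str, mapping_json: str) -> list:
    """Rename mapped columns and apply FILTER rules (hypothetical sketch).

    Columns not listed in transformation_mapping are dropped; value_mappings
    keys other than FILTER are ignored here.
    """
    entries = json.loads(mapping_json)["transformation_mapping"]
    output = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        keep = True
        mapped = {}
        for entry in entries:
            value = row.get(entry["column"], "")  # missing column -> empty string
            for rule in entry.get("value_mappings", []):
                # Assumption: FILTER keeps only rows whose cell equals the given value.
                if "FILTER" in rule and value != rule["FILTER"]:
                    keep = False
            mapped[entry["new_name"]] = value
        if keep:
            output.append(mapped)
    return output
```

With the Age_Column entry from the example, a row whose Age_Column is 25 would be dropped and a row with 30 kept, with the column renamed to Age.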
Contributions are always welcome!
Please adhere to this project's code of conduct.
Suggest code changes and open a PR/MR.
Intended Audience: Developers, Testers, BA
FAQs
We found that dataforgetoolkit demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.