anonymizedf

a convenient way to anonymize your data for analytics

Version 1.0.1

Maintainers
1

Readme

Anonymize df: a convenient way to anonymize your data for analytics


What is it?

Anonymize df is a package that quickly generates realistic fake data from the columns of a pandas DataFrame, adding each fake column (prefixed with "Fake_") alongside the original one.
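The package's internals aren't described here, but the idea can be illustrated for one case. Judging by the example output further down, a categorical column such as Segment gets a generic label (Segment 1, ...) in a new Fake_Segment column. The following is a minimal pandas sketch of that behaviour; fake_categories_sketch is a hypothetical name, not part of anonymizedf, and the package may work differently.

```python
import pandas as pd


def fake_categories_sketch(df: pd.DataFrame, column: str) -> pd.DataFrame:
    # Hypothetical stand-in, not anonymizedf's code: give each distinct value
    # a generic label ("<column> 1", "<column> 2", ...) in order of first appearance.
    labels = {
        value: f"{column} {i}"
        for i, value in enumerate(pd.unique(df[column]), start=1)
    }
    # Store the labels in a new "Fake_<column>" column; the original column stays.
    df[f"Fake_{column}"] = df[column].map(labels)
    return df


df = pd.DataFrame({"Segment": ["Consumer", "Corporate", "Consumer"]})
fake_categories_sketch(df, "Segment")
print(df["Fake_Segment"].tolist())  # ['Segment 1', 'Segment 2', 'Segment 1']
```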

What are the expected use cases / why was this made?

  • You're hiring consultants to work on your data but need to anonymize it first
  • You're a consultant and created something great that you want to make into a template

Installation

You can install anonymizedf using pip:

pip install anonymizedf

This will also install the Tableau Hyper API and pandas packages if they are not already installed.

If you don't want to use pip, you can also download this repository and run:

python setup.py install
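To confirm afterwards that the packages can be imported, a short standard-library check is enough. This sketch assumes the import names pandas, tableauhyperapi (the Tableau Hyper API) and anonymizedf; adjust the list if your environment differs.

```python
import importlib.util

# Assumed import names for the packages mentioned above.
REQUIRED_MODULES = ["pandas", "tableauhyperapi", "anonymizedf"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```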

Example usage

import pandas as pd
from anonymizedf.anonymizedf import anonymize

# Import the data
df = pd.read_csv("https://query.data.world/s/shcktxndtu3ojonm46tb5udlz7sp3e")

# Prepare the data to be anonymized
an = anonymize(df)

# Select what data you want to anonymize and your preferred style

# Example 1 - update df directly (each call adds a Fake_<column> column)
an.fake_names("Customer Name")
an.fake_ids("Customer ID")
an.fake_whole_numbers("Loyalty Reward Points")
an.fake_categories("Segment")
an.fake_dates("Date")
an.fake_decimal_numbers("Fraction")

# Example 2 - method chaining
fake_df = (
    an
    .fake_names("Customer Name", chaining=True)
    .fake_ids("Customer ID", chaining=True)
    .fake_whole_numbers("Loyalty Reward Points", chaining=True)
    .fake_categories("Segment", chaining=True)
    .fake_dates("Date", chaining=True)
    .fake_decimal_numbers("Fraction", chaining=True)
    .show_data_frame()
)

# Example 3 - multiple assignments
fake_df = an.fake_names("Customer Name")
fake_df = an.fake_ids("Customer ID")
fake_df = an.fake_whole_numbers("Loyalty Reward Points")
fake_df = an.fake_categories("Segment")
fake_df = an.fake_dates("Date")
fake_df = an.fake_decimal_numbers("Fraction")

fake_df.to_csv("fake_customers.csv", index=False)

# Note: you can't pass a list of columns directly.
# To apply the same function to several columns, call it once per column, for example in a loop.

# Example 4 - for multiple columns
column_list = ["Segment"]  # add any other columns you want to anonymize the same way

for column in column_list:
    an.fake_categories(column)
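Example 4 loops over the columns by hand. If you do this often, a small wrapper can take the method name as a string and call it once per column. apply_to_columns is a hypothetical helper, not part of anonymizedf; it relies only on getattr and on each method accepting a single column name, as in the examples above.

```python
from typing import Any, Iterable


def apply_to_columns(anonymizer: Any, method_name: str, columns: Iterable[str]) -> Any:
    """Hypothetical helper: call anonymizer.<method_name>(column) for each column.

    Returns the result of the last call (judging by Example 3, the updated
    DataFrame for the non-chaining methods), or None if `columns` is empty.
    """
    method = getattr(anonymizer, method_name)  # e.g. an.fake_categories
    result = None
    for column in columns:
        result = method(column)
    return result
```

With the an object from the examples, this would read fake_df = apply_to_columns(an, "fake_categories", ["Segment"]).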

Example output

| | Customer ID | Customer Name | Loyalty Reward Points | Segment | Date | Fraction | Fake_Customer Name | Fake_Customer ID | Fake_Loyalty Reward Points | Fake_Segment | Fake_Date | Fake_Fraction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AA-10315 | Alex Avila | 76 | Consumer | 01/01/2000 | 7.6 | Christian Metcalfe-Reid | YEJP71011502726136 | 558 | Segment 1 | 1978-11-09 | 29.96 |
| 1 | AA-10375 | Allen Armold | 369 | Consumer | 02/01/2000 | 36.9 | Helen Taylor | XWOB83170110594048 | 286 | Segment 1 | 1989-12-29 | 72.50 |
| 2 | AA-10480 | Andrew Allen | 162 | Consumer | 03/01/2000 | 16.2 | Joanne Price | VVCJ28547588747677 | 742 | Segment 1 | 1982-09-23 | 79.77 |
| 3 | AA-10645 | Anna Andreadi | 803 | Consumer | 04/01/2000 | 80.3 | Rhys Jones | OXCI12190813836802 | 206 | Segment 1 | 2000-10-14 | 7.15 |
| 4 | AB-10015 | Aaron Bergman | 935 | Consumer | 05/01/2000 | 93.5 | Nigel Baldwin-Cook | JOXS05799252235987 | 914 | Segment 1 | 2018-01-30 | 40.66 |
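As the output shows, the original columns are kept next to the generated Fake_ columns, so before handing the data to someone else you would likely want to drop the originals. A minimal pandas sketch, assuming the "Fake_" prefix seen in the output; keep_fake_columns is a hypothetical helper, not part of anonymizedf.

```python
import pandas as pd


def keep_fake_columns(df: pd.DataFrame, prefix: str = "Fake_") -> pd.DataFrame:
    """Return a new DataFrame with only the generated columns, renamed back
    to the original column names (e.g. "Fake_Segment" -> "Segment")."""
    fake_cols = [c for c in df.columns if c.startswith(prefix)]
    # Selecting a list of columns returns a new DataFrame; rename strips the prefix.
    return df[fake_cols].rename(columns=lambda c: c[len(prefix):])


df = pd.DataFrame({"Segment": ["Consumer"], "Fake_Segment": ["Segment 1"]})
shared = keep_fake_columns(df)
print(shared.columns.tolist())  # ['Segment']
```

For instance, keep_fake_columns(fake_df).to_csv("fake_customers.csv", index=False) would write only the anonymized columns.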
