`db_obfuscation` is a gem that helps prepare a production-size obfuscated database. The obfuscated database can be used for internal testing such as user acceptance testing and QA/regression testing.
`db_obfuscation` takes a production database and replaces the data in every row of each table with fake data, while ensuring that associations between tables are maintained.
The gem currently supports only PostgreSQL databases.
Installation
gem install db_obfuscation
Usage
db_obfuscation obfuscate -c <path of obfuscation_configuration>
-s <Number of rows to be obfuscated in each db transaction>
-l <name_of_log_file>
`step_size` (the value passed with `-s`) depends on the use case: the processing power of the computer, the size of the tables, and so on.
In our experience, 100 row updates per database transaction has worked best. However, you may need to change this number to tune performance for your database.
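To make the role of `step_size` concrete, here is a minimal sketch (not the gem's actual implementation) of how rows could be grouped into batches of `step_size`, with each batch intended to be updated inside one database transaction. The method name `batches_for` and the in-memory list of row IDs are hypothetical stand-ins for real table rows.

```ruby
# Hypothetical helper: splits row IDs into groups of at most step_size.
# In a real run, each group would be updated inside its own transaction.
def batches_for(row_ids, step_size)
  raise ArgumentError, "step_size must be positive" unless step_size.positive?

  row_ids.each_slice(step_size).to_a
end

row_ids = (1..250).to_a            # stand-in for the primary keys of a table
batches = batches_for(row_ids, 100)
puts batches.map(&:size).inspect   # prints [100, 100, 50]
```

A smaller `step_size` means more, shorter transactions; a larger one means fewer, longer transactions. Which works better depends on the database and hardware.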
Configuration
A sample configuration folder is included with the gem at `spec/config`.
A configuration folder consists of the following files and folders:
1. Database Configuration file
`<path_to_config_folder>/database.yml`
This file contains the credentials used to connect to the database: adapter name, host, encoding, username, password, and database name.
Sample `database.yml` file:
adapter: postgres
host: localhost
encoding: unicode
username: database_user
database: obfuscation_test
password: database_password
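As an illustration of the keys listed above, the following sketch checks that a `database.yml` provides each of them. The helper `missing_database_keys` and the constant `REQUIRED_KEYS` are hypothetical and not part of the gem; the gem's own validation, if any, may differ.

```ruby
require "yaml"

# Keys this README says database.yml needs.
REQUIRED_KEYS = %w[adapter host encoding username password database].freeze

# Hypothetical helper: parses YAML text and returns the required keys
# that are absent, in the order listed in REQUIRED_KEYS.
def missing_database_keys(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  REQUIRED_KEYS.reject { |key| config.key?(key) }
end

sample = <<~CONFIG
  adapter: postgres
  host: localhost
  encoding: unicode
  username: database_user
  database: obfuscation_test
  password: database_password
CONFIG

puts missing_database_keys(sample).inspect # prints []
```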
2. Table Strategies
`<path_to_config_folder>/table_strategies`
This folder contains a YAML file for every table for which a user wants to override the default obfuscation configuration.
Each table file maps columns to the obfuscation strategy for that column. The filename is the same as the name of the table it configures.
A sample table strategy file, `spec/config/table_strategies/table_2.yml`:
table_2:
  field_1: :default_strategy
  field_2: :whitelisted
  date_field: :date_strategy
  field_3: :first_name_strategy
By default, `db_obfuscation` obfuscates every string column in a table, replacing each value with a random word. This default behaviour can be overridden per column and per table by specifying a different strategy.
The supported strategies include:
- `:whitelisted` to skip obfuscating a particular string column in a table
- `:date_strategy` to include a date column that needs to be obfuscated.
Date columns in a table are not obfuscated by default. Including `:date_strategy` adds a random number of days between 31 and 240 to the current value of the date.
- The complete list of strategies is [here](https://github.com/CaseCommonsDevOps/db_obfuscation/blob/master/lib/db_obfuscation/obfuscator.rb).
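The behaviour of the strategies described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the gem's code: `obfuscate_value` and `SAMPLE_WORDS` are hypothetical names, the word list is a stand-in, and the 31 to 240 day range is assumed to be inclusive. Strategies not described in this README (for example `:first_name_strategy`) are deliberately left out.

```ruby
require "date"

# Stand-in word list; the gem's actual source of random words may differ.
SAMPLE_WORDS = %w[alpha bravo charlie delta echo].freeze

# Hypothetical dispatcher covering only the strategies described above.
def obfuscate_value(value, strategy, rng: Random.new)
  case strategy
  when :whitelisted
    value                              # leave the column untouched
  when :date_strategy
    value + rng.rand(31..240)          # Date + Integer adds that many days
  when :default_strategy
    SAMPLE_WORDS.sample(random: rng)   # replace the string with a random word
  else
    raise ArgumentError, "strategy #{strategy.inspect} is not covered by this sketch"
  end
end

original = Date.new(2015, 1, 1)
shifted  = obfuscate_value(original, :date_strategy)
puts((shifted - original).to_i.between?(31, 240)) # prints true
puts obfuscate_value("keep me", :whitelisted)     # prints keep me
```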
3. Truncation Tables
`<path_to_config_folder>/truncation_patterns.yml`
This file contains string patterns for the names of tables that need to be truncated instead of obfuscated.
Any table whose name equals a pattern, or begins with that pattern followed by an underscore, will be truncated during the obfuscation process.
A sample `truncation_patterns.yml` file:
- truncation_table_1
- audit
Any table whose name begins with `audit_` will be selected for truncation.
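The matching rule described above (exact match, or the pattern followed by an underscore) can be sketched like this. The method name `truncate_table?` is hypothetical and the gem's internal implementation may differ.

```ruby
# Hypothetical helper: true when table_name equals a pattern or starts
# with "<pattern>_", following the rule stated in this README.
def truncate_table?(table_name, patterns)
  patterns.any? do |pattern|
    table_name == pattern || table_name.start_with?("#{pattern}_")
  end
end

patterns = %w[truncation_table_1 audit]
puts truncate_table?("audit_logs", patterns) # prints true
puts truncate_table?("auditors", patterns)   # prints false
```

Note that under this rule `auditors` is not truncated, because the pattern `audit` is not followed by an underscore.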
4. Whitelisted Tables
`<path_to_config_folder>/whitelisted_tables.yml`
This file contains the names of tables that do not need to be obfuscated and should be left untouched.
A sample `whitelisted_tables.yml` looks like this:
- whitelisted_table_1
- whitelisted_table_2
Requirements
License
Copyright © 2015 Case Commons & Rajat Agrawal.
Licensed under the MIT license, available in the “LICENSE” file.