redact-PHI
A command-line utility to remove, redact and fabricate PHI, PII or protected data (CSV, JSON, XLSX).
Online Demo
Browser-based version
Local Installation
Install globally using npm i -g redact-phi
Usage
redact [options] <infile> [outfile]
Options:
-V, --version output the version number
--delimiter <delimiter> sets delimiter type (default: ",")
--override <override> Provide the location of your redaction specification
-h, --help display help for command
Redaction Example
redact-PHI includes an example (./example/
) :
example.csv
- CSV source file (data to be redacted)example.json
- JSON redaction strategy (defines how columns will be redacted with: faker
, a custom redactor
or a constant
)example.js
- JavaScript custom redactors (this file is optional)
The .csv
, .json
and .js
files must all be named the same (e.g. data.csv
, data.json
, data.js
)
example/example.csv
:
id,email,ssn,first name,last name,full address,order date,order amount
783,Nicola42@yahoo.com,706-25-4558,Caroline,Goodwin,82703 Yasmeen Corner Apt. 379,"10/9/2020, 4:48:52 AM",252.94
784,Ned_Bernier13@yahoo.com,282-57-6226,Izabella,Bosco,2930 Alisa Heights Apt. 572,"3/27/2021, 11:51:26 PM",689.40
784,Ned_Bernier13@yahoo.com,282-57-6226,Izabella,Bosco,2930 Alisa Heights Apt. 572,"6/17/2021, 9:44:54 PM",446.29
786,Nathen_Kovacek64@hotmail.com,677-86-2303,Celine,Dicki,68305 Labadie Shoal Suite 608,"4/23/2021, 6:49:34 PM",889.37
...
example/example.json
:
{
"columns": [
{
"redactWith": "random.number",
"columnNum": "",
"columnKey": "order id",
"tracked": false
},
{
"redactWith": "customGenerateId",
"columnNum": 0,
"columnKey": "",
"tracked": true
},
...
The columns
array contains objects with properties that describe how each column should be redacted:
-
columnNum
- (integer) Identifies the column to be redacted with an integer index (zero-based). (columnNum
or columnKey
must be set)
-
columnKey
- (string) Identifies the column to be redacted using the file's header. (columnNum
or columnKey
must be set)
-
redactWith
- (string) The strategy used to redact a value, there are four options:
- faker function - Available functions :
name.firstName
or internet.email
- faker template - Faker methods in a mustache template :
{{address.streetAddress(true)}}
or {{address.city}}, {{address.state}} {{address.zip}}
- custom JavaScript - For more complicated redacted values you can call a JavaScript function defined in your
.js
file (see example.js
) - constant value - If you wish to replace every value in this column with a constant, e.g.
John Doe
-
tracked
- Tracking preserves the relationships within your data while de-identifying. With tracking enabled, the redaction engine tracks a column's original value and reuses the redacted value if the original is re-encountered. To see this in-action, run redact /example/example.csv
and note how columns with tracking enabled (id
,ssn
,email
) are redacted with the same value in example_redacted.csv
.
example/example.js
:
module.exports = {
customGenerateId: () => {
return ++currentId;
},
...
Custom redactors provide more control over your data. A custom redactor is referenced from your .json
file using the redactWith
property. For example: redactWith: "customGenerateId"
De-identification Examples
These JSON examples will remove the following PII:
- First Name:
"redactWith": "name.firstName"
- Last Name:
"redactWith": "name.lastName"
- Full Name:
"redactWith": "name.findName"
- Social Security number:
"redactWith": "{{datatype.number({\"min\":100,\"max\":999})}}-{{datatype.number({\"min\":10,\"max\":99})}}-{{datatype.number({\"min\":1000,\"max\":9999})}}"
- Email Address:
"redactWith": "internet.email"
- Phone / Fax number:
"redactWith": "phone.phoneNumber"
- Street Address:
"redactWith": "address.streetAddress"
- City:
"redactWith": "address.city"
- Zip Code:
"redactWith": "address.zipCode"
- City, State, Zip:
"redactWith": "{{address.city()}}, {{address.stateAbbr()}} {{address.zipCode()}}"
- Full Address:
"redactWith": "{{address.streetAddress(true)}}"
- IP:
"redactWith": "internet.ip"
- IP v6:
"redactWith": "internet.ipv6"
Full De-identification example.json:
{
"columns": [
{
"redactWith": "name.firstName",
"columnNum": "",
"columnKey": "first name",
"tracked": false
},
{
"redactWith": "name.lastName",
"columnNum": "",
"columnKey": "last name",
"tracked": false
},
{
"redactWith": "name.findName",
"columnNum": "",
"columnKey": "full name",
"tracked": false
},
{
"redactWith": "{{datatype.number({\"min\":100,\"max\":999})}}-{{datatype.number({\"min\":10,\"max\":99})}}-{{datatype.number({\"min\":1000,\"max\":9999})}}",
"columnNum": "",
"columnKey": "social security number",
"tracked": false
},
{
"redactWith": "internet.email",
"columnNum": "",
"columnKey": "email",
"tracked": false
},
{
"redactWith": "phone.phoneNumber",
"columnNum": "",
"columnKey": "phone",
"tracked": false
},
{
"redactWith": "address.city",
"columnNum": "",
"columnKey": "city",
"tracked": false
},
{
"redactWith": "address.zipCode",
"columnNum": "",
"columnKey": "zip",
"tracked": false
},
{
"redactWith": "address.streetAddress",
"columnNum": "",
"columnKey": "street address",
"tracked": false
},
{
"redactWith": "{{address.city()}}, {{address.stateAbbr()}} {{address.zipCode()}}",
"columnNum": "",
"columnKey": "city state zip",
"tracked": false
},
{
"redactWith": "{{address.streetAddress(true)}}",
"columnNum": "",
"columnKey": "full address",
"tracked": false
},
{
"redactWith": "internet.ip",
"columnNum": "",
"columnKey": "ip",
"tracked": false
},
{
"redactWith": "internet.ipv6",
"columnNum": "",
"columnKey": "ipv6",
"tracked": false
}
]
}
Redaction Strategy JSON Schema
JSON Schema for creating a spec