copycat
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
copycat.email('bar')
copycat.email('foo')
Motivation
The problem
Many of the use cases we aim to solve with snaplet involve anonymizing sensitive information. In practice, this involves replacing each bit of sensitive data with something else that resembles the original value, yet does not allow the original value to be inferred.
To do this, we initially turned to faker for replacing the sensitive data with fake data. This approach took us quite far. However, we struggled with getting the replacement data to be deterministic: we found we did not have enough control over how results are generated to be able to easily ensure that for each value of the original data we wanted to replace, we'd always get the same replacement value out.
Faker allows one to seed a pseudo-random number generator (PRNG), such that the same sequence of values will be generated every time. While this makes the sequence deterministic, we did not have enough control over where the next value in the sequence was going to be used. Changes to the contents or structure of the original data we were replacing, and changes to how we used faker, both affected how this sequence was consumed, which in turn changed the replacement value for any particular value in the original data. In other words, we had determinism, but not in a way that was useful for our purposes.
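To make the problem concrete, here is a minimal sketch that uses a small hand-rolled seeded PRNG (mulberry32) as a stand-in for any seedable generator. It is not faker's implementation, and the names `mulberry32` and `anonymizeInOrder` and the sample records are hypothetical. It shows that the replacement a value receives depends on its position in the sequence, so adding one record shifts every replacement after it.

```typescript
// A small seeded PRNG (mulberry32). Stand-in for any seedable generator;
// this is NOT faker's implementation.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Replace each original value with the next value in the seeded sequence.
function anonymizeInOrder(originals: string[], seed: number): Map<string, number> {
  const next = mulberry32(seed)
  const replacements = new Map<string, number>()
  for (const original of originals) {
    replacements.set(original, Math.floor(next() * 1000000))
  }
  return replacements
}

const before = anonymizeInOrder(['alice', 'bob'], 42)
const after = anonymizeInOrder(['zed', 'alice', 'bob'], 42)

// Same seed, same sequence: the first generated value is identical in both runs...
console.log(before.get('alice') === after.get('zed')) // true
// ...so 'alice' now receives the second value, the one 'bob' had before.
console.log(before.get('bob') === after.get('alice')) // true
```

The sequence itself never changed between the two runs; only the way it was consumed did, which is enough to change the replacement for 'alice'.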
The solution
What we really needed was not the same sequence of generated values every time, but the same mapping to generated values every time.
This is exactly what we designed copycat to do. For each method provided by copycat, a given input value will always map to the same output value.
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
copycat.email('bar')
copycat.email('foo')
Copycat works statelessly: for the same input, the same value will be returned regardless of the environment, process, call ordering, or any other external factors.
Under the hood, copycat hashes the input values (in part relying on md5), with the intention of making it computationally infeasible for the input values to be inferred from the output values.
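The idea of mapping rather than sequencing can be sketched as follows. This is not copycat's implementation: the `fnv1a` hash, the `fakeFirstName` function, and the `NAMES` list are hypothetical stand-ins. FNV-1a is also not a cryptographic hash, so unlike copycat this sketch makes no attempt to keep inputs from being inferred from outputs.

```typescript
// 32-bit FNV-1a style hash over the UTF-16 code units of a string.
// Assumption for illustration only; not the hash copycat uses.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Hypothetical pool of replacement values.
const NAMES = ['Ada', 'Grace', 'Linus', 'Margaret', 'Dennis']

// Stateless: the output depends only on the input, never on call order.
function fakeFirstName(input: string): string {
  return NAMES[fnv1a(input) % NAMES.length]
}

const first = fakeFirstName('foo')
fakeFirstName('bar') // an unrelated call in between changes nothing
const second = fakeFirstName('foo')
console.log(first === second) // true
```

Because no state is carried between calls, the result for 'foo' would be the same in a different process or after any number of other calls.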
Alternative approaches
It is still technically possible to make use of faker or similar libraries that offer deterministic PRNGs, with some modification. That said, these solutions came with practical limitations that we decided made them less viable for us:
- It is possible to simply seed the PRNG for every identifier, and then use it to generate only a single value. This seems to be a misuse of these libraries though: there is an up-front cost to seeding these PRNGs that can be expensive if done for each and every value to be generated. Here are benchmarks that point to this up-front cost.
- You can generate a sequence of N values, hash identifiers to some integer smaller than N, then simply use that as an index to look up a value in the sequence. This can even be done lazily. Still, you're now limiting the uniqueness of the values to N. The larger N is, the larger the cost of keeping these sequences in memory, or the more computationally expensive it is if you do not hold onto the sequences in memory. The smaller N is, the less unique your generated values are.
Note though that for either of these approaches, hashing might also still be needed to make it infeasible for the inputs to be inferred from the outputs.
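The second approach above can be sketched roughly as below. All names (`hashToInt`, `buildSequence`, `lookup`) are hypothetical, the placeholder values stand in for what a seeded library would generate, and the hash is a simple non-cryptographic one chosen only to keep the example self-contained.

```typescript
// Simple non-cryptographic string hash (FNV-1a style), used only to pick an index.
function hashToInt(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash = Math.imul(hash ^ text.charCodeAt(i), 0x01000193) >>> 0
  }
  return hash
}

// Build a lookup of n pre-generated values. In practice these would come
// from a seeded PRNG-backed library; here they are simple placeholders.
function buildSequence(n: number): string[] {
  return Array.from({ length: n }, (_, i) => `user_${i}`)
}

// Hash the identifier to an index smaller than N and look the value up.
function lookup(sequence: string[], identifier: string): string {
  return sequence[hashToInt(identifier) % sequence.length]
}

const N = 4
const sequence = buildSequence(N)
const identifiers = ['a', 'b', 'c', 'd', 'e', 'f']
const outputs = new Set(identifiers.map((id) => lookup(sequence, id)))

// Six distinct inputs can produce at most N distinct outputs.
console.log(outputs.size <= N) // true
```

The trade-off described above shows up directly: raising `N` reduces collisions but grows the sequence that has to be generated or held in memory.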
API Reference
Overview
All copycat functions take in an input value as their first parameter:
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
The given input can be any JSON-serializable value. For any two calls to the same function, if the inputs given in each call serialize down to the same value, the same output will be returned.
Note that unlike JSON.stringify(), object property ordering is not considered.
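One way to get serialization that ignores property order is to sort object keys before stringifying. The sketch below is an assumption about how such behavior could be achieved, not a description of copycat's internals; `stableStringify` and the `Json` type are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Serialize a JSON value with object keys in sorted order, so that two
// objects with the same properties always produce the same string.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value)
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return '[' + value.map((item) => stableStringify(item)).join(',') + ']'
  }
  const keys = Object.keys(value).sort()
  const entries = keys.map((key) => JSON.stringify(key) + ':' + stableStringify(value[key]))
  return '{' + entries.join(',') + '}'
}

const a = { id: 1, name: 'foo' }
const b = { name: 'foo', id: 1 }

console.log(JSON.stringify(a) === JSON.stringify(b)) // false
console.log(stableStringify(a) === stableStringify(b)) // true
```

With a canonical form like this, hashing the serialized string gives the same result for `a` and `b`, which matches the behavior described above.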
faker
A re-export of the exports of @faker-js/faker as an object. We do not alter faker in any way, and do not seed it.
email(input)
Takes in an input and returns a string value resembling an email address.
copycat.email('foo')
firstName(input)
Takes in an input and returns a string value resembling a first name.
copycat.firstName('foo')
lastName(input)
Takes in an input and returns a string value resembling a last name.
copycat.lastName('foo')
fullName(input)
Takes in an input and returns a string value resembling a full name.
copycat.fullName('foo')
username(input)
Takes in an input and returns a string value resembling a username.
copycat.username('foo')
uuid(input)
Takes in an input and returns a string value resembling a uuid.
copycat.uuid('foo')
city(input)
Takes in an input and returns a string value representing a city.
copycat.city('foo')
country(input)
Takes in an input and returns a string value representing a country.
copycat.country('foo')
streetName(input)
Takes in an input and returns a string value representing a fictitious street name.
copycat.streetName('foo')
streetAddress(input)
Takes in an input and returns a string value representing a fictitious street address.
copycat.streetAddress('foo')
postalAddress(input)
Takes in an input and returns a string value representing a fictitious postal address.
copycat.postalAddress('foo')
int(input[, options])
Takes in an input value and returns an integer.
copycat.int('foo')
options
- min=0 and max=Infinity: the minimum and maximum possible values for returned numbers
bool(input)
Takes in an input value and returns a boolean.
copycat.bool('foo')
float(input[, options])
Takes in an input value and returns a number value with both a whole and decimal segment.
copycat.float('foo')
options
- min=0 and max=Infinity: the minimum and maximum possible values for returned numbers
dateString(input[, options])
Takes in an input value and returns a string representing a date in ISO 8601 format.
copycat.dateString('foo')
options
- minYear=1980 and maxYear=2019: the minimum and maximum possible year values for returned dates
char(input)
Takes in an input value and returns a string with a single character.
copycat.char('foo')
The generated character will be alphanumeric: a lower or upper case ASCII letter, or a digit from 0 to 9.
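The character set described above has 62 members. As a small illustration, the sketch below picks one of them from a non-negative integer such as a hash of the input. The `ALPHANUMERIC` ordering and the `charFromNumber` helper are assumptions for illustration, not copycat's actual implementation.

```typescript
// The 62-character alphabet: a-z, A-Z, 0-9. The ordering is an assumption.
const ALPHANUMERIC =
  'abcdefghijklmnopqrstuvwxyz' + 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' + '0123456789'

// Pick a character from a non-negative integer, e.g. a hash of the input.
function charFromNumber(n: number): string {
  return ALPHANUMERIC.charAt(n % ALPHANUMERIC.length)
}

console.log(ALPHANUMERIC.length) // 62
console.log(charFromNumber(0)) // 'a'
console.log(charFromNumber(61)) // '9'
console.log(charFromNumber(62)) // 'a' (wraps around)
```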
word(input)
Takes in an input value and returns a string value resembling a fictitious word.
copycat.word('foo')
words(input)
Takes in an input value and returns a string value resembling fictitious words.
copycat.words('foo')
sentence(input)
Takes in an input value and returns a string value resembling a sentence of fictitious words.
copycat.sentence('foo')
paragraph(input)
Takes in an input value and returns a string value resembling a paragraph of fictitious words.
copycat.paragraph('foo')
oneOf(input, values)
Takes in an input value and an array of values, and returns an item in values that corresponds to that input:
copycat.oneOf('foo', ['red', 'green', 'blue'])