base64-search
JavaScript library for searching for strings in base64-encoded blobs.
I sometimes need to check whether a blob of data with both plain and base64-encoded text contains a certain string. This may happen when analysing the traffic dump of a program or app for security or data protection research, for example.
However, depending on the offset of the search string in the source string, it can be base64-encoded in different ways. Take a look at this JSON blob as an example:
{
"attr1": "Hello, world!",
"attr2": "WW91IGNhbiBnZW5lcmF0ZSBHRFBSIHJlcXVlc3RzIGF0IGRhdGFyZXF1ZXN0cy5vcmchIENoZWNrIGl0IG91dDogaHR0cHM6Ly93d3cuZGF0YXJlcXVlc3RzLm9yZy9nZW5lcmF0b3I=",
"attr3": "base64-search",
"attr4": "SGVsbG8sIHdvcmxkIQ=="
}
Imagine I want to check whether it contains the string datarequests.org anywhere. If we base64-encode that, we get ZGF0YXJlcXVlc3RzLm9yZw==. This is not contained anywhere in the JSON. But concluding from that alone that the JSON doesn't contain datarequests.org would be wrong.
Using base64-search, we find that there are in fact three different ways that datarequests.org could be encoded, all of which we need to check (to learn more about why that is, have a look at this great blog post, which this library is inspired by):
> base64Encodings('datarequests.org')
[
'ZGF0YXJlcXVlc3RzLm9yZ',
'RhdGFyZXF1ZXN0cy5vcm',
'YXRhcmVxdWVzdHMub3Jn'
]
And indeed, we can see that both ZGF0YXJlcXVlc3RzLm9yZ and RhdGFyZXF1ZXN0cy5vcm are present in the JSON blob, so it does contain datarequests.org, twice even.
Now, in this case, I could of course have just decoded both base64 strings and then performed string matching on the results. But this method is meant for searching arbitrary data whose shape you don't know, without any manual work.
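To get a feeling for where the three variants come from: base64 turns every 3 bytes into 4 characters, so the characters produced for a string depend on whether it starts 0, 1 or 2 bytes into such a 3-byte group. The following is only a minimal sketch of that idea for Node.js, not the code base64-search actually uses. sketchEncodings is a hypothetical helper. It prepends zero bytes as placeholders for the unknown preceding data and trims the characters at both ends that would be influenced by neighbouring bytes, with the trimming rule chosen so that it reproduces the listing above.

```javascript
// Hypothetical helper for illustration only, not part of base64-search.
// Returns one base64 fragment of `input` per possible byte alignment.
function sketchEncodings(input) {
  const data = Buffer.from(input); // accepts a string (UTF-8) or a Buffer
  const variants = [];

  for (let offset = 0; offset < 3; offset++) {
    // Pretend `offset` unknown bytes precede the input inside the blob.
    const padded = Buffer.concat([Buffer.alloc(offset), data]);
    const encoded = padded.toString('base64').replace(/=+$/, '');

    // Leading characters to drop because they involve the placeholder bytes.
    // For offset 2 this drops the whole first group, one character more than
    // strictly necessary, to match the fragments shown above.
    const start = [0, 2, 4][offset];

    // If the last group is incomplete, its final character also carries bits
    // of whatever data follows the input, so it is dropped as well.
    const end = padded.length % 3 === 0 ? encoded.length : encoded.length - 1;

    variants.push(encoded.slice(start, end));
  }

  return variants;
}

console.log(sketchEncodings('datarequests.org'));
```

For 'datarequests.org', this yields the same three strings as the listing above. Inputs that are only a few bytes long leave very little (or nothing) after trimming, which is one reason to prefer the library over a quick sketch like this.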
Installation
You can install base64-search using yarn or npm:
yarn add base64-search
npm install base64-search
Example usage
import { base64Regex, base64Encodings } from 'base64-search';
console.log(base64Regex('ea70edc1'));
console.log(base64Encodings('ea70edc1'));
API
This library provides the following two functions:
base64Regex(input)
Parameters:
input (required) – The string or other data that you want to search for. Can be a string or Buffer.
returns: string – A regex that matches all possible base64 encodings of input in a single capture group.
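Since the return value is a string, it needs to be compiled with new RegExp before it can be used for matching. The sketch below shows how that might look for the JSON blob from the introduction. To keep it runnable without the library, the pattern is hard-coded from the three encodings listed earlier. Its exact shape is an assumption, and the string that base64Regex actually returns may look different.

```javascript
// Assumption: a pattern of this shape (one capture group, alternatives joined
// with "|"). In real code this would be: const pattern = base64Regex('datarequests.org');
const pattern = '(ZGF0YXJlcXVlc3RzLm9yZ|RhdGFyZXF1ZXN0cy5vcm|YXRhcmVxdWVzdHMub3Jn)';

// The JSON blob from the introduction, as raw text.
const blob = JSON.stringify({
  attr1: 'Hello, world!',
  attr2: 'WW91IGNhbiBnZW5lcmF0ZSBHRFBSIHJlcXVlc3RzIGF0IGRhdGFyZXF1ZXN0cy5vcmchIENoZWNrIGl0IG91dDogaHR0cHM6Ly93d3cuZGF0YXJlcXVlc3RzLm9yZy9nZW5lcmF0b3I=',
  attr3: 'base64-search',
  attr4: 'SGVsbG8sIHdvcmxkIQ==',
});

// The "g" flag is required for matchAll and makes it report every occurrence.
const hits = [...blob.matchAll(new RegExp(pattern, 'g'))].map((m) => ({
  index: m.index, // position in the raw text
  match: m[1], // content of the single capture group
}));

console.log(hits);
```

This reports two hits, one for each of the two encodings that occur in attr2.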
base64Encodings(input)
Parameters:
input (required) – The string or other data that you want to search for. Can be a string or Buffer.
returns: string[] – An array of all possible base64 encodings of input.
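The array form is handy when you don't want to go through a regex at all, for example when the data you are searching is a binary dump held in a Buffer. The following sketch assumes Node.js and again hard-codes the encodings listed earlier instead of calling the library. The dump itself is a made-up stand-in for something you might read with fs.readFileSync.

```javascript
// In real code this would be: const encodings = base64Encodings('datarequests.org');
const encodings = [
  'ZGF0YXJlcXVlc3RzLm9yZ',
  'RhdGFyZXF1ZXN0cy5vcm',
  'YXRhcmVxdWVzdHMub3Jn',
];

// Made-up stand-in for a captured dump: a few non-text bytes followed by a
// field that carries base64-encoded text.
const dump = Buffer.concat([
  Buffer.from([0x00, 0xff, 0x10]),
  Buffer.from('payload=WW91IGNhbiBnZW5lcmF0ZSBHRFBSIHJlcXVlc3RzIGF0IGRhdGFyZXF1ZXN0cy5vcmchIENoZWNrIGl0IG91dDogaHR0cHM6Ly93d3cuZGF0YXJlcXVlc3RzLm9yZy9nZW5lcmF0b3I='),
]);

// Buffer#indexOf returns the byte offset of the first occurrence, or -1 if
// the encoding does not occur in the dump.
const offsets = encodings.map((encoding) => dump.indexOf(encoding));

console.log(offsets);
```

Any entry other than -1 means that the corresponding encoding occurs in the dump. Here, that is the case for the first two encodings but not for the third.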
License
base64-search is licensed under the MIT license, see the LICENSE file for details. It was inspired by this blog post, which contains an alternative implementation of the method for PowerShell.
Issues and pull requests are welcome!