This library implements a simple lossless compression scheme adapted to time-dependent high-frequency, high-dimensional signals. It is being developed within the International Brain Laboratory with the aim of being the compression library used for all large-scale electrophysiological recordings based on Neuropixels. The signals are typically recorded at 30 kHz with 10-bit depth and contain several hundred channels.
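To put these figures in perspective, here is a back-of-the-envelope estimate of the uncompressed data rate. It assumes each sample is stored as a 16-bit integer (as in the `-d int16` CLI example) with 385 channels at 30 kHz; the function name is hypothetical.

```python
def raw_bytes_per_second(n_channels: int, sample_rate: int, bytes_per_sample: int = 2) -> int:
    """Size in bytes of one second of uncompressed, interleaved data.

    Assumption: samples are stored as int16 (2 bytes) even though the
    ADC resolution is only 10 bits.
    """
    return n_channels * sample_rate * bytes_per_sample


rate = raw_bytes_per_second(385, 30_000)
print(f"{rate / 1e6:.1f} MB/s, {rate * 3600 / 1e9:.1f} GB/hour")
```

This prints `23.1 MB/s, 83.2 GB/hour`, so even a modest compression ratio saves a lot of disk space on long recordings.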
The requested features for the compression scheme were as follows:
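The compression scheme is the following:
- the data is split into chunks along the time axis;
- the time differences are computed for all channels within each chunk;
- each chunk of time differences is compressed with zlib;
- the compressed chunks are appended to a binary file (.cbin);
- metadata about the compression, including the offset of each chunk within the compressed binary file, is saved in a separate JSON file (.ch).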
Saving the offsets allows for on-the-fly decompression and random data access: one simply has to determine which chunks should be loaded, and load them directly from the compressed binary file. The compressed chunks are decompressed with zlib, and the original data is recovered with a cumulative sum (the inverse of the time difference operation).
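The scheme described above can be sketched in a few lines of NumPy and the standard `zlib` module. This is a simplified illustration, not mtscomp's actual implementation or file format. The function names `compress_chunks` and `read_chunk` are hypothetical, and the data is assumed to be a C-ordered `(n_samples, n_channels)` integer array.

```python
import zlib

import numpy as np


def compress_chunks(data: np.ndarray, chunk_size: int) -> tuple[bytes, list[int]]:
    """Time-difference and zlib-compress fixed-size time chunks.

    Returns the concatenated compressed chunks and the byte offsets of the
    chunk boundaries (len(offsets) == n_chunks + 1).
    """
    blobs: list[bytes] = []
    offsets = [0]
    zeros = np.zeros((1, data.shape[1]), dtype=data.dtype)
    for start in range(0, data.shape[0], chunk_size):
        chunk = data[start:start + chunk_size]
        # Prepending zeros keeps the first sample of the chunk as its own "difference".
        delta = np.diff(chunk, axis=0, prepend=zeros)
        blob = zlib.compress(delta.tobytes())
        blobs.append(blob)
        offsets.append(offsets[-1] + len(blob))
    return b"".join(blobs), offsets


def read_chunk(blob: bytes, offsets: list[int], i: int, n_channels: int, dtype) -> np.ndarray:
    """Decompress chunk i only, using the stored offsets (random access)."""
    raw = zlib.decompress(blob[offsets[i]:offsets[i + 1]])
    delta = np.frombuffer(raw, dtype=dtype).reshape(-1, n_channels)
    # The cumulative sum inverts the time difference.
    return np.cumsum(delta, axis=0, dtype=dtype)


# Synthetic slowly varying int16 signal, 4 channels, chunks of 3000 samples.
rng = np.random.default_rng(0)
data = np.cumsum(rng.integers(-5, 6, size=(10_000, 4)), axis=0).astype(np.int16)
blob, offsets = compress_chunks(data, chunk_size=3_000)
third = read_chunk(blob, offsets, 2, n_channels=4, dtype=np.int16)
print(np.array_equal(third, data[6_000:9_000]))
```

Only the bytes between `offsets[2]` and `offsets[3]` are touched to recover samples 6000 to 8999. Accumulating the cumulative sum in the same integer dtype as the differences means any wrap-around in `np.diff` cancels out, so the round trip stays exact.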
With large-scale neurophysiological recordings, we achieved a compression ratio of 3x.
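Here the compression ratio is understood as the size of the original file divided by the size of the compressed output. A hypothetical helper to measure it on your own data, counting both the `.cbin` and `.ch` files:

```python
import os


def compression_ratio(original_path: str, *compressed_paths: str) -> float:
    """Size of the original file divided by the total size of the compressed files."""
    compressed_size = sum(os.path.getsize(p) for p in compressed_paths)
    return os.path.getsize(original_path) / compressed_size


# Example with the file names used in the usage examples:
# print(compression_ratio("data.bin", "data.cbin", "data.ch"))
```

A result near 3.0 corresponds to the 3x ratio quoted above.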
By default, as a consistency check, the compressed file is automatically and transparently decompressed and compared byte by byte with the original file.
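The idea behind this check can be illustrated with a minimal, hypothetical helper that compares two files block by block. It is not mtscomp's internal routine. The standard library's `filecmp.cmp(a, b, shallow=False)` performs an equivalent content comparison, and according to the usage text the built-in check can be skipped with `-nc`/`--no-check`.

```python
def files_identical(path_a: str, path_b: str, block_size: int = 1 << 20) -> bool:
    """Compare two files byte by byte, reading them in fixed-size blocks."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if a != b:
                # Different content, or one file ended before the other.
                return False
            if not a:
                # Both files were exhausted at the same point.
                return True
```

Reading in blocks keeps memory usage bounded, which matters for multi-gigabyte recordings.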
Installation:
pip install mtscomp
Example:
# Compression: specify the number of channels, sample rate, dtype, optionally save the parameters
# as default in ~/.mtscomp with --set-default
mtscomp data.bin -n 385 -s 30000 -d int16 [--set-default]
# Decompression
mtsdecomp data.cbin -o data.decomp.bin
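To try these commands without a real recording, one can write a small synthetic input file. This sketch assumes the raw binary layout expected by mtscomp is a flat, C-ordered `(n_samples, n_channels)` array, i.e. all channels of the first sample, then all channels of the second, and so on. The sizes are only illustrative, and random noise compresses poorly, so this file will not reach the ratio quoted above.

```python
import numpy as np

n_channels = 385               # matches -n 385
sample_rate = 30_000           # matches -s 30000
n_samples = sample_rate // 10  # 0.1 s of data, to keep the file small

rng = np.random.default_rng(0)
# Values in a 10-bit range, stored as int16 (matches -d int16).
data = rng.integers(-512, 512, size=(n_samples, n_channels), dtype=np.int16)
data.tofile("data.bin")  # raw bytes, row-major: one sample (all channels) after another
```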
Usage:
usage: mtscomp [-h] [-d DTYPE] [-s SAMPLE_RATE] [-n N_CHANNELS] [-p CPUS]
               [-c CHUNK] [-nc] [-v] [--set-default]
               path [out] [outmeta]

Compress a raw binary file.

positional arguments:
  path                  input path of a raw binary file
  out                   output path of the compressed binary file (.cbin)
  outmeta               output path of the compression metadata JSON file
                        (.ch)

optional arguments:
  -h, --help            show this help message and exit
  -d DTYPE, --dtype DTYPE
                        data type
  -s SAMPLE_RATE, --sample-rate SAMPLE_RATE
                        sample rate
  -n N_CHANNELS, --n-channels N_CHANNELS
                        number of channels
  -p CPUS, --cpus CPUS  number of CPUs to use
  -c CHUNK, --chunk CHUNK
                        chunk duration
  -nc, --no-check       no check
  -v, --debug           verbose
  --set-default         set the specified parameters as the default

usage: mtsdecomp [-h] [-o [OUT]] [--overwrite] [-nc] [-v] cdata [cmeta]

Decompress a raw binary file.

positional arguments:
  cdata                 path to the input compressed binary file (.cbin)
  cmeta                 path to the input compression metadata JSON file (.ch)

optional arguments:
  -h, --help            show this help message and exit
  -o [OUT], --out [OUT]
                        path to the output decompressed file (.bin)
  --overwrite, -f       overwrite existing output
  -nc, --no-check       no check
  -v, --debug           verbose
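When scripting batch compression, one option is to assemble the command line from the flags listed above and run it with `subprocess.run`. The helper below is hypothetical; it only uses flags shown in the usage text and leaves argument validation to mtscomp itself.

```python
from typing import Optional


def mtscomp_cmd(
    path: str,
    n_channels: int,
    sample_rate: int,
    dtype: str = "int16",
    chunk: Optional[float] = None,
    cpus: Optional[int] = None,
    check: bool = True,
) -> list:
    """Build an argv list for the mtscomp CLI (not executed here)."""
    cmd = ["mtscomp", path, "-n", str(n_channels), "-s", str(sample_rate), "-d", dtype]
    if chunk is not None:
        cmd += ["-c", str(chunk)]  # chunk duration
    if cpus is not None:
        cmd += ["-p", str(cpus)]   # number of CPUs to use
    if not check:
        cmd.append("-nc")          # skip the decompression consistency check
    return cmd
```

For example, `subprocess.run(mtscomp_cmd("data.bin", 385, 30000), check=True)` would compress `data.bin` with the same parameters as the CLI example above.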
Example:
import numpy as np
from mtscomp.mtscomp import compress, decompress
# Compress a .bin file into a pair .cbin (compressed binary file) and .ch (JSON file).
compress('data.bin', 'data.cbin', 'data.ch', sample_rate=20000., n_channels=256, dtype=np.int16)
# Decompress a pair (.cbin, .ch) and return an object that can be sliced like a NumPy array.
arr = decompress('data.cbin', 'data.ch')
start, end = 0, 20000  # hypothetical range: the first second of data at 20 kHz
X = arr[start:end, :]  # decompress the data on the fly directly from the file on disk
arr.close() # Close the file when done
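Since `arr` is indexed in samples along its first axis, a time window in seconds has to be converted using the sample rate passed at compression time. A hypothetical helper (the name is illustrative, not part of mtscomp):

```python
def time_window_to_samples(t0: float, t1: float, sample_rate: float) -> slice:
    """Convert a time window [t0, t1) in seconds to a slice of sample indices."""
    return slice(round(t0 * sample_rate), round(t1 * sample_rate))


# Hypothetical usage with the object returned by decompress():
# X = arr[time_window_to_samples(10.0, 10.5, 20000.), :]
```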
Example:
import numpy as np
from mtscomp import Writer, Reader
# Define a writer to compress a flat raw binary file.
w = Writer(chunk_duration=1.)
# Open the file to compress.
w.open('data.bin', sample_rate=20000., n_channels=256, dtype=np.int16)
# Compress it into a compressed binary file, and a JSON header file.
w.write('data.cbin', 'data.ch')
w.close()
# Define a reader to decompress a compressed array.
r = Reader()
# Open the compressed dataset.
r.open('data.cbin', 'data.ch')
# The reader can be sliced like a NumPy array: decompression happens on the fly. Only the chunks
# that need to be loaded are loaded and decompressed.
# Here, we load everything in memory.
array = r[:]
# Or we can decompress into a new raw binary file on disk.
r.tofile('data_dec.bin')
r.close()
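Because only the chunks overlapping a requested slice are decompressed, it can help to know which chunks a slice touches. This sketch assumes `chunk_duration` is in seconds and that every chunk spans exactly `round(sample_rate * chunk_duration)` samples (except possibly the last one). mtscomp's actual chunk boundaries may differ, and the function name is hypothetical.

```python
def chunks_for_slice(start: int, stop: int, sample_rate: float, chunk_duration: float) -> list:
    """Indices of the fixed-size chunks overlapping samples [start, stop)."""
    chunk_size = round(sample_rate * chunk_duration)  # samples per chunk
    first = start // chunk_size
    last = (stop - 1) // chunk_size  # chunk holding the last requested sample
    return list(range(first, last + 1))
```

With the parameters of the example above (20 kHz, `chunk_duration=1.`), `chunks_for_slice(15_000, 45_000, 20000., 1.)` returns `[0, 1, 2]`: reading 1.5 s of data straddling two chunk boundaries requires decompressing three chunks.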
Performance on a Neuropixels dataset (30 kHz, 385 channels) and an Intel 10-core i9-9820X CPU @ 3.3 GHz:
Lossless compression for electrophysiology time-series