indexed-gzip-fileobj-fork-epicfaace
Fast random access of gzip files in Python
The indexed_gzip_fileobj_fork_epicfaace project is a Python extension which aims to provide a
drop-in replacement for the built-in Python gzip.GzipFile class, the IndexedGzipFile.
indexed_gzip_fileobj_fork_epicfaace
was written to allow fast random access of compressed
NIFTI image files (for which GZIP is the
de-facto compression standard), but will work with any GZIP file.
indexed_gzip_fileobj_fork_epicfaace
is easy to use with nibabel
(http://nipy.org/nibabel/).
The standard gzip.GzipFile class exposes a random access-like interface (via
its seek and read methods), but seeking is slow: to reach a new point in the
uncompressed data stream, the GzipFile instance has to decompress everything
before it, restarting from the beginning of the file whenever the requested
location lies behind the current position.
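As a small illustration of that interface, the sketch below uses only the standard library: it writes an in-memory GZIP stream and then seeks around in it with gzip.GzipFile. The payload and the offsets are made up for the example. In CPython, a backward seek rewinds the stream and decompresses again from the start, while a forward seek decompresses and discards everything up to the target.

```python
import gzip
import io

# Hypothetical payload: 1 MiB of repeating bytes, compressed in memory.
payload = bytes(range(256)) * 4096
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
    f.write(payload)

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode='rb') as f:
    # Forward seek: everything before offset 700000 is decompressed and discarded.
    f.seek(700_000)
    late = f.read(16)

    # Backward seek: the stream is rewound and decompressed again from the start.
    f.seek(1_000)
    early = f.read(16)
```

Both reads return the right bytes; the cost is hidden in how much data has to be decompressed to get there.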
An IndexedGzipFile
instance gets around this performance limitation by
building an index, which contains seek points, mappings between
corresponding locations in the compressed and uncompressed data streams. Each
seek point is accompanied by a chunk (32KB) of uncompressed data which is used
to initialise the decompression algorithm, allowing us to start reading from
any seek point. If the index is built with a seek point spacing of 1MB, we
only have to decompress (on average) 512KB of data to read from any location
in the file.
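The idea of seek points can be sketched in pure Python. This is not how indexed_gzip_fileobj_fork_epicfaace is implemented: instead of storing a 32KB window of uncompressed data per seek point, the sketch keeps an in-memory snapshot of a zlib decompressor (via its copy method), which is simpler but serves the same purpose. The names build_index, read_at and CHUNK are hypothetical and exist only in this example.

```python
import bisect
import zlib

CHUNK = 4096  # compressed bytes fed to the decompressor per step (arbitrary)


def build_index(compressed, spacing):
    """Return a list of (uncompressed_offset, compressed_offset, snapshot)."""
    decomp = zlib.decompressobj(31)  # wbits=31 selects the gzip container
    points = [(0, 0, decomp.copy())]
    uoff = 0
    for coff in range(0, len(compressed), CHUNK):
        uoff += len(decomp.decompress(compressed[coff:coff + CHUNK]))
        next_coff = min(coff + CHUNK, len(compressed))
        # Record a new seek point once we are `spacing` bytes past the last one.
        if uoff - points[-1][0] >= spacing and not decomp.eof:
            points.append((uoff, next_coff, decomp.copy()))
    return points


def read_at(points, compressed, offset, size):
    """Read `size` uncompressed bytes starting at uncompressed `offset`."""
    # Find the last seek point at or before the requested offset.
    i = bisect.bisect_right([p[0] for p in points], offset) - 1
    uoff, coff, snapshot = points[i]
    decomp = snapshot.copy()  # leave the stored snapshot untouched
    skip = offset - uoff      # bytes to decompress and throw away
    out = bytearray()
    while len(out) < skip + size and coff < len(compressed):
        out += decomp.decompress(compressed[coff:coff + CHUNK])
        coff += CHUNK
    return bytes(out[skip:skip + size])
```

A read only decompresses from the nearest preceding seek point. With 1MB spacing, a requested offset lands on average halfway between two seek points, which is where the 512KB figure comes from.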
You may find indexed_gzip_fileobj_fork_epicfaace
useful if you need to read from large GZIP files.
A major advantage of indexed_gzip_fileobj_fork_epicfaace
is that it will work with any GZIP file.
However, if you have control over the creation of your GZIP files, you may
wish to consider some alternatives:
mgzip provides an accelerated GZIP compression and decompression library.
bzip2 and xz have better support for random access.
indexed_gzip_fileobj_fork_epicfaace is available on PyPI. To install, run:
pip install indexed_gzip_fileobj_fork_epicfaace
You can also install indexed_gzip_fileobj_fork_epicfaace
from conda-forge:
conda install -c conda-forge indexed_gzip_fileobj_fork_epicfaace
To compile indexed_gzip_fileobj_fork_epicfaace, make sure you have cython
installed (and numpy if you want to compile the tests), and then run:
python setup.py develop
To run the tests, type the following; you will need numpy, nibabel,
pytest, pytest-cov, and coverage installed:
python -m indexed_gzip_fileobj_fork_epicfaace.tests
You can use the indexed_gzip_fileobj_fork_epicfaace
module directly:
import indexed_gzip_fileobj_fork_epicfaace as igzip
# You can create an IndexedGzipFile instance
# by specifying a file name, or an open file
# handle. For the latter use, the file handle
# must be opened in read-only binary mode.
# Write support is currently non-existent.
myfile = igzip.IndexedGzipFile('big_file.gz')
some_offset_into_uncompressed_data = 234195
# The index will be automatically
# built on-demand when seeking or
# reading.
myfile.seek(some_offset_into_uncompressed_data)
data = myfile.read(1048576)
nibabel
You can use indexed_gzip_fileobj_fork_epicfaace with nibabel. nibabel >= 2.3.0 will
automatically use indexed_gzip_fileobj_fork_epicfaace if it is present:
import nibabel as nib
image = nib.load('big_image.nii.gz')
If you are using nibabel
2.2.x, you need to explicitly set the keep_file_open
flag:
import nibabel as nib
image = nib.load('big_image.nii.gz', keep_file_open='auto')
To use indexed_gzip_fileobj_fork_epicfaace
with nibabel
2.1.0 or older, you need to do a little
more work:
import nibabel as nib
import indexed_gzip_fileobj_fork_epicfaace as igzip
# Here we are using 4MB spacing between
# seek points, and using a larger read
# buffer (than the default size of 16KB).
fobj = igzip.IndexedGzipFile(
filename='big_image.nii.gz',
spacing=4194304,
readbuf_size=131072)
# Create a nibabel image using
# the existing file handle.
fmap = nib.Nifti1Image.make_file_map()
fmap['image'].fileobj = fobj
image = nib.Nifti1Image.from_file_map(fmap)
# Use the image ArrayProxy to access the
# data - the index will automatically be
# built as data is accessed.
vol3 = image.dataobj[:, :, :, 3]
If you have a large file, you may wish to pre-generate the index once, and save it out to an index file:
import indexed_gzip_fileobj_fork_epicfaace as igzip
# Load the file, pre-generate the
# index, and save it out to disk.
fobj = igzip.IndexedGzipFile('big_file.gz')
fobj.build_full_index()
fobj.export_index('big_file.gzidx')
The next time you open the same file, you can load in the index:
import indexed_gzip_fileobj_fork_epicfaace as igzip
fobj = igzip.IndexedGzipFile('big_file.gz', index_file='big_file.gzidx')
indexed_gzip_fileobj_fork_epicfaace does not currently support writing. If you
wish to write to a file, you will need to save it by alternate means (e.g.
via gzip or nibabel), and then create a new IndexedGzipFile instance.
For example:
import nibabel as nib
# Load the entire image into memory
image = nib.load('big_image.nii.gz')
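# (get_fdata replaces get_data, which was deprecated in nibabel 3.0)
data = image.get_fdata()
# Make changes to the data
data[:, :, :, 5] *= 100
# Save the image using nibabel. nib.save expects an
# image object, so wrap the modified array first.
nib.save(nib.Nifti1Image(data, image.affine, image.header), 'big_image.nii.gz')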
# Re-load the image
image = nib.load('big_image.nii.gz')
A small test script is included with indexed_gzip_fileobj_fork_epicfaace; this script
compares the performance of the IndexedGzipFile class with the gzip.GzipFile
class. This script does the following:
Generates a test file.
Generates a specified number of seek locations, uniformly spaced throughout the test file.
Randomly shuffles these locations.
Seeks to each location, and reads a chunk of data from the file.
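The steps above can be sketched roughly as follows. This is not the bundled test script: it only times the standard gzip.GzipFile on a small in-memory file, and the function name benchmark_seeks and its parameters are hypothetical. Any object with seek and read methods could be passed in its place for comparison.

```python
import gzip
import io
import random
import time


def benchmark_seeks(fileobj, file_size, nseeks=50, read_size=1024, seed=0):
    """Seek to shuffled, uniformly spaced offsets.

    Returns (elapsed_seconds, total_bytes_read).
    """
    # Uniformly spaced seek locations throughout the uncompressed data.
    step = file_size // nseeks
    offsets = [i * step for i in range(nseeks)]
    # Shuffle them so that access is effectively random.
    random.Random(seed).shuffle(offsets)

    total = 0
    start = time.perf_counter()
    for off in offsets:
        fileobj.seek(off)
        total += len(fileobj.read(read_size))
    return time.perf_counter() - start, total


# Generate a small test file in memory (1 MiB uncompressed).
size = 1024 * 1024
raw = bytes(i % 251 for i in range(size))
compressed = gzip.compress(raw)

with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as f:
    elapsed, nbytes = benchmark_seeks(f, size)
```

Shuffling matters here: sequential offsets would let gzip.GzipFile read forward without ever rewinding, which hides the cost that the index is meant to remove.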
This plot shows the results of this test for a few compressed files of varying sizes, with 500 seeks:
The indexed_gzip_fileobj_fork_epicfaace project is based upon the zran.c
example (written by Mark Adler) which ships with the zlib source code.
indexed_gzip_fileobj_fork_epicfaace
was originally inspired by Zalan Rajna's (@zrajna)
zindex project:
Z. Rajna, A. Keskinarkaus, V. Kiviniemi and T. Seppanen
"Speeding up the file access of large compressed NIfTI neuroimaging data"
Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual
International Conference of the IEEE, Milan, 2015, pp. 654-657.
https://sourceforge.net/projects/libznzwithzindex/
Initial work on indexed_gzip_fileobj_fork_epicfaace
took place at
Brainhack Paris, at the Institut Pasteur,
24th-26th February 2016, with the support of the
FMRIB Centre, at the
University of Oxford, UK.
Many thanks to the following contributors (listed chronologically):
indexed_gzip_fileobj_fork_epicfaace to Windows (#3)
seek_points method (#35), README fixes (#34)
indexed_gzip_fileobj_fork_epicfaace inherits the zlib license, available for
perusal in the LICENSE file.
We found that indexed-gzip-fileobj-fork-epicfaace demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.