
Note: pyzim is published on PyPI as python-zim due to a naming conflict with an existing package.
pyzim is a semi-pure Python package for working with ZIM files. A ZIM file is essentially a highly compressed archive of a website. Examples of ZIM files include offline versions of Wikipedia, Stack Overflow, Project Gutenberg and many more.
pyzim aims to provide a very flexible and open method of interacting with ZIM files. For example, this project aims to give developers the choice of whether they want to access entries in a ZIM file as fast as possible or with as little RAM usage as possible. pyzim itself is written in pure Python and does not depend on libzim. However, modern ZIM files use zstandard compression, and pyzim depends on a C library for working with such files.
pyzim is nearly fully implemented. It supports nearly all reader features and you should be able to read all modern ZIM files, but some features (like search) are still missing. A writer also exists and is even capable of editing existing ZIM files.
Basic features:
Most read and write operations on ZIM files are implemented.
Missing features:
The following features are still missing, but planned:
Additional features:
In addition to regular ZIM functionality, the following features are also implemented:
General project features:
pyzim is published on PyPI as python-zim due to a naming conflict with an existing package.
Via pip from PyPI
To install via pip, run pip install python-zim. Alternatively, run pip install python-zim[all] to install all additional dependencies (like compression and testing libraries).
Here is a full list of supported extra dependencies (usage: pip install python-zim[<extra>]):
tox will install further dependencies during testing.
From source
git clone https://github.com/IMayBeABitShy/pyzim.git
cd into directory: cd pyzim
Install using pip: pip install .[compression,testing]. See above for the meaning of the extras specified. You may have to use python3 -m pip instead and/or specify --user.
Please take a look at the examples/ directory for fully commented examples.
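# Read a specific entry from a ZIM file and print its metadata and content.
import argparse

import pyzim

# Take the ZIM file and the entry URL from the command line.
parser = argparse.ArgumentParser(description="Read an entry from a ZIM file")
parser.add_argument("zimpath", help="path to the ZIM file")
parser.add_argument("entrypath", help="URL of the entry inside the ZIM file")
args = parser.parse_args()

with pyzim.Zim.open(args.zimpath) as zim:
    entry = zim.get_content_entry_by_url(args.entrypath)
    # Follow redirects so that we end up at the entry holding the content.
    entry = entry.resolve()
    print("URL: ", entry.url)
    print("Full URL: ", entry.full_url)
    print("Redirect: ", entry.is_redirect)
    print("Title: ", entry.title)
    print("Mimetype: ", entry.mimetype)
    print("Content location: {}@{}".format(entry.blob_number, entry.cluster_number))
    print("\n\n=====CONTENT=====\n\n")
    print(entry.read())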
pyzim is extensively documented using pydoctor. There is currently no online version of the documentation, but you can build it locally by running tox -e docs in the project directory, which will output HTML documentation to html/apidocs/. This requires tox to be installed.
If you are a contributor looking to write your own documentation, you can find a pydoctor syntax guide here.
At the time of writing this document, pyzim achieves a (statement-based) test coverage of 98%. You can run the tests locally by executing tox in the project directory. Specify the testing extra during installation of pyzim to automatically install all test dependencies.
pyzim logs a lot of low-level operations at numeric levels below the DEBUG level. For example, each entry being read is logged, but these messages are not shown by default. See the documentation of pyzim.constants for these log levels. Editing tox.ini and changing the log level may be helpful when debugging.
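The effect of logging below DEBUG can be sketched with the standard logging module alone. The level value 5, the name TRACE and the logger name demo.lowlevel are illustrative assumptions, not pyzim's actual constants; the real values are documented in pyzim.constants.

```python
import io
import logging

# Hypothetical level below logging.DEBUG (which is 10).
# pyzim's real level values live in pyzim.constants.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def capture(level):
    """Emit one DEBUG and one TRACE record with the logger threshold set
    to `level`, and return the lines that were actually written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger = logging.getLogger("demo.lowlevel")  # hypothetical logger name
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(level)
    logger.debug("opened archive")
    logger.log(TRACE, "read entry 42")
    return stream.getvalue().splitlines()


# At DEBUG the low-level record is filtered out; lowering the threshold shows it.
print(capture(logging.DEBUG))
print(capture(TRACE))
```

With the threshold at DEBUG only the first message appears; setting it to the lower numeric value makes the low-level message visible as well.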
Why do I get an UnsupportedCompressionType exception with a ZIM file?
pyzim depends on other libraries to handle the decompression of data from the ZIM file. Luckily, the vast majority of these libraries come included with most Python distributions. Unfortunately, these libraries may not be included when you build Python yourself. Additionally, the most common compression in modern ZIM files is zstandard, for which pyzim depends on pyzstd. Please ensure that this library is installed.
You can automatically install all optional compression dependencies by installing the compression extra for pyzim.
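A quick way to see which decompression modules your interpreter can import is sketched below. Only pyzstd is named above; the standard-library module names in the candidate list are an assumption about which other backends may matter, and the helper name module_availability is made up for this example.

```python
import importlib.util

# "pyzstd" is named in the text above. The standard-library modules are an
# assumption about which other compression backends may be relevant.
CANDIDATES = ("zlib", "bz2", "lzma", "pyzstd")


def module_availability(names=CANDIDATES):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, available in module_availability().items():
    print("{:<8} {}".format(name, "available" if available else "MISSING"))
```

Any module reported as missing points to a library that needs to be installed, or to a Python build that was compiled without it.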
Why do I get a BindRequired exception / what does "bound/unbound" mean?
pyzim differentiates between bound and unbound objects (entries, clusters and so on). An unbound object is an object that is not attached to any ZIM object. By default, most objects should be bound automatically by the various methods for accessing them, but if you are using any class directly you may encounter unbound ones.
You can bind any such object by calling its .bind(zim_object) method.
The idea behind this behavior is that we should be able to use the same code for readers and writers.
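The bound/unbound idea can be illustrated with a small toy model. This is not pyzim's implementation: the Archive and Entry classes below are stand-ins written for this example, and only the names BindRequired and bind() are taken from the text above.

```python
class BindRequired(Exception):
    """Toy stand-in: raised when an unbound object needs its owner."""


class Archive:
    """Toy stand-in for a ZIM object; stores content by URL."""

    def __init__(self, blobs):
        self.blobs = blobs


class Entry:
    """Toy entry that can exist without an archive until it is bound."""

    def __init__(self, url):
        self.url = url
        self.archive = None  # unbound until bind() is called

    @property
    def bound(self):
        return self.archive is not None

    def bind(self, archive):
        """Attach this entry to an archive so it can access content."""
        self.archive = archive

    def read(self):
        # Reading requires an owner to fetch the data from.
        if not self.bound:
            raise BindRequired("entry must be bound before reading")
        return self.archive.blobs[self.url]


entry = Entry("index.html")
try:
    entry.read()
except BindRequired as exc:
    print("unbound:", exc)

entry.bind(Archive({"index.html": b"<html></html>"}))
print("bound:", entry.read())
```

Keeping the entry usable without an owner is what allows the same class to be created freshly by a writer or loaded by a reader, which matches the motivation given above.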
The following section lists various other resources related to ZIM files, which may be of interest to you. This includes end-user applications, alternative libraries, documentation and more. These lists are by no means exhaustive.
ZIM programming libraries and documentation
libzim, if you want to use another ZIM module
ZIM files
ZIM viewers (for end users)
kiwix-tools contains kiwix-serve, a dedicated HTTP server for ZIM files.