pypinfo is a simple CLI to access PyPI download statistics via Google's BigQuery.
$ pypinfo
Usage: pypinfo [OPTIONS] [PROJECT] [FIELDS]... COMMAND [ARGS]...
Valid fields are:
project | version | file | pyversion | percent3 | percent2 | impl | impl-version |
openssl | date | month | year | country | installer | installer-version |
setuptools-version | system | system-release | distro | distro-version | cpu |
libc | libc-version
Options:
  -a, --auth TEXT          Path to Google credentials JSON file.
  --run / --test           --test simply prints the query.
  -j, --json               Print data as JSON, with keys `rows` and `query`.
  -i, --indent INTEGER     JSON indentation level.
  -t, --timeout INTEGER    Milliseconds. Default: 120000 (2 minutes)
  -l, --limit INTEGER      Maximum number of query results. Default: 10
  -d, --days INTEGER       Number of days in the past to include. Default: 30
  -sd, --start-date TEXT   Must be negative or YYYY-MM[-DD]. Default: -31
  -ed, --end-date TEXT     Must be negative or YYYY-MM[-DD]. Default: -1
  -m, --month TEXT         Shortcut for -sd & -ed for a single YYYY-MM month.
  -w, --where TEXT         WHERE conditional. Default: file.project = "project"
  -o, --order TEXT         Field to order by. Default: download_count
  --all                    Show downloads by all installers, not only pip.
  -pc, --percent           Print percentages.
  -md, --markdown          Output as Markdown.
  -v, --verbose            Print debug messages to stderr.
  --version                Show the version and exit.
  -h, --help               Show this message and exit.
pypinfo accepts 0 or more options, followed by exactly 1 project, followed by 0 or more fields. By default only the last 30 days are queried. Let's take a look at some examples!
Tip: If queries fail with NoneType errors, increase the timeout (-t, --timeout).
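For scripting, the -j/--json option listed in the usage text prints JSON with the keys `rows` and `query`. The following is a minimal sketch of consuming that output in Python. Only the two top-level key names come from the help text; the helper name `split_json_output` and the shape of the sample payload are assumptions made for illustration, and real output may be structured differently.

```python
import json


def split_json_output(text):
    """Parse text captured from `pypinfo --json ...` into (rows, query)."""
    data = json.loads(text)
    # The help text documents exactly these two top-level keys.
    missing = {"rows", "query"} - set(data)
    if missing:
        raise KeyError(f"missing keys in pypinfo JSON: {sorted(missing)}")
    return data["rows"], data["query"]


# Hypothetical payload for illustration only; the inner structure is assumed.
sample = '{"rows": [{"download_count": 116353535}], "query": {}}'
rows, query = split_json_output(sample)
print(rows)
```

The check for missing keys is there so that a truncated or failed run surfaces as a clear error instead of a confusing lookup failure later in a script.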
$ pypinfo requests
Served from cache: False
Data processed: 2.83 GiB
Data billed: 2.83 GiB
Estimated cost: $0.02
| download_count |
| -------------- |
| 116,353,535 |
$ pypinfo ""
Served from cache: False
Data processed: 116.15 GiB
Data billed: 116.15 GiB
Estimated cost: $0.57
| download_count |
| -------------- |
| 8,642,447,168 |
$ pypinfo django pyversion
Served from cache: False
Data processed: 967.33 MiB
Data billed: 968.00 MiB
Estimated cost: $0.01
| python_version | download_count |
| -------------- | -------------- |
| 3.8 | 1,735,967 |
| 3.6 | 1,654,871 |
| 3.7 | 1,326,423 |
| 2.7 | 876,621 |
| 3.9 | 524,570 |
| 3.5 | 258,609 |
| 3.4 | 12,769 |
| 3.10 | 3,050 |
| 3.3 | 225 |
| 2.6 | 158 |
| Total | 6,393,263 |
$ pypinfo "" country
Served from cache: False
Data processed: 150.40 GiB
Data billed: 150.40 GiB
Estimated cost: $0.74
| country | download_count |
| ------- | -------------- |
| US | 6,614,473,568 |
| IE | 336,037,059 |
| IN | 192,914,402 |
| DE | 186,968,946 |
| NL | 182,691,755 |
| None | 141,753,357 |
| BE | 111,234,463 |
| GB | 109,539,219 |
| SG | 106,375,274 |
| FR | 86,036,896 |
| Total | 8,068,024,939 |
$ pypinfo cryptography system distro
Served from cache: False
Data processed: 2.52 GiB
Data billed: 2.52 GiB
Estimated cost: $0.02
| system_name | distro_name | download_count |
| ----------- | ------------------------------- | -------------- |
| Linux | Ubuntu | 19,524,538 |
| Linux | Debian GNU/Linux | 11,662,104 |
| Linux | Alpine Linux | 3,105,553 |
| Linux | Amazon Linux AMI | 2,427,975 |
| Linux | Amazon Linux | 2,374,869 |
| Linux | CentOS Linux | 1,955,181 |
| Windows | None | 1,522,069 |
| Linux | CentOS | 568,370 |
| Darwin | macOS | 489,859 |
| Linux | Red Hat Enterprise Linux Server | 296,858 |
| Total | | 43,927,376 |
$ pypinfo --days 365 "" project
Served from cache: False
Data processed: 1.69 TiB
Data billed: 1.69 TiB
Estimated cost: $8.45
| project | download_count |
| --------------- | -------------- |
| urllib3 | 1,382,528,406 |
| six | 1,172,798,441 |
| botocore | 1,053,169,690 |
| requests | 995,387,353 |
| setuptools | 992,794,567 |
| certifi | 948,518,394 |
| python-dateutil | 934,709,454 |
| idna | 929,781,443 |
| s3transfer | 877,565,186 |
| chardet | 854,744,674 |
| Total | 10,141,997,608 |
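Year-long queries across all projects are where cost starts to matter. The "Estimated cost" lines in these examples are consistent with the price of $5 per TB quoted in the setup notes, applied to the billed data and rounded up to the next cent. The sketch below reproduces that arithmetic. It is an assumption inferred from the sample outputs, not a description of pypinfo's internals: it treats a TB as a TiB, ignores the free monthly allowance, and the names `estimate_cost` and `UNITS_PER_TIB` are hypothetical.

```python
from decimal import Decimal, ROUND_UP

PRICE_PER_TIB = Decimal("5")  # USD per TiB, from the pricing note in this README
UNITS_PER_TIB = {"MiB": 2**20, "GiB": 2**10, "TiB": 1}


def estimate_cost(amount, unit):
    """Estimate the cost of a query from its 'Data billed' figure.

    `amount` is passed as a string (e.g. "150.40") so Decimal keeps it exact.
    """
    tib = Decimal(amount) / UNITS_PER_TIB[unit]
    # Assumption: round up to the next cent, which matches the sample outputs.
    cost = (tib * PRICE_PER_TIB).quantize(Decimal("0.01"), rounding=ROUND_UP)
    return f"${cost}"


print(estimate_cost("1.69", "TiB"))    # $8.45
print(estimate_cost("150.40", "GiB"))  # $0.74
print(estimate_cost("2.83", "GiB"))    # $0.02
```

Rounding up rather than to the nearest cent is what makes 2.83 GiB come out at $0.02 instead of $0.01, as in the first example.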
$ pypinfo --start-date 2018-04-01 --end-date 2018-04-30 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01
| download_count |
| -------------- |
| 8,972,826 |
When the day is omitted (YYYY-MM):
--start-date defaults to the first day of the month
--end-date defaults to the last day of the month
$ pypinfo --start-date 2018-04 --end-date 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01
| download_count |
| -------------- |
| 8,972,826 |
$ pypinfo --month 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01
| download_count |
| -------------- |
| 8,972,826 |
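The three date forms above return the same count because a bare YYYY-MM expands to the first day of the month for the start date and the last day for the end date. A minimal sketch of that expansion follows. The helper names `expand_date` and `month_range` are hypothetical, and the negative day offsets that the options also accept are deliberately out of scope.

```python
import calendar
from datetime import date


def expand_date(value, *, end=False):
    """Expand YYYY-MM or YYYY-MM-DD into a full ISO date string."""
    parts = [int(part) for part in value.split("-")]
    if len(parts) == 3:
        return date(*parts).isoformat()
    if len(parts) != 2:
        raise ValueError(f"expected YYYY-MM[-DD], got {value!r}")
    year, month = parts
    # monthrange returns (weekday of the first day, number of days in month).
    day = calendar.monthrange(year, month)[1] if end else 1
    return date(year, month, day).isoformat()


def month_range(month):
    """Return the (start, end) pair that a single YYYY-MM month stands for."""
    return expand_date(month), expand_date(month, end=True)


print(month_range("2018-04"))  # ('2018-04-01', '2018-04-30')
```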
Let's use --test to only see the query instead of sending it.
$ pypinfo --test --days 365 --limit 100 "" project percent3
SELECT
file.project as project,
ROUND(100 * SUM(CASE WHEN REGEXP_EXTRACT(details.python, r"^([^\.]+)") = "3" THEN 1 ELSE 0 END) / COUNT(*), 1) as percent_3,
COUNT(*) as download_count,
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -366 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
AND details.installer.name = "pip"
GROUP BY
project
ORDER BY
download_count DESC
LIMIT 100
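Note how --days 365 becomes a window from -366 to -1 days in the query, which mirrors the documented defaults of -31 and -1 for the 30-day default. The sketch below rebuilds just that timestamp condition. The function name `time_window` is hypothetical, and the offset rule is inferred from the query and defaults shown here, not taken from pypinfo's source.

```python
def time_window(days=30):
    """Build the timestamp condition for the last `days` full days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    # The window ends yesterday (-1) and starts `days` days before that.
    start, end = -(days + 1), -1
    return (
        "timestamp BETWEEN "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {start} DAY) AND "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {end} DAY)"
    )


print(time_window(365))
```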
pypinfo supports PEP 440 version matching.
We can use it to query stats on a given major version.
$ pypinfo -pc 'pip==21.*' pyversion version
Served from cache: False
Data processed: 34.45 MiB
Data billed: 35.00 MiB
Estimated cost: $0.01
| python_version | version | percent | download_count |
| -------------- | ------- | ------- | -------------- |
| 3.6 | 21.3.1 | 78.74% | 10,430 |
| 3.8 | 21.3.1 | 7.81% | 1,034 |
| 3.7 | 21.2.1 | 3.59% | 476 |
| 3.7 | 21.3.1 | 2.60% | 345 |
| 3.7 | 21.0.1 | 2.25% | 298 |
| 3.8 | 21.0.1 | 1.58% | 209 |
| 3.8 | 21.2.1 | 1.42% | 188 |
| 3.7 | 21.1.2 | 0.81% | 107 |
| 3.9 | 21.3.1 | 0.69% | 92 |
| 3.8 | 21.1.1 | 0.51% | 67 |
| Total | | | 13,246 |
We can also use it to query stats on an exact version:
$ pypinfo -pc 'numpy==1.23rc3' pyversion version
Served from cache: False
Data processed: 34.01 MiB
Data billed: 35.00 MiB
Estimated cost: $0.01
| python_version | version | percent | download_count |
| -------------- | --------- | ------- | -------------- |
| 3.9 | 1.23.0rc3 | 63.33% | 38 |
| 3.8 | 1.23.0rc3 | 28.33% | 17 |
| 3.10 | 1.23.0rc3 | 8.33% | 5 |
| Total | | | 60 |
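The two queries above show both kinds of `==` matching: `21.*` is a prefix match, and `1.23rc3` matches the recorded version `1.23.0rc3` because trailing zeros in the release are not significant. The following is a simplified, standard-library-only sketch of those two rules. It is not how pypinfo implements matching and it covers only dotted releases with an optional a/b/rc suffix; the third-party `packaging` library is the fuller PEP 440 implementation. The function names are hypothetical.

```python
import re

# Simplified grammar: a dotted release plus an optional a/b/rc pre-release.
# Full PEP 440 also covers epochs, post/dev releases and local versions.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")


def parse(version):
    """Split a version string into (release tuple, pre-release or None)."""
    match = _VERSION.match(version.strip().lower())
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    pre = (match.group(2), int(match.group(3))) if match.group(2) else None
    return release, pre


def pad(release, length):
    """Right-pad a release tuple with zeros up to `length` components."""
    return release + (0,) * (length - len(release))


def matches(spec, candidate):
    """Check `candidate` against an '==X.Y' or '==X.*' specifier."""
    if not spec.startswith("=="):
        raise ValueError("only == specifiers are handled in this sketch")
    wanted = spec[2:].strip()
    release, pre = parse(candidate)
    if wanted.endswith(".*"):
        # Prefix match: compare leading components, ignore any suffix.
        prefix, _ = parse(wanted[:-2])
        return pad(release, len(prefix))[: len(prefix)] == prefix
    wanted_release, wanted_pre = parse(wanted)
    width = max(len(release), len(wanted_release))
    return pad(release, width) == pad(wanted_release, width) and pre == wanted_pre


print(matches("==21.*", "21.3.1"))        # True
print(matches("==1.23rc3", "1.23.0rc3"))  # True
```

The project name is left out here on purpose: in a specifier such as 'pip==21.*', everything before `==` is the project and only the remainder is a version constraint.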
Check how many downloads came from continuous integration servers:
$ pypinfo --percent --days 5 pillow ci
Served from cache: False
Data processed: 384.22 MiB
Data billed: 385.00 MiB
Estimated cost: $0.01
| ci | percent | download_count |
| ----- | ------- | -------------- |
| None | 79.37% | 11,963,127 |
| True | 20.63% | 3,109,931 |
| Total | | 15,073,058 |
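In the tables shown here, the percent column is each row's share of the displayed total, formatted to two decimal places. A small sketch of that calculation, using the counts from the table above, is given below. The function name `add_percent` is hypothetical and the rule is inferred from the sample output.

```python
def add_percent(rows):
    """Turn (label, count) rows into (label, percent, count) rows."""
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("cannot compute percentages of an empty total")
    return [(label, f"{count / total:.2%}", count) for label, count in rows]


ci_rows = [("None", 11_963_127), ("True", 3_109_931)]
for label, percent, count in add_percent(ci_rows):
    print(f"| {label} | {percent} | {count:,} |")
```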
pypinfo is distributed on PyPI as a universal wheel, is available on Linux, macOS, and Windows, and supports Python 3.7+.
This is relatively painless, I swear.
Sign up if you haven't already. The first TB of queried data each month is free. Each additional TB is $5.
Sign in to your account if you are not signed in already.
Go to https://console.developers.google.com/cloud-resource-manager and click CREATE PROJECT if you don't already have one.
Note: You may be asked to go back to the BigQuery panel. In that case, click the top-left "Navigation Menu" button, choose "API and services > Enabled APIs and services", and on the following page select the "Big Query API" item.
The credentials JSON file will have a name like {name}-{credentials_hash}.json.
Run python -m pip install pypinfo in the terminal.
Run pypinfo --auth path/to/your_credentials.json, or set an environment variable GOOGLE_APPLICATION_CREDENTIALS that points to the file.