If you've ever used Python for data science, web development, or machine learning, you've likely installed packages without a second thought about what's inside them. But as software grows more complex and regulatory standards become stricter, understanding the components of these packages has become crucial.
A current limitation is that Python package metadata cannot describe the non-Python files bundled in a package, leading to incomplete descriptions of package contents. Recognizing the growing importance of SBOMs (Software Bills-of-Materials) for regulatory and security purposes, a new PEP (Python Enhancement Proposal) introduces an optional SBOM metadata field that would enable Python packages to include these detailed component lists.
Addressing the Phantom Dependency Problem
PEP 770, titled “Improving measurability of Python packages with Software Bill-of-Materials,” was published by PSF Security Developer-in-Residence Seth Larson. The proposal is aimed at making Python packages more transparent and measurable by software composition analysis (SCA) tools, and at aligning Python packages with evolving global standards.
“I am hoping that by providing a mechanism to have SBOM information included in Python packages at all (and as a solution to a named problem ‘phantom dependency’ where Python is the unfortunate star) we’ll see increased interest and contributions from users who need SBOMs,” Larson said in a discussion on SBOMs for Python packages last November.
“As it stands today, if a company wanted to improve projects’ SBOMs that they depend on they would need to start where I am now with packaging PEPs and a plan to contribute to a bunch of Python tools and then advocate for projects they care about to accept their contributions. My goal with the PEP is to at least unblock the first few things in the way of those types of contributions.”
The phantom dependency problem he referenced arises when packages include software components that aren’t written in Python. Since these components can’t be described using Python package metadata, SCA tools can miss them, making it difficult to track vulnerable components. The proposal highlights that packaging non-Python software is common in certain use cases, such as scientific, data, web, and machine-learning projects that rely on languages like Rust, C, C++, Fortran, JavaScript, and others.
“For example, the Python package Pillow includes 16 shared object libraries in the wheel that were bundled by auditwheel as a part of the build,” Larson explained in the proposal. “None of those shared object libraries are detected when using common SCA tools like Syft and Grype. If an SBOM document is included annotating all the included shared libraries then SCA tools can identify the included software reliably.”
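A wheel is a zip archive, so a rough way to see the phantom dependencies Larson describes is to list the shared-library files inside one. The sketch below is ours, not part of PEP 770 or of any SCA tool; the suffix list and the helper names `is_shared_library` and `list_shared_libraries` are illustrative assumptions.

```python
from __future__ import annotations

import zipfile
from pathlib import PurePosixPath

# File suffixes treated as compiled shared libraries (an illustrative choice).
SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll", ".pyd")


def is_shared_library(name: str) -> bool:
    """Return True for names like 'libfoo.so', 'libfoo.so.1.2' or 'foo.dll'."""
    filename = PurePosixPath(name).name.lower()
    return filename.endswith(SHARED_LIB_SUFFIXES) or ".so." in filename


def list_shared_libraries(wheel) -> list[str]:
    """List shared-library members of a wheel, given a path or file object."""
    with zipfile.ZipFile(wheel) as archive:
        return sorted(n for n in archive.namelist() if is_shared_library(n))
```

Compiled extension modules match these suffixes too, so the result mixes a project's own extensions with any libraries bundled during the build. The listing says nothing about which upstream project or version each file came from, and that is the gap an included SBOM is meant to fill.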
The Importance of SBOMs
An SBOM (Software Bill-of-Materials) is like a detailed ingredient list for software packages. It shows what components, libraries, or dependencies are included in a package, along with their provenance and version. These documents are essential for security and regulatory purposes.
PEP 770 would add SBOM support in a strategic way that doesn’t introduce any backwards compatibility concerns, as it doesn’t change the behavior of any existing metadata fields. Instead it adds a new core metadata field, which would be included in pyproject.toml files.
Here are the key changes proposed in PEP 770:
- Optional SBOM Metadata Field:
  - A new Core Metadata field, Sbom-File, will list paths to SBOM documents included in a package.
  - SBOM documents must follow recognized standards, such as CycloneDX or SPDX.
- Flexibility for Multiple SBOMs:
  - Packages can include zero or more SBOMs.
  - Multiple SBOM standards can coexist in the same package, ensuring compatibility with diverse tools.
- Integration into Python Packaging Standards:
  - SBOM metadata will be added to pyproject.toml files.
  - Build tools will include SBOMs in both source and built distributions (e.g., wheels).
- Validation by PyPI:
  - When uploading packages with SBOMs, PyPI will validate that the files are present, properly formatted, and conform to the specified standards.
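To give a feel for the kind of check described in the last item, here is a minimal sketch of reading Sbom-File entries from a package's metadata and confirming that each listed document exists and looks like CycloneDX or SPDX. It is not PyPI's implementation. It assumes the metadata uses the usual email-header style of Python core metadata, handles only JSON-encoded SBOMs, and leaves path resolution inside the archive to the caller; `detect_sbom_format` and `check_sbom_files` are hypothetical names.

```python
from __future__ import annotations

import json
from email.parser import Parser


def detect_sbom_format(text: str) -> str | None:
    """Guess the SBOM standard of a JSON document, or None if unrecognized."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    # CycloneDX JSON documents carry a top-level "bomFormat" marker.
    if doc.get("bomFormat") == "CycloneDX":
        return "CycloneDX"
    # SPDX JSON documents carry a top-level "spdxVersion" such as "SPDX-2.3".
    if str(doc.get("spdxVersion", "")).startswith("SPDX-"):
        return "SPDX"
    return None


def check_sbom_files(metadata_text: str, files: dict[str, str]) -> dict[str, str]:
    """Map each Sbom-File path to 'missing', 'unrecognized', or a format name.

    `files` maps paths to decoded file contents. How those paths are resolved
    inside a real archive is an assumption left to the caller.
    """
    metadata = Parser().parsestr(metadata_text)
    results = {}
    for path in metadata.get_all("Sbom-File", []):
        if path not in files:
            results[path] = "missing"
        else:
            results[path] = detect_sbom_format(files[path]) or "unrecognized"
    return results
```

A real validator would go further and check each document against the schema of its standard; the marker-key test here only distinguishes the two formats.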
Supporting Multiple SBOM Standards
CycloneDX and SPDX are two widely adopted SBOM standards, each with unique strengths. Instead of mandating one standard, PEP 770 allows packages to include SBOMs in either format (or both), ensuring compatibility with a broader range of tools. This flexibility empowers developers to choose the standard that best fits their needs without forcing the ecosystem into a single approach. By supporting multiple standards, Python packages can integrate seamlessly with existing security and compliance workflows, offering greater utility to users and organizations alike.
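To show how the two standards record the same information differently, the sketch below builds a minimal JSON-style document in each format for a single bundled library. The field names follow the CycloneDX 1.6 and SPDX 2.3 JSON formats to the best of our reading, but neither document has been validated against the official schemas. The component name, document name, and namespace URL are placeholders, and the function names are our own.

```python
from __future__ import annotations

from datetime import datetime, timezone


def cyclonedx_sbom(name: str, version: str) -> dict:
    """Build a minimal CycloneDX-style document listing one library."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "version": 1,
        "components": [
            {"type": "library", "name": name, "version": version},
        ],
    }


def spdx_sbom(name: str, version: str) -> dict:
    """Build a minimal SPDX-style document listing one package."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": "example-sbom",  # placeholder document name
        "documentNamespace": "https://example.invalid/spdx/example-sbom",  # placeholder
        "creationInfo": {
            "created": created,
            "creators": ["Tool: hand-written-example"],
        },
        "packages": [
            {
                "SPDXID": f"SPDXRef-Package-{name}",
                "name": name,
                "versionInfo": version,
                "downloadLocation": "NOASSERTION",
                "filesAnalyzed": False,
            }
        ],
    }
```

Passing either result to `json.dumps(..., indent=2)` gives a document that could sit alongside the other in the same package, which is the coexistence the proposal allows.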
PEP 770 is currently in draft status, but the proposal has already received some good feedback. A few commenters favor Python being more opinionated and would support specifying a single SBOM format, with most leaning towards CycloneDX. Larson addressed these comments today, explaining why the proposal doesn’t settle on a single SBOM standard:
“Given the large number of tools for building/repairing package archives before publication I opted to treat SBOMs inside archives as opaque and independent from each other and instead placing the burden of ‘merging’ them together afterwards on consumers. This would avoid tools stepping on each-other’s toes when attempting to record data into an SBOM.
“Given the above, I didn’t see selecting a single standard as critical. I am open to refactoring the PEP to select a single SBOM standard if that’s desirable. I think this would be an important thing to do if there genuinely is a use-case for intermediary tools modifying SBOM documents produced by other tools while a Python package archive is being built.”
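Larson's comments leave the merging step to consumers. One way a consumer might approach it, for CycloneDX-style JSON documents that have already been parsed into dictionaries, is sketched below. The de-duplication rule (package URL when present, otherwise name and version) is our simplifying assumption, not something the PEP specifies, and `merge_components` is a hypothetical helper.

```python
from __future__ import annotations


def merge_components(sboms: list[dict]) -> list[dict]:
    """Combine components from several CycloneDX-style documents.

    Components are de-duplicated by "purl" when present, otherwise by
    (name, version). The first occurrence of each component wins.
    """
    merged: dict = {}
    for doc in sboms:
        for component in doc.get("components", []):
            key = component.get("purl") or (
                component.get("name"),
                component.get("version"),
            )
            merged.setdefault(key, component)
    # Dicts preserve insertion order, so first-seen order is kept.
    return list(merged.values())
```

Because each input document is only read and never modified, the tools that produced them remain independent of one another, in line with the opaque treatment described in the quote.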
This aspect of the proposal and other details are still up for discussion during this review period. Overall, introducing support for SBOMs in packages stands to improve security in the Python ecosystem by uncovering vulnerabilities in phantom dependencies and would be instrumental in helping organizations meet regulatory requirements. We will be following progress on this PEP closely to understand its impact on maintainers and developers in securing Python dependencies.