PyTorch Lightning, a widely adopted deep learning framework developed by Lightning AI, has been impacted by multiple critical deserialization vulnerabilities, disclosed under VU#252619. These issues affect all versions up to and including 2.4.0 and may lead to arbitrary code execution when loading untrusted model files.
The vulnerabilities were reported by Kasimir Schulz of HiddenLayer and coordinated by the CERT Coordination Center (CERT/CC) at Carnegie Mellon University. CERT/CC published the advisory on April 3, 2025, detailing multiple insecure deserialization pathways in PyTorch Lightning’s internal checkpointing, distributed training, and I/O components.
Built as a high-level interface on top of PyTorch, Lightning abstracts boilerplate training logic and supports distributed training, mixed precision, and seamless scaling across hardware. It is deeply embedded in the machine learning ecosystem, appearing in thousands of research papers, enterprise ML pipelines, and open source repositories.
In March 2025, Lightning AI announced that PyTorch Lightning had surpassed 200 million downloads. The project has received contributions from over 1,000 developers and represents more than 400,000 developer hours, a testament to its widespread adoption and importance in modern AI workflows.
However, that same ubiquity increases the risk profile of the vulnerabilities disclosed in VU#252619. Without safeguards on deserialization functions like torch.load() and Python's pickle module, users risk unintentionally executing malicious code embedded in model files, a threat with implications for both research reproducibility and production security.
Technical impact
All affected components rely on insecure deserialization mechanisms, primarily torch.load() and Python's pickle module, which permit embedded code execution at load time. These vulnerabilities affect the following subsystems:
- Distributed checkpointing (_load_distributed_checkpoint) processes unverified serialized state across nodes in a cluster.
- Cloud_IO enables model retrieval from local files, remote URLs, and in-memory byte streams without content validation.
- Lazy loading routines (_lazy_load) defer deserialization without verification.
- The DeepSpeed integration exposes additional deserialization vectors for optimizer and model state.
- PickleSerializer directly wraps Python's pickle module, offering no sandboxing or validation.
An attacker could supply a crafted .ckpt or .pt file to an automated workflow, such as a training pipeline, inference service, or model registry, resulting in arbitrary code execution in the context of the running Python process. On shared infrastructure, this could lead to full system compromise, data exfiltration, or lateral movement.
Recommended mitigations
Until a patch is released, CERT/CC recommends the following actions:
- Enforce strict trust boundaries: Avoid loading models or serialized objects from untrusted, unauthenticated, or unmanaged sources.
- Use restricted deserialization modes: Use torch.load(weights_only=True) whenever possible to load only tensor data and avoid executing arbitrary code.
- Isolate untrusted model handling: Load external files in sandboxed environments such as containers with limited privileges or ephemeral VMs.
- Inspect serialized files before loading: Use tools like pickletools to statically inspect pickle content for suspicious behavior before deserialization.
- Audit automation and internal tooling: Ensure that model registries, CI/CD pipelines, and training frameworks do not silently deserialize user-provided data.
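The pickletools-based inspection mentioned above can be sketched as a small static scanner. This is a heuristic, not a guarantee: the function name scan_pickle and the opcode set are assumptions of this sketch, chosen because GLOBAL/STACK_GLOBAL import callables and REDUCE invokes them during unpickling.

```python
import pickle
import pickletools

# Opcodes that can import or invoke callables during unpickling.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ"}

def scan_pickle(data: bytes) -> list[str]:
    """List risky opcodes in a pickle stream WITHOUT deserializing it."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if op.name in RISKY_OPCODES]

# Plain containers and numbers serialize without risky opcodes.
print(scan_pickle(pickle.dumps({"weights": [0.1, 0.2, 0.3]})))  # []

# A payload that smuggles in a callable is flagged before it can run.
class Exploit:
    def __reduce__(self):
        return (print, ("payload",))  # stand-in for a dangerous callable

print(scan_pickle(pickle.dumps(Exploit())))  # flags STACK_GLOBAL and REDUCE
```

Treat a non-empty result as a reason to reject or sandbox the file. For PyTorch checkpoints specifically, torch.load(path, weights_only=True) remains the first-line control, since it restricts unpickling to tensor and primitive types.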
As of publication, Lightning AI has not publicly acknowledged or patched the vulnerabilities. CERT/CC notes that the vendor has not responded to disclosure communications.
Organizations using PyTorch Lightning should apply the above mitigations and continue monitoring official channels for further updates.