Python classes representing different file formats, for use in type hinting in data workflows





*Fileformats* provides a library of file-format types implemented as Python classes.
The file-format types are designed to be used in type validation during the construction
of data workflows (e.g. Pydra_, Fastr_), and also provide some basic data handling methods
(e.g. loading data to dictionaries) and conversions between some equivalent types when
the "extended" install option is provided.

File-format types are typically identified by a combination of file extension
and "magic numbers" where applicable. However, unlike many other file-type Python packages,
*FileFormats* supports multi-file data formats ("file sets") often found in scientific
workflows, e.g. with separate header/data files. *FileFormats* also provides a flexible
framework to add custom identification routines for exotic file formats, e.g.
formats that require inspection of headers to locate data files, directories containing
certain file types, or to peek at metadata fields to define specific sub-types
(e.g. functional MRI DICOM file set).
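
As a rough illustration of the file-set idea, the sketch below uses only the standard
library and is not part of the *FileFormats* API. It assumes a hypothetical two-file format
in which a ``.hdr`` header must sit next to a ``.img`` data file with the same stem; the
function name ``find_file_set`` and both extensions are placeholders chosen for the example.

.. code-block:: python

    from pathlib import Path


    def find_file_set(header_path):
        """Return (header, data) paths if both files of the set exist, else None.

        Hypothetical format: a ".hdr" header accompanied by a ".img" data
        file that shares its stem. This is not how FileFormats does it.
        """
        header = Path(header_path)
        if header.suffix != ".hdr" or not header.is_file():
            return None
        data = header.with_suffix(".img")  # companion file shares the stem
        if not data.is_file():
            return None
        return header, data

A single-file check would accept the header on its own; treating the pair as one unit is
what lets a workflow reject an incomplete data set early.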

See the `extension template <>`__
for instructions on how to design *FileFormats* extension modules to augment the
standard file-types implemented in the main repository with custom domain/vendor-specific
file-format types.

Notes on MIME-type coverage

Support for all non-vendor standard MIME types (i.e. ones not matching ``*/vnd.*`` or ``*/x-*``) has been
added to *FileFormats* by semi-automatically scraping the `IANA MIME types`_ website for file
extensions and magic numbers. As such, many of the formats in the library have not been properly
tested on real data and so should be treated with some caution. If you encounter any issues with an implemented file
type, please raise an issue in the `GitHub tracker <>`__.
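
The vendor/experimental filter described above can be sketched with the standard library's
``fnmatch`` module. This is only an illustration of the two glob patterns quoted in the text,
not the script used to scrape the IANA listings; the helper name ``is_standard_mime`` and the
sample MIME types are assumptions made for the example.

.. code-block:: python

    from fnmatch import fnmatchcase

    # Patterns quoted in the text for vendor and experimental MIME types
    NON_STANDARD_PATTERNS = ("*/vnd.*", "*/x-*")


    def is_standard_mime(mime_type):
        """Return True if the MIME type matches neither non-standard pattern."""
        return not any(fnmatchcase(mime_type, pat) for pat in NON_STANDARD_PATTERNS)


    # Sample inputs chosen for illustration only
    mime_types = ["image/png", "application/vnd.ms-excel", "application/x-tar", "text/csv"]
    standard = [m for m in mime_types if is_standard_mime(m)]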

Adding support for vendor formats will be relatively straightforward; it just requires someone to
manually curate the scraped data (a day's work or so). Please get in touch if you are interested in helping out
with this.


*FileFormats* can be installed for Python >= 3.7 from PyPI with

.. code-block:: bash

    $ python3 -m pip install fileformats

Support for converter methods between a few select formats can be installed by
passing the 'extended' install extra, e.g.

.. code-block:: bash

    $ python3 -m pip install fileformats[extended]


Using the ``WithMagicNumber`` mixin class, the ``Png`` format can be defined concisely as

.. code-block:: python

    from fileformats.generic import File
    from fileformats.core.mixin import WithMagicNumber

    class Png(WithMagicNumber, File):
        binary = True
        ext = ".png"
        iana_mime = "image/png"
        magic_number = b"\x89PNG"  # PNG files start with byte 0x89 followed by "PNG"

Files can then be checked to see whether they are of PNG format by

.. code-block:: python

    png = Png("/path/to/image/file.png")  # Checks the extension and magic number

which will raise a ``FormatMismatchError`` if initialisation or validation fails.
Alternatively, the ``matches`` method returns a boolean indicating whether validation succeeds

.. code-block:: python

    if Png.matches(a_path_to_a_file):
        ...  # handle the matching case
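
To make the two checks concrete, here is a minimal standard-library sketch of what an
extension-plus-magic-number test amounts to. It is not the *FileFormats* implementation;
the names ``looks_like_png`` and ``PNG_SIGNATURE`` are hypothetical, and the only fact it
relies on is the eight-byte signature that every PNG file begins with.

.. code-block:: python

    from pathlib import Path

    # The eight-byte signature at the start of every PNG file
    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


    def looks_like_png(path):
        """Return True if the path has a .png extension and the PNG signature.

        Hypothetical helper, similar in spirit to a boolean format check.
        """
        path = Path(path)
        if path.suffix.lower() != ".png" or not path.is_file():
            return False
        with path.open("rb") as f:
            return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE

Checking the leading bytes as well as the extension catches files that have simply been
renamed to ``.png``.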

Format Conversion

While format conversion is not implemented in the main *FileFormats* package itself, the package provides hooks for other packages to implement extra behaviour such as format conversion. The `fileformats-extras <>`__ package implements a number of converters between standard file-format types, e.g. archive types to/from generic files/directories, which, if installed, can be called using the ``convert()`` method.

.. code-block:: python

    from fileformats.application import Zip
    from fileformats.generic import Directory

    zip_file = Zip.convert(Directory("/path/to/a/directory"))
    extracted = Directory.convert(zip_file)
    copied = extracted.copy_to("/path/to/output")
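
For readers unfamiliar with this kind of round trip, the sketch below performs the equivalent
directory-to-zip-to-directory steps with the standard library's ``shutil`` module. It is
offered only as a conceptual comparison and makes no claim about how the converters are
implemented; the function name ``zip_round_trip`` and the ``archive``/``extracted`` names are
assumptions for the example, and ``work_dir`` is expected to lie outside ``src_dir``.

.. code-block:: python

    import shutil
    from pathlib import Path


    def zip_round_trip(src_dir, work_dir):
        """Zip a directory, then extract it again.

        Returns (zip_path, extracted_dir). Hypothetical helper for illustration.
        """
        work_dir = Path(work_dir)
        # make_archive appends ".zip" to the base name and returns the archive path
        zip_path = Path(
            shutil.make_archive(str(work_dir / "archive"), "zip", root_dir=str(src_dir))
        )
        extracted = work_dir / "extracted"
        shutil.unpack_archive(str(zip_path), str(extracted), "zip")
        return zip_path, extracted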

The converters are implemented in the Pydra_ dataflow framework, and can be linked into
wider Pydra_ workflows by creating a converter task

.. code-block:: python

    import pydra
    from pydra.tasks.mypackage import MyTask
    from fileformats.application import Json, Yaml

    wf = pydra.Workflow(name="a_workflow", input_spec=["in_json"])
    # Add the JSON-to-YAML converter as a task in the workflow
    wf.add(Yaml.get_converter(Json, name="json2yaml", in_file=wf.lzin.in_json))

Alternatively, the conversion can be executed outside of a Pydra_ workflow with

.. code-block:: python

    from fileformats.application import Json, Yaml

    json_file = Json("/path/to/file.json")
    yaml_file = Yaml.convert(json_file)


This work is licensed under a
`Creative Commons Attribution 4.0 International License <>`_


.. _Pydra:
.. _Fastr:
.. _`IANA MIME types`:


