bagit-python
============

|Build Status| |Coverage Status|

bagit is a Python library and command line utility for working with
`BagIt <http://purl.org/net/bagit>`__ style packages.

Installation
------------

``bagit.py`` is a single-file Python module that you can drop into your
project as needed, or you can install it globally with:

::

    pip install bagit

Python v2.7+ is required.

Command Line Usage
------------------

When you install bagit you should get a command-line program called
``bagit.py``, which you can use to turn an existing directory into a bag:

::

    bagit.py --contact-name 'John Kunze' /directory/to/bag

Finding Bagit on your system


The ``bagit.py`` program should be available in your normal command-line
window (Terminal on OS X, Command Prompt or Powershell on Windows,
etc.). If you are unsure where it was installed you can also request
that Python search for ``bagit`` as a Python module: simply replace
``bagit.py`` with ``python -m bagit``:

::

    python -m bagit --help

On some systems Python may have been installed as ``python3``, ``py``,
etc. – simply use the same name you use to start an interactive Python
shell:

::

    py -m bagit --help
    python3 -m bagit --help

Configuring BagIt
~~~~~~~~~~~~~~~~~

You can pass in key/value metadata for the bag using options like
``--contact-name`` above, which get persisted to ``bag-info.txt``. For a
complete list of ``bag-info.txt`` properties you can use as command line
arguments, see ``--help``.

Since calculating checksums can take a while when creating a bag, you
may want to calculate them in parallel if you are on a multicore
machine. You can do that with the ``--processes`` option:

::

    bagit.py --processes 4 /directory/to/bag

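To specify which checksum algorithm(s) to use when generating the
manifest, use the ``--md5``, ``--sha1``, ``--sha256`` and/or ``--sha512``
flags. The default depends on the installed version (older releases
generate MD5, while recent releases generate SHA-256 and SHA-512), so
check ``--help`` if the choice matters.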

::

    bagit.py --sha1 /path/to/bag
    bagit.py --sha256 /path/to/bag
    bagit.py --sha512 /path/to/bag
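
As a rough illustration of what these flags control, the sketch below
builds the lines of a ``manifest-<algorithm>.txt`` file with Python's
``hashlib``. It is not bagit's own implementation: the helper names
``file_digest`` and ``manifest_lines`` are hypothetical, and it assumes
the payload lives under a ``data`` directory inside the bag.

```python
import hashlib
import os


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of one file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(bag_dir, algorithm="sha256"):
    """Build '<digest>  <relative path>' lines for every file under bag_dir/data."""
    lines = []
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full_path = os.path.join(root, name)
            # Manifest paths are relative to the bag root and use forward slashes.
            rel_path = os.path.relpath(full_path, bag_dir).replace(os.sep, "/")
            lines.append("%s  %s" % (file_digest(full_path, algorithm), rel_path))
    # Sort by path so the output is stable across runs and platforms.
    return sorted(lines, key=lambda line: line.split("  ", 1)[1])
```

Calling ``manifest_lines('/path/to/bag', 'sha512')`` would produce the
kind of content that ``--sha512`` asks bagit to write, one line per
payload file.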

To validate a bag, use the ``--validate`` flag:

::

    bagit.py --validate /path/to/bag
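
Conceptually, the checksum part of validation compares each manifest
entry against the file on disk. The sketch below shows that idea using
only the standard library. It is a simplified illustration and not
bagit's code: ``verify_manifest`` is a hypothetical helper, it reads a
single ``manifest-<algorithm>.txt`` file, and it does not look for
unexpected files or check tag files.

```python
import hashlib
import os


def verify_manifest(bag_dir, algorithm="sha256"):
    """Return a list of (problem, path) tuples; an empty list means every entry matched."""
    problems = []
    manifest_path = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue
            # Each line holds a hex digest, whitespace, then a path relative to the bag.
            expected, rel_path = line.split(None, 1)
            full_path = os.path.join(bag_dir, *rel_path.split("/"))
            if not os.path.isfile(full_path):
                problems.append(("missing", rel_path))
                continue
            digest = hashlib.new(algorithm)
            with open(full_path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                problems.append(("mismatch", rel_path))
    return problems
```

The two problem kinds reported here loosely correspond to the missing
file and checksum mismatch cases that ``--validate`` reports.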

For a quick check that only examines the structure of the bag and
compares its payload-oxum (byte count and number of files), without
recomputing checksums, add the ``--fast`` flag:

::

    bagit.py --validate --fast /path/to/bag
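
The payload-oxum is cheap to compute because it needs only file sizes
and a file count. A minimal sketch, assuming the payload sits under the
bag's ``data`` directory (``payload_oxum`` is a hypothetical helper
name, not part of bagit's API):

```python
import os


def payload_oxum(bag_dir):
    """Return 'total_bytes.file_count' for everything under bag_dir/data."""
    total_bytes = 0
    file_count = 0
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            file_count += 1
    return "%d.%d" % (total_bytes, file_count)
```

The result can be compared with the ``Payload-Oxum`` value recorded in
``bag-info.txt``; a mismatch means files were added, removed or resized,
while a match says nothing about whether file contents changed.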

Finally, to parallelize validation across multiple CPUs, combine
``--validate`` with ``--processes``:

::

    bagit.py --validate --processes 4 /path/to/bag

Using BagIt in your programs
----------------------------

You can also use BagIt programmatically in your own Python programs by
importing the ``bagit`` module.

Create
~~~~~~

To create a bag you would do this:

.. code:: python

    import bagit

    bag = bagit.make_bag('mydir', {'Contact-Name': 'John Kunze'})

``make_bag`` returns a Bag instance. If you have a bag already on disk
and would like to create a Bag instance for it, simply call the
constructor directly:

.. code:: python

    bag = bagit.Bag('/path/to/bag')

Update Bag Metadata
~~~~~~~~~~~~~~~~~~~

You can change the metadata persisted to the bag-info.txt by using the
``info`` property on a ``Bag``.

.. code:: python

    # load the bag
    bag = bagit.Bag('/path/to/bag')

    # update bag info metadata
    bag.info['Internal-Sender-Description'] = 'Updated on 2014-06-28.'
    bag.info['Authors'] = ['John Kunze', 'Andy Boyko']
    bag.save()

Update Bag Manifests
~~~~~~~~~~~~~~~~~~~~

By default ``save`` will not update manifests. This guards against a
situation where a call to ``save`` to persist bag metadata accidentally
regenerates manifests for an invalid bag. If you have modified the
payload of a bag by adding, modifying or deleting files in the data
directory, and wish to regenerate the manifests, set the ``manifests``
parameter to ``True`` when calling ``save``.

.. code:: python


    import os
    import shutil

    import bagit

    # load the bag
    bag = bagit.Bag('/path/to/bag')

    # add a file
    shutil.copyfile('newfile', '/path/to/bag/data/newfile')

    # remove a file
    os.remove('/path/to/bag/data/file')

    # persist changes
    bag.save(manifests=True)

The ``save`` method takes an optional ``processes`` parameter that
determines how many processes are used to regenerate the checksums. This
can be handy on multicore machines.
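
One way to pick a value for that parameter is to cap it at the number of
CPUs Python reports. The sketch below assumes ``bag`` is a ``bagit.Bag``
loaded as in the earlier examples; the helper name
``save_with_manifests`` and the default cap of 4 are illustrative
choices, not part of bagit.

```python
import os


def save_with_manifests(bag, max_processes=4):
    """Regenerate manifests on save using up to max_processes worker processes."""
    # os.cpu_count() may return None, so fall back to a single process.
    processes = max(1, min(max_processes, os.cpu_count() or 1))
    bag.save(processes=processes, manifests=True)
    return processes
```

The helper returns the number of processes it requested, which can be
useful for logging.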

Validation
~~~~~~~~~~

If you would like to see if a bag is valid, use its ``is_valid`` method:

.. code:: python

    bag = bagit.Bag('/path/to/bag')
    if bag.is_valid():
        print("yay :)")
    else:
        print("boo :(")

If you'd like to get a detailed list of validation errors, call the
``validate`` method and catch the ``BagValidationError`` exception. If
the bag's manifest was invalid (and the problem wasn't caught by the
payload oxum check), the exception's ``details`` property will contain a
list of ``ManifestError``\ s that you can inspect. Each
``ManifestError`` will be of type ``ChecksumMismatch``, ``FileMissing``,
or ``UnexpectedFile``.

So for example if you want to print out checksums that failed to
validate you can do this:

.. code:: python


    import bagit

    bag = bagit.Bag("/path/to/bag")

    try:
        bag.validate()
    except bagit.BagValidationError as e:
        for d in e.details:
            if isinstance(d, bagit.ChecksumMismatch):
                print("expected %s to have %s checksum of %s but found %s" %
                      (d.path, d.algorithm, d.expected, d.found))
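
When a bag has many problems, a summary grouped by error type can be
easier to read than one line per error. The sketch below is a
hypothetical helper, not part of bagit; it assumes each detail object
exposes a ``path`` attribute, as ``ChecksumMismatch`` does in the
example above, and falls back to the error's string form otherwise.

```python
from collections import defaultdict


def group_errors_by_type(details):
    """Map each error class name to the sorted paths reported for it."""
    grouped = defaultdict(list)
    for error in details:
        # Fall back to the error's string form if it carries no path attribute.
        grouped[type(error).__name__].append(getattr(error, "path", str(error)))
    return {name: sorted(paths) for name, paths in grouped.items()}
```

Inside an ``except bagit.BagValidationError as e`` handler this could be
called as ``group_errors_by_type(e.details)``.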

To iterate through a bag's manifest and retrieve checksums for the
payload files, use the bag's ``entries`` dictionary:

.. code:: python

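    bag = bagit.Bag("/path/to/bag")

    for path, fixity in bag.entries.items():
        # fixity maps each algorithm name (for example "sha256") to its digest
        for algorithm, digest in fixity.items():
            print("path:%s %s:%s" % (path, algorithm, digest))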
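
Because a bag may carry manifests for some algorithms and not others, it
can help to filter the entries by algorithm before using them. A small
sketch that works on any dictionary shaped like ``bag.entries`` (path
mapped to a dictionary of algorithm name and digest); the function names
are hypothetical:

```python
def checksums_for(entries, algorithm):
    """Return {path: digest} for entries that carry the requested algorithm."""
    return {
        path: fixity[algorithm]
        for path, fixity in entries.items()
        if algorithm in fixity
    }


def missing_algorithm(entries, algorithm):
    """Return the sorted paths that have no digest for the requested algorithm."""
    return sorted(path for path, fixity in entries.items() if algorithm not in fixity)
```

For example, ``checksums_for(bag.entries, "sha256")`` would give a flat
path-to-digest mapping, and ``missing_algorithm(bag.entries, "md5")``
would list the files that have no MD5 entry.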

Contributing to bagit-python development
----------------------------------------

::

    % git clone https://github.com/LibraryOfCongress/bagit-python.git
    % cd bagit-python
    # MAKE CHANGES
    % python test.py

Running the tests
~~~~~~~~~~~~~~~~~

You can quickly run the tests by having setuptools install dependencies:

::

    python setup.py test

Once your code is working, you can use
`Tox <https://tox.readthedocs.io/>`__ to run the tests with every
supported version of Python which you have installed on the local
system:

::

    tox

If you have Docker installed, you can run the tests under Linux inside a
container:

::

    % docker build -t bagit:latest . && docker run -it bagit:latest

Benchmarks
----------

If you'd like to see how increasing parallelization of bag creation on
your system affects the time to create a bag, try using the included
bench utility:

::

    % ./bench.py

License
-------

|cc0|

Note: By contributing to this project, you agree to license your work
under the same terms as those that govern this project's distribution.

.. |Build Status| image:: https://travis-ci.org/LibraryOfCongress/bagit-python.svg?branch=master
   :target: http://travis-ci.org/LibraryOfCongress/bagit-python
.. |Coverage Status| image:: https://coveralls.io/repos/github/LibraryOfCongress/bagit-python/badge.svg?branch=master
   :target: https://coveralls.io/github/LibraryOfCongress/bagit-python?branch=master
.. |cc0| image:: http://i.creativecommons.org/p/zero/1.0/88x31.png
   :target: http://creativecommons.org/publicdomain/zero/1.0/

