OFS is a bucket/object storage library.
It provides a common API for storing bitstreams (plus related metadata) in
'bucket/object' stores such as:
- S3, Google Storage, Eucalyptus, Archive.org
- Filesystem (via pairtree)
- 'REST' Store (see remote/reststore.py - implementation at http://bitbucket.org/pudo/repod/)
- Riak (buggy)
- add a backend here - just implement the methods in base.py (see the sketch below)
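The sketch below is purely illustrative and not part of ofs: a hypothetical in-memory backend implementing the methods exercised by the examples in this README. Consult base.py for the authoritative method list and signatures before writing a real backend::

import uuid

class MemoryOFS(object):  # hypothetical example class, not shipped with ofs
    """Toy in-memory backend; method names mirror the examples below."""

    def __init__(self):
        self._buckets = {}  # bucket_id -> {label: (data, metadata)}

    def exists(self, bucket, label=None):
        if label is None:
            return bucket in self._buckets
        return label in self._buckets.get(bucket, {})

    def claim_bucket(self, bucket=None):
        # If no name is given, or the name is taken, fall back to a UUID
        if bucket is None or bucket in self._buckets:
            bucket = uuid.uuid4().hex
        self._buckets[bucket] = {}
        return bucket

    def list_buckets(self):
        return iter(self._buckets)

    def list_labels(self, bucket):
        return iter(self._buckets[bucket])

    def put_stream(self, bucket, label, stream_object, params=None):
        # Accept either a file-like object or a raw string, as the examples do
        if hasattr(stream_object, "read"):
            data = stream_object.read()
        else:
            data = stream_object
        metadata = dict(params or {}, _label=label, _content_length=len(data))
        self._buckets.setdefault(bucket, {})[label] = (data, metadata)
        return metadata

    def get_stream(self, bucket, label):
        return self._buckets[bucket][label][0]

    def del_stream(self, bucket, label):
        del self._buckets[bucket][label]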
Why use the library?
====================
- Abstraction: write common code but use different storage backends
- More than a filesystem, less than a database - support for metadata as well as bitstreams
Requirements
============
All boto-based stores (Google Storage, S3, etc.) require boto>=2.0.
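As a sketch, a boto-based store can be swapped in for the local one along these lines (this assumes the S3OFS class in ofs.remote.botostore and uses placeholder credentials following boto's keyword conventions)::

>>> from ofs.remote.botostore import S3OFS
>>> o = S3OFS(aws_access_key_id='<your key>', aws_secret_access_key='<your secret>')

From here, the same bucket/label API shown below applies unchanged - that is the point of the abstraction.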
Example Usage
=============
(local version - depends on 'pairtree' and 'simplejson')::
>>> from ofs.local import PTOFS
>>> o = PTOFS()
(Equivalent to 'o = PTOFS(storage_dir="data", uri_base="urn:uuid:", hashing_type="md5")')
# Claim a bucket - this will add the bucket to the list of existing ones
>>> uuid_id = o.claim_bucket()
>>> uuid_id
'4aaa43cdf5ba44e2ad25acdbd1cf2f70'
# Choose a bucket name - if it already exists, a new UUID name will be generated and returned instead
>>> bucket_id = o.claim_bucket("foo")
>>> bucket_id
'foo'
>>> bucket_id = o.claim_bucket("foo")
>>> bucket_id
'1bf93208521545879e79c13614cd12f0'
# Store a file:
>>> o.put_stream(bucket_id, "foo.txt", open("foo....))
{'_label': 'foo.txt', '_content_length': 10, '_checksum': 'md5:10feda25f8da2e2ebfbe646eea351224', '_last_modified': '2010-08-02T11:37:21', '_creation_date': '2010-08-02T11:37:21'}
# or:
>>> o.put_stream(bucket_id, "foo.txt", "asidaisdiasjdiajsidjasidji")
{'_label': 'foo.txt', '_content_length': 26, '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_last_modified': '2010-08-02T11:37:21', '_creation_date': '2010-08-02T11:37:21'}
# adding a file with some parameters:
>>> o.put_stream(bucket_id, "foooo", "asidaisdiasjdiajsidjasidji", params={"original_uri":"http://...."})
{'_label': 'foooo', 'original_uri': 'http://....', '_last_modified': '2010-08-02T11:39:11', '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_creation_date': '2010-08-02T11:39:11', '_content_length': 26}
# Get the underlying URL pointing to a resource
>>> o.get_url(bucket_id, "foooo")
[typical local pairtree response:]
"file:///opt/ofs_store/pairtree_root/1b/f9/32/......./obj/foooo"
[typical remote responses:]
"http://..."
"ftp://..."
# adding to existing metadata:
>>> o.update_metadata(bucket_id, "foooo", {'foo':'bar'})
{'_label': 'foooo', 'original_uri': 'http://....', '_last_modified': '2010-08-02T11:39:11', '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_creation_date': '2010-08-02T11:39:11', '_content_length': 26, 'foo': 'bar'}
# Remove keys
>>> o.remove_metadata_keys(bucket_id, "foooo", ['foo'])
{'_label': 'foooo', 'original_uri': 'http://....', '_last_modified': '2010-08-02T11:39:11', '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_creation_date': '2010-08-02T11:39:11', '_content_length': 26}
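# Read back metadata without modifying it (a sketch, assuming the
# get_metadata accessor; the output mirrors the dicts shown above):
>>> o.get_metadata(bucket_id, "foooo")
{'_label': 'foooo', 'original_uri': 'http://....', '_last_modified': '2010-08-02T11:39:11', '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_creation_date': '2010-08-02T11:39:11', '_content_length': 26}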
# Delete blob
>>> o.exists(bucket_id, "foooo")
True
>>> o.del_stream(bucket_id, "foooo")
>>> o.exists(bucket_id, "foooo")
False
# Iterate through ids for buckets held:
>>> for item in o.list_buckets():
... print(item)
...
447536aa0f1b411089d12399738ede8e
4a726b0a33974480a2a26d34fa0d494d
4aaa43cdf5ba44e2ad25acdbd1cf2f70
.... etc
# Display the labels in a specific bucket:
>>> o.list_labels("1bf93208521545879e79c13614cd12f0")
[u'foo.txt']
Developer
=========
Tests use plain unittest, but we recommend running them with nose.
To run the botostore tests, copy test.ini.tmpl to test.ini and fill in the
details for a Google Storage account.
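A typical invocation (assuming nose is installed; nosetests discovers the test modules automatically)::

$ pip install nose
$ nosetests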
Changelog
=========

v0.4.1: 2011-08-13
- Set checksum (md5) based on etag (botostore backends) if not already set

v0.4: 2011-04-28
- New authenticate_request method for boto-based backends
- Improved update_metadata in botostore (no need to download and re-upload)

v0.3: 2011-01-20
- S3Bounce backend (use authorization credentials from CKAN)
- Use setuptools plugins with ofs.backend to allow for third-party backends
- ofs_upload command

v0.2: 2010-11-20
- Google Storage support
- REST store

v0.1: 2010-10-14
- Initial implementation with pairtree and S3