Security News
Fluent Assertions Faces Backlash After Abandoning Open Source Licensing
Fluent Assertions is facing backlash after dropping the Apache license for a commercial model, leaving users blindsided and questioning contributor rights.
HashFS
|version| |travis| |coveralls| |license|
HashFS is a content-addressable file management system. What does that mean? Simply, that HashFS manages a directory where files are saved based on the file's hash.
Typical use cases for this kind of system are ones where:
this was forked from https://github.com/dgilland/hashfs/ and it supports:
n
number of characters.hashlib.new
.Install using pip:
::
pip install hashfs
.. code-block:: python
from hashfs import HashFS
Designate a root folder for HashFS
. If the folder doesn't already exist, it will be created.
.. code-block:: python
# Set the `depth` to the number of subfolders the file's hash should be split when saving.
# Set the `width` to the desired width of each subfolder.
fs = HashFS('temp_hashfs', depth=4, width=1, algorithm='sha256')
# With depth=4 and width=1, files will be saved in the following pattern:
# temp_hashfs/a/b/c/d/efghijklmnopqrstuvwxyz
# With depth=3 and width=2, files will be saved in the following pattern:
# temp_hashfs/ab/cd/ef/ghijklmnopqrstuvwxyz
NOTE: The algorithm
value should be a valid string argument to hashlib.new()
.
HashFS
supports basic file storage, retrieval, and removal as well as some more advanced features like file repair.
Add content to the folder using either readable objects (e.g. StringIO
) or file paths (e.g. 'a/path/to/some/file'
).
.. code-block:: python
from io import StringIO
some_content = StringIO('some content')
address = fs.put(some_content)
# Or if you'd like to save the file with an extension...
address = fs.put(some_content, '.txt')
# The id of the file (i.e. the hexdigest of its contents).
address.id
# The absolute path where the file was saved.
address.abspath
# The path relative to fs.root.
address.relpath
# Whether the file previously existed.
address.is_duplicate
Get a file's HashAddress
by address ID or path. This address would be identical to the address returned by put()
.
.. code-block:: python
assert fs.get(address.id) == address
assert fs.get(address.relpath) == address
assert fs.get(address.abspath) == address
assert fs.get('invalid') is None
Get a BufferedReader
handler for an existing file by address ID or path.
.. code-block:: python
fileio = fs.open(address.id)
# Or using the full path...
fileio = fs.open(address.abspath)
# Or using a path relative to fs.root
fileio = fs.open(address.relpath)
NOTE: When getting a file that was saved with an extension, it's not necessary to supply the extension. Extensions are ignored when looking for a file based on the ID or path.
Delete a file by address ID or path.
.. code-block:: python
fs.delete(address.id)
fs.delete(address.abspath)
fs.delete(address.relpath)
NOTE: When a file is deleted, any parent directories above the file will also be deleted if they are empty directories.
Below are some of the more advanced features of HashFS
.
The HashFS
files may not always be in sync with it's depth
, width
, or algorithm
settings (e.g. if HashFS
takes ownership of a directory that wasn't previously stored using content hashes or if the HashFS
settings change). These files can be easily reindexed using repair()
.
.. code-block:: python
repaired = fs.repair()
# Or if you want to drop file extensions...
repaired = fs.repair(extensions=False)
WARNING: It's recommended that a backup of the directory be made before repairing just in case something goes wrong.
Instead of actually repairing the files, you can iterate over them for custom processing.
.. code-block:: python
for corrupted_path, expected_address in fs.corrupted():
# do something
WARNING: HashFS.corrupted()
is a generator so be aware that modifying the file system while iterating could have unexpected results.
Iterate over files.
.. code-block:: python
for file in fs.files():
# do something
# Or using the class' iter method...
for file in fs:
# do something
Iterate over folders that contain files (i.e. ignore the nested subfolders that only contain folders).
.. code-block:: python
for folder in fs.folders():
# do something
Compute the size in bytes of all files in the root
directory.
.. code-block:: python
total_bytes = fs.size()
Count the total number of files.
.. code-block:: python
total_files = fs.count()
# Or via len()...
total_files = len(fs)
For more details, please see the full documentation at http://hashfs.readthedocs.org.
.. |version| image:: http://img.shields.io/pypi/v/hashfs.svg?style=flat-square :target: https://pypi.python.org/pypi/hashfs/
.. |travis| image:: http://img.shields.io/travis/dgilland/hashfs/master.svg?style=flat-square :target: https://travis-ci.org/dgilland/hashfs
.. |coveralls| image:: http://img.shields.io/coveralls/dgilland/hashfs/master.svg?style=flat-square :target: https://coveralls.io/r/dgilland/hashfs
.. |license| image:: http://img.shields.io/pypi/l/hashfs.svg?style=flat-square :target: https://pypi.python.org/pypi/hashfs/
distutils.dir_util.mkpath
with os.path.makedirs
.shutil.move
instead of shutil.copy
to move temporary file created during put
operation to HashFS
directory.scandir
package for iterating over files/folders when platform is Python < 3.5. Scandir implementation was added to os
module starting with Python 3.5.HashFS.copy
to HashFS._copy
.is_duplicate
attribute to HashAddress
.HashFS.put()
return HashAddress
with is_duplicate=True
when file with same hash already exists on disk.HashFS.size()
method that returns the size of all files in bytes.HashFS.count()
/HashFS.__len__()
methods that return the count of all files.HashFS.__iter__()
method to support iteration. Proxies to HashFS.files()
.HashFS.__contains__()
method to support in
operator. Proxies to HashFS.exists()
.HashFS.repair()
not using extensions
argument properly.HashFS.length
parameter/property to width
. (breaking change)HashFS.get
to HashFS.open
. (breaking change)HashFS.get()
method that returns a HashAddress
or None
given a file ID or path.HashFS.get()
method that retrieves a reader object given a file ID or path.HashFS.delete()
method that deletes a file ID or path.HashFS.folders()
method that returns the folder paths that directly contain files (i.e. subpaths that only contain folders are ignored).HashFS.detokenize()
method that returns the file ID contained in a file path.HashFS.repair()
method that reindexes any files under root directory whose file path doesn't not match its tokenized file ID.Address
classs to HashAddress
. (breaking change)HashAddress.digest
to HashAddress.id
. (breaking change)HashAddress.path
to HashAddress.abspath
. (breaking change)HashAddress.relpath
which represents path relative to HashFS.root
.HashFS
class.HashFS.put()
method that saves a file path or file-like object by content hash.HashFS.files()
method that returns all files under root directory.HashFS.exists()
which checks either a file hash or file path for existence.FAQs
A content-addressable file management system.
We found that hashfs2 demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Fluent Assertions is facing backlash after dropping the Apache license for a commercial model, leaving users blindsided and questioning contributor rights.
Research
Security News
Socket researchers uncover the risks of a malicious Python package targeting Discord developers.
Security News
The UK is proposing a bold ban on ransomware payments by public entities to disrupt cybercrime, protect critical services, and lead global cybersecurity efforts.