CityHash/FarmHash
Python wrapper for FarmHash and
CityHash, a family of fast
non-cryptographic hash functions.
Getting Started
To install from PyPI:
pip install cityhash
To install in a Conda environment:
conda install -c conda-forge python-cityhash
The package exposes Python APIs for CityHash and FarmHash under the cityhash
and farmhash namespaces, respectively. Each provides 32-, 64- and 128-bit
implementations.
Usage Examples
Stateless hashing
Usage example for FarmHash:
>>> from farmhash import FarmHash32, FarmHash64, FarmHash128
>>> FarmHash32("abc")
1961358185
>>> FarmHash64("abc")
2640714258260161385
>>> FarmHash128("abc")
76434233956484675513733017140465933893
Hardware-independent fingerprints
Fingerprints are seedless hashes that are guaranteed to be hardware- and
platform-independent. This can be useful for networking applications which
require persisting hashed values.
>>> from farmhash import Fingerprint128
>>> Fingerprint128("abc")
76434233956484675513733017140465933893
Incremental hashing
CityHash and FarmHash do not support incremental hashing and thus are not ideal
for hashing of character streams. If you require incremental hashing, consider
another hashing library, such as MetroHash or xxHash.
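When the input is small enough to fit in memory, one workaround is to buffer the chunks and hash them in a single call. The sketch below illustrates this; the helper name hash_chunks is hypothetical and not part of this package, and zlib.crc32 appears only as a stand-in so that the example runs without the package installed.

```python
import zlib
from typing import Callable, Iterable


def hash_chunks(chunks: Iterable[bytes], hash_func: Callable[[bytes], int]) -> int:
    """Join all chunks in memory, then hash the result in a single call."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
    return hash_func(bytes(buf))


# zlib.crc32 is only a stand-in one-shot hash function; in practice one
# would pass a function from this package, such as farmhash.FarmHash64.
chunks = [b"ab", b"c"]
print(hash_chunks(chunks, zlib.crc32) == zlib.crc32(b"abc"))  # True
```

This trades memory for convenience: the whole input is held in memory before hashing, so it is not a substitute for true incremental hashing on large streams.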
Fast hashing of NumPy arrays
The Buffer Protocol allows
Python objects to expose their data as raw byte arrays for fast access without
having to copy to a separate location in memory. NumPy is one well-known
library that extensively uses this protocol.
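To make the idea concrete, here is a small standard-library sketch showing a few built-in types that expose their data through the buffer protocol. The helper name buffer_info is hypothetical; it only inspects the buffer through memoryview and does not call any function from this package.

```python
from array import array


def buffer_info(obj):
    """Return (nbytes, contiguous) for an object exposing the buffer protocol."""
    # memoryview() raises TypeError for objects without buffer support.
    with memoryview(obj) as view:
        return view.nbytes, view.contiguous


print(buffer_info(b"abc"))                   # (3, True)
print(buffer_info(bytearray(8)))             # (8, True)
print(buffer_info(array("d", [1.0, 2.0])))   # (16, True)
```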
All hashing functions in this package will read byte arrays from objects that
expose them via the buffer protocol. Here is an example showing hashing of a
three-dimensional NumPy array:
>>> import numpy as np
>>> from farmhash import FarmHash64
>>> arr = np.zeros((256, 256, 4))
>>> FarmHash64(arr)
1550282412043536862
NumPy arrays must be contiguous for this to work. To convert a
non-contiguous array, use NumPy's ascontiguousarray() function.
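As an illustration, the sketch below builds a non-contiguous view by strided slicing and then copies it into a contiguous array. It uses only NumPy; the variable names are illustrative, and the resulting packed array is what one would then pass to a hashing function such as FarmHash64.

```python
import numpy as np

arr = np.zeros((256, 256, 4))

# A strided slice shares memory with `arr` and is not contiguous.
view = arr[:, ::2, :]
print(view.flags["C_CONTIGUOUS"])    # False

# ascontiguousarray() copies the data into one contiguous block.
packed = np.ascontiguousarray(view)
print(packed.flags["C_CONTIGUOUS"])  # True
print(packed.shape)                  # (256, 128, 4)
```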
SSE4.2 support
For x86-64 platforms, the PyPI repository for this package includes wheels
compiled with SSE4.2 support. The 32- and 64-bit (but not the 128-bit)
variants of FarmHash significantly benefit from SSE4.2 instructions.
The vanilla CityHash functions (under the cityhash module) do not take advantage
of SSE4.2. Instead, one can use the cityhashcrc module provided with this
package, which exposes 128- and 256-bit CRC functions that do harness SSE4.2.
These functions are very fast and beat FarmHash128 on speed (FarmHash does
not include a 256-bit function). However, since FarmHash is the intended
successor of CityHash, I would be careful before using the CityHash-CRC
functions and would verify whether they provide sufficient randomness for your
intended application.
Development
Local workflow
For those wanting to contribute, here is a quick start using Make commands:
git clone https://github.com/escherba/python-cityhash.git
cd python-cityhash
make env
make test
make cpp-test
make shell
To find out which Make targets are available, enter:
make help
Distribution
The package wheels are built using cibuildwheel and are distributed to
PyPI using GitHub Actions. The wheels contain compiled binaries and are
available for the following platforms: windows-amd64, ubuntu-x86,
linux-x86_64, linux-aarch64, and macosx-x86_64.
See Also
For other fast non-cryptographic hash functions available as Python extensions,
see MetroHash, MurmurHash, and xxHash.
Authors
The original CityHash Python bindings are due to Alexander [Amper] Marshalov.
They were rewritten in Cython by Eugene Scherba, who also added the FarmHash
bindings. The CityHash and FarmHash algorithms and their C++ implementation are
by Google.
License
This software is licensed under the MIT License. See the included
LICENSE file for details.