New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More
Socket
Sign inDemoInstall
Socket

pybktree

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

pybktree

BK-tree data structure to allow fast querying of "close" matches

  • 1.1
  • PyPI
  • Socket score

Maintainers
1

pybktree

pybktree is a generic, pure Python implementation of a BK-tree_ data structure, which allows fast querying of "close" matches (for example, matches with small hamming distance or Levenshtein distance). This module is based on the algorithm by Nick Johnson in his blog article on BK-trees_.

The library is on the Python Package Index (PyPI)_ and works on both Python 3 and Python 2.7. To install it, fire up a command prompt, activate your virtual environment if you're using one, and type:

::

pip install pybktree

Example usage:

.. code:: python

>>> tree = pybktree.BKTree(pybktree.hamming_distance, [0, 4, 5, 14])
>>> tree.add(15)              # add element 15
>>> sorted(tree)              # BKTree instances are iterable
[0, 4, 5, 14, 15]
>>> sorted(tree.find(13, 1))  # find elements at most 1 bit away from element 13
[(1, 5), (1, 15)]

For large trees and fairly small N when calling find(), using a BKTree is much faster than doing a linear search. This is especially good when you're de-duping a few hundred thousand photos -- with a linear search that would become a very slow, O(N²) operation. With a BKTree, it's more like O(N log N).

Read the code in pybktree.py_ for more details – it's pretty small!

Other BK-tree modules I found on GitHub while writing this one:

  • ahupp/bktree_: this one is pretty good, but it's not on PyPI, and it's recursive
  • ryanfox/bktree_: this one is hard to customize, search() doesn't return distances, it's slower, and was buggy (though I think he fixed it recently)

pybktree was written by Ben Hoyt_ for Jetsetter_ and is licensed with a permissive MIT license (see LICENSE.txt_).

.. _BK-tree: https://en.wikipedia.org/wiki/BK-tree .. _blog article on BK-trees: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees .. _on the Python Package Index (PyPI): https://pypi.python.org/pypi/pybktree .. _pybktree.py: https://github.com/Jetsetter/pybktree/blob/master/pybktree.py .. _ahupp/bktree: https://github.com/ahupp/bktree .. _ryanfox/bktree: https://github.com/ryanfox/bktree .. _Ben Hoyt: http://benhoyt.com/ .. _Jetsetter: http://www.jetsetter.com/ .. _LICENSE.txt: https://github.com/Jetsetter/pybktree/blob/master/LICENSE.txt

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc