floof

A library for fuzzymatching

  • 0.1.11
  • PyPI

floof: simple fuzzymatching / comparison library

What is it?

floof is a Python package that makes fuzzymatching easy. Fuzzymatching is a common data task required whenever two strings don't quite exactly match. There are many algorithms to calculate string similarity, with dozens of disparate implementations. floof aims to collect all of these in an easy-to-use package, and reduce the boilerplate needed to apply these algorithms to your data.

Usage:

Dependencies

  • pandas - Output
  • scikit-learn - Used to implement TFIDF
  • sparse_dot_topn - Fast sparse matrix multiplication

Installing

The easiest way to install floof is from PyPI using pip:

pip install floof

Running

First, import the library.

import floof

floof provides two classes: Comparer and Matcher. Both are instantiated the same way, taking two pandas Series as arguments: an "original" and a "lookup". In practice, the order doesn't matter.

matcher = floof.Matcher(original, lookup)
comparer = floof.Comparer(original, lookup)

All functions in the Matcher class return a crosswalk of the original strings and the best k matches from the lookup strings. The primary convenience function is floof.Matcher().match(), which applies several different similarity algorithms and produces a composite score. Given an example input of:

original_names = ["apple", "pear"]
lookup_names = ["appl", "apil", "prear"]

A matcher function would return something like:

original_name  lookup_name  levenshtein_score  tfidf_score  final_score
apple          appl         90                 80           85
apple          apil         70                 85           77.5
pear           prear        95                 90           92.5
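
The final_score values in the example table are consistent with an unweighted mean of the two individual scores. The sketch below reproduces that arithmetic in plain Python. The averaging rule and the helper name add_final_score are assumptions made for illustration; floof's actual weighting may differ.

```python
from statistics import mean

# Scores copied from the example table; they are illustrative, not real floof output.
rows = [
    {"original_name": "apple", "lookup_name": "appl", "levenshtein_score": 90, "tfidf_score": 80},
    {"original_name": "apple", "lookup_name": "apil", "levenshtein_score": 70, "tfidf_score": 85},
    {"original_name": "pear", "lookup_name": "prear", "levenshtein_score": 95, "tfidf_score": 90},
]


def add_final_score(row):
    """Return a copy of the row with an unweighted mean of both scores (assumed formula)."""
    scored = dict(row)
    scored["final_score"] = mean([row["levenshtein_score"], row["tfidf_score"]])
    return scored


scored_rows = [add_final_score(row) for row in rows]
for row in scored_rows:
    print(row["original_name"], row["lookup_name"], row["final_score"])
```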

The Comparer class is meant to compare strings one-to-one. That is to say, given an input of:

original_names = ["apple", "pear"]
lookup_names = ["appl", "apil"]

A comparer function would return something like:

levenshtein_score
90
95
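
To make the one-to-one idea concrete, here is a minimal sketch of a pairwise Levenshtein comparison in plain Python. It is not floof's implementation: the function names and the 0-100 normalization (dividing by the longer string's length) are assumptions, so the numbers it produces will not necessarily match the illustrative scores shown here.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_score(a: str, b: str) -> float:
    """Scale the distance to 0-100 using the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein_distance(a, b)) / longest


def compare_one_to_one(original, lookup):
    """Score original[i] against lookup[i] only; no cross-comparison."""
    if len(original) != len(lookup):
        raise ValueError("one-to-one comparison needs inputs of equal length")
    return [levenshtein_score(o, l) for o, l in zip(original, lookup)]


scores = compare_one_to_one(["apple", "pear"], ["appl", "apil"])
```

Unlike the matcher, this produces exactly one score per input pair, so the work grows linearly with the number of rows.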

Performance

Fuzzymatching can be computationally expensive, since many algorithms are quadratic by nature: each original string must be compared against every lookup string. floof is therefore concurrent by default. It can also apply common-sense speedups, such as removing exact matches from the pool first and using a non-quadratic algorithm (TFIDF) to filter the pool.
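
Two of these ideas, setting exact matches aside first and spreading the quadratic step across workers, can be sketched with the standard library alone. This is an illustration under stated assumptions, not floof's code: the function names are hypothetical, difflib.SequenceMatcher stands in for the real scorers, and a thread pool is used for simplicity (for CPU-bound scoring in CPython, a process pool would usually be the better fit).

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def split_exact_matches(original, lookup):
    """Separate originals that appear verbatim in lookup from those needing fuzzy work."""
    lookup_set = set(lookup)
    exact = [name for name in original if name in lookup_set]
    remaining = [name for name in original if name not in lookup_set]
    return exact, remaining


def best_match(name, lookup):
    """Quadratic step: score one original against every lookup string, keep the best."""
    scored = [(SequenceMatcher(None, name, cand).ratio() * 100, cand) for cand in lookup]
    score, candidate = max(scored)
    return name, candidate, score


def match_all(original, lookup, max_workers=4):
    """Exact matches are resolved for free; the rest are scored concurrently."""
    exact, remaining = split_exact_matches(original, lookup)
    results = [(name, name, 100.0) for name in exact]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results.extend(pool.map(lambda name: best_match(name, lookup), remaining))
    return results


matches = match_all(["apple", "pear", "kiwi"], ["appl", "apil", "prear", "kiwi"])
```

Here "kiwi" never reaches the scoring step because it matches exactly, so only "apple" and "pear" incur the per-lookup comparison cost.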

TODO:

  • Allow custom scorers
