A razor-thin layer over csvmatch that allows you to do fuzzy matching with pandas dataframes.
pip install fuzzy_pandas
To borrow 100% from the original repo, say you have one CSV file such as:
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
And another such as:
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
You can then find which names are in both files:
import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")

matches = fpd.fuzzy_merge(df1, df2,
                          left_on=['name'],
                          right_on=['Person Name'],
                          ignore_case=True,
                          keep='match')

print(matches)
| | name | Person Name |
|---|---|---|
| 0 | George Smiley | George SMILEY |
| 1 | Peter Guillam | Peter Guillam |
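To make the result above less of a black box, here is a minimal standard-library sketch of what an exact, case-insensitive match on those two columns amounts to. It is not how fuzzy_pandas or csvmatch is implemented; the `exact_matches` helper and the inline CSV strings are hypothetical stand-ins for `data1.csv` and `data2.csv`.

```python
import csv
import io

# Inline copies of the two example files (stand-ins for data1.csv / data2.csv).
LEFT_CSV = """\
name,location,codename
George Smiley,London,Beggerman
Percy Alleline,London,Tinker
Roy Bland,London,Soldier
Toby Esterhase,Vienna,Poorman
Peter Guillam,Brixton,none
Bill Haydon,London,Tailor
Oliver Lacon,London,none
Jim Prideaux,Slovakia,none
Connie Sachs,Oxford,none
"""

RIGHT_CSV = """\
Person Name,Location
Maria Andreyevna Ostrakova,Russia
Otto Leipzig,Estonia
George SMILEY,London
Peter Guillam,Brixton
Konny Saks,Oxford
Saul Enderby,London
Sam Collins,Vietnam
Tony Esterhase,Vienna
Claus Kretzschmar,Hamburg
"""


def exact_matches(left_rows, right_rows, left_on, right_on, ignore_case=False):
    """Hypothetical helper: pair rows whose key columns are equal."""
    def normalise(value):
        return value.casefold() if ignore_case else value

    # Index the right-hand values by their normalised form for fast lookup.
    index = {}
    for row in right_rows:
        index.setdefault(normalise(row[right_on]), []).append(row[right_on])

    pairs = []
    for row in left_rows:
        for other in index.get(normalise(row[left_on]), []):
            pairs.append((row[left_on], other))
    return pairs


left_rows = list(csv.DictReader(io.StringIO(LEFT_CSV)))
right_rows = list(csv.DictReader(io.StringIO(RIGHT_CSV)))

matches = exact_matches(left_rows, right_rows, "name", "Person Name",
                        ignore_case=True)
for left_name, right_name in matches:
    print(left_name, "|", right_name)
```

With `ignore_case=True` this pairs the same two rows as the table above. Without it, only `Peter Guillam` would match, since `George SMILEY` differs in case.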
Dumping this out of the code itself; apologies for the lack of pretty formatting.
left_on
right_on
0.6
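The exact match earlier misses near-duplicates such as "Toby Esterhase" and "Tony Esterhase". As a rough illustration of threshold-based fuzzy matching, the sketch below scores each name pair with `difflib.SequenceMatcher` from the standard library and keeps the best candidate at or above a cutoff. Several things here are assumptions: that the `0.6` above is a similarity threshold, that `difflib` is an acceptable stand-in (it is not necessarily the measure csvmatch uses), and the helper names `similarity` and `fuzzy_pairs`, which are hypothetical.

```python
from difflib import SequenceMatcher

left_names = [
    "George Smiley", "Percy Alleline", "Roy Bland", "Toby Esterhase",
    "Peter Guillam", "Bill Haydon", "Oliver Lacon", "Jim Prideaux",
    "Connie Sachs",
]
right_names = [
    "Maria Andreyevna Ostrakova", "Otto Leipzig", "George SMILEY",
    "Peter Guillam", "Konny Saks", "Saul Enderby", "Sam Collins",
    "Tony Esterhase", "Claus Kretzschmar",
]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a stand-in scoring function."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def fuzzy_pairs(left, right, threshold=0.6):
    """Hypothetical helper: best right-hand candidate per left name.

    A pair is kept only if its score reaches the threshold (assumed here
    to play the role of the 0.6 value listed above).
    """
    pairs = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            pairs.append((a, best, round(score, 2)))
    return pairs


for left_name, right_name, score in fuzzy_pairs(left_names, right_names):
    print(f"{left_name} | {right_name} | {score}")
```

Raising the threshold to `1.0` reduces this to the case-insensitive exact match shown earlier. Lowering it admits looser pairs, at the cost of more false positives.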
For more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or [the original repo](https://github.com/maxharlow/csvmatch).