
A Python library for string distance calculations that account for common OCR (optical character recognition) errors.
Documentation: https://niklasvonm.github.io/ocr-stringdist/
OCR-StringDist provides specialized string distance algorithms that account for optical character recognition (OCR) errors. Unlike traditional string comparison algorithms, OCR-StringDist considers common OCR confusions (such as "0" vs. "O" or "6" vs. "G") when calculating distances between strings.
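The core idea can be illustrated with a minimal weighted Levenshtein distance in plain Python. This is not OCR-StringDist's implementation. The function name `weighted_levenshtein` and the `OCR_COSTS` table, including its 0.1 and 0.2 values, are hypothetical and chosen only for illustration. Insertions and deletions cost 1.0. A substitution uses the cost from the table when the character pair is listed, and 1.0 otherwise.

```python
from typing import Dict, Tuple


def weighted_levenshtein(a: str, b: str, sub_costs: Dict[Tuple[str, str], float]) -> float:
    """Levenshtein distance in which some substitutions are cheaper than 1.0.

    Insertions and deletions cost 1.0; substituting x -> y costs
    sub_costs.get((x, y), 1.0); identical characters cost 0.0.
    """
    # prev[j] holds the distance between a[:i-1] and b[:j] (the previous row).
    prev = [float(j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [float(i)] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            x, y = a[i - 1], b[j - 1]
            sub = 0.0 if x == y else sub_costs.get((x, y), 1.0)
            curr[j] = min(
                prev[j] + 1.0,      # delete x
                curr[j - 1] + 1.0,  # insert y
                prev[j - 1] + sub,  # substitute x -> y (or keep a match)
            )
        prev = curr
    return prev[len(b)]


# Hypothetical, illustrative costs for commonly confused characters.
OCR_COSTS = {("0", "O"): 0.1, ("O", "0"): 0.1, ("6", "G"): 0.2, ("G", "6"): 0.2}

print(weighted_levenshtein("R0AD", "ROAD", OCR_COSTS))  # 0.1
print(weighted_levenshtein("R0AD", "RXAD", OCR_COSTS))  # 1.0
```

Because "0" and "O" are listed as a cheap pair, "R0AD" ends up much closer to "ROAD" than to "RXAD", even though each pair differs by exactly one character.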
Note: This project is in early development. APIs may change in future releases.
```bash
pip install ocr-stringdist
```
The utility function `find_best_candidate` efficiently finds the best-matching string in a collection of candidates, using any specified distance function (including the library's OCR-aware ones).

```python
import ocr_stringdist as osd

# Using the default OCR distance map
distance = osd.weighted_levenshtein_distance("OCR5", "OCRS")
print(f"Distance between 'OCR5' and 'OCRS': {distance}")  # Will be less than 1.0

# Custom cost map
custom_map = {("In", "h"): 0.5}
distance = osd.weighted_levenshtein_distance(
    "hi", "Ini",
    cost_map=custom_map,
    symmetric=True,
)
print(f"Distance with custom map: {distance}")
```
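The `symmetric=True` argument in the custom-map example suggests that a cost given for one direction of a substitution is also applied in the reverse direction. The map lists `("In", "h")`, while turning "hi" into "Ini" replaces "h" with "In". Assuming keys are read as (source, target), the entry is therefore only relevant in the reverse direction. The hypothetical helper `symmetrize` below sketches that interpretation by mirroring every entry. It is an assumption about the semantics, not the library's code.

```python
from typing import Dict, Tuple


def symmetrize(cost_map: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Return a copy of cost_map in which every (a, b) entry also covers (b, a).

    If both directions are already present, the explicitly given value wins.
    """
    result = dict(cost_map)
    for (a, b), cost in cost_map.items():
        # Only fill in the reverse pair when it was not specified explicitly.
        result.setdefault((b, a), cost)
    return result


print(symmetrize({("In", "h"): 0.5}))
# {('In', 'h'): 0.5, ('h', 'In'): 0.5}
```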
```python
import ocr_stringdist as osd

s = "apple"
candidates = ["apply", "apples", "orange", "appIe"]  # 'appIe' has an OCR-like error

def ocr_aware_distance(s1: str, s2: str) -> float:
    return osd.weighted_levenshtein_distance(s1, s2, cost_map={("l", "I"): 0.1})

best_candidate, best_dist = osd.find_best_candidate(s, candidates, ocr_aware_distance)
print(f"Best candidate for '{s}' is '{best_candidate}' with distance {best_dist}")
# Output: Best candidate for 'apple' is 'appIe' with distance 0.1
```
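Conceptually, picking the best candidate is a minimum search over the distance function. The sketch below shows one way to write it. The names `find_best_candidate_sketch` and `toy_distance` are hypothetical, and the real `osd.find_best_candidate` may behave differently, for example in how it handles ties or empty input. The early exit on a distance of 0.0 assumes distances are never negative. `toy_distance` is a position-wise comparison, not an edit distance, and stands in for an OCR-aware function only to keep the example self-contained.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_best_candidate_sketch(
    s: str,
    candidates: Iterable[str],
    distance_fun: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate with the smallest distance to s, and that distance.

    Ties keep the earliest candidate. Raises ValueError if candidates is empty.
    """
    best: Optional[Tuple[str, float]] = None
    for candidate in candidates:
        dist = distance_fun(s, candidate)
        if best is None or dist < best[1]:
            best = (candidate, dist)
            if dist == 0.0:
                break  # assumes non-negative distances: an exact match cannot be beaten
    if best is None:
        raise ValueError("candidates must not be empty")
    return best


def toy_distance(a: str, b: str) -> float:
    # Position-wise comparison (not an edit distance): an l/I mismatch costs 0.1,
    # any other mismatch costs 1.0, plus 1.0 per character of length difference.
    cost = sum(
        0.1 if {x, y} == {"l", "I"} else 1.0
        for x, y in zip(a, b)
        if x != y
    )
    return cost + abs(len(a) - len(b))


print(find_best_candidate_sketch("apple", ["apply", "apples", "orange", "appIe"], toy_distance))
# ('appIe', 0.1)
```

Passing the distance function as a parameter keeps the search independent of any particular metric, which mirrors how the example above plugs `ocr_aware_distance` into `osd.find_best_candidate`.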
This project is inspired by jellyfish, which provides the base implementations of the algorithms used here.
FAQs
String distances considering OCR errors.
We found that ocr-stringdist demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.