match-mate
A Ruby gem for matching demographics data.
Installation
Prerequisites
Python / FuzzyWuzzy
match-mate uses FuzzyWuzzy, a Python library that uses Levenshtein distance to calculate the differences between sequences. It has some features not currently found in any Ruby gems, particularly the ability to ignore word order and duplicated words.
Thus, it currently requires Python version 2.7 or higher.
To install FuzzyWuzzy:
On Ubuntu/Debian
sudo apt install python python-pip
pip install 'fuzzywuzzy[speedup]'
On CentOS/RHEL
sudo yum install python python-pip
pip install 'fuzzywuzzy[speedup]'
On macOS
Python comes pre-installed on macOS.
pip install 'fuzzywuzzy[speedup]'
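To illustrate what "ignoring word order and duplicated words" means in practice, here is a minimal Ruby sketch. It is not FuzzyWuzzy's actual algorithm and not what match-mate runs: the helper names (levenshtein, similarity, normalize_tokens, order_insensitive_similarity) are hypothetical, and the scoring is a simplified edit-distance ratio applied after sorting and de-duplicating words.

```ruby
# Classic dynamic-programming Levenshtein (edit) distance between two strings.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      # Cheapest of: deletion, insertion, substitution (or match).
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Similarity on a 0-100 scale, normalized by the longer string's length.
# ASSUMPTION: a simplified ratio, not the one FuzzyWuzzy computes.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 100 if longest.zero?

  ((1.0 - levenshtein(a, b).to_f / longest) * 100).round
end

# Lowercase, split into words, drop duplicates, and sort, so that word order
# and repeated words no longer affect the comparison.
def normalize_tokens(text)
  text.downcase.scan(/[[:alnum:]]+/).uniq.sort.join(' ')
end

def order_insensitive_similarity(a, b)
  similarity(normalize_tokens(a), normalize_tokens(b))
end
```

With these helpers, order_insensitive_similarity("Evergreen Terrace 742", "742 evergreen terrace terrace") treats both inputs as the same set of words, whereas a plain character-level comparison of the two strings would score them as quite different.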
libpostal
match-mate uses ruby_postal, a gem that provides Ruby bindings to libpostal for fast street address parsing and normalization.
Before you install, make sure you have the following prerequisites:
On Ubuntu/Debian
sudo apt-get install curl autoconf automake libtool pkg-config
On CentOS/RHEL
sudo yum install curl autoconf automake libtool pkgconfig
On macOS
brew install curl autoconf automake libtool pkg-config
Then to install the C library:
git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
./configure --datadir=[...some dir with a few GB of space...]
make -j4
sudo make install
# On Linux it's probably a good idea to run
sudo ldconfig
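The command-line prerequisites named in the installation steps can be checked with a short Ruby script before starting a long build. This is only a sketch: the helper names (which, missing_commands) and the command list are assumptions for illustration, and the binary names may differ on your platform (for example, Homebrew may install libtool under a different name).

```ruby
# Return the full path of +command+ if an executable with that name is found
# in one of the directories listed in +path+, or nil otherwise.
def which(command, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    candidate = File.join(dir, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# ASSUMPTION: command names taken from the install steps in this README;
# adjust the list for your platform.
REQUIRED_COMMANDS = %w[python pip curl autoconf automake libtool pkg-config git make].freeze

# Return the subset of +commands+ that cannot be found in +path+.
def missing_commands(commands = REQUIRED_COMMANDS, path = ENV.fetch('PATH', ''))
  commands.reject { |command| which(command, path) }
end

if __FILE__ == $PROGRAM_NAME
  missing = missing_commands
  if missing.empty?
    puts 'All prerequisite commands were found.'
  else
    puts "Missing commands: #{missing.join(', ')}"
  end
end
```

The script only confirms that the commands are on your PATH. It does not check versions or whether the libpostal library itself was installed correctly.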
Install the Gem
To install using Bundler, add the gem to your Gemfile and run bundle install:
gem 'match-mate', github: 'combinaut/match-mate'
Configuration
Create an initializer and configure your Python path and match weights:
MatchMate.configure do |config|
  config.python_path = '/usr/local/bin/python'
  config.address_weights = {
    road: {
      weight: 30,
      threshold: 80
    },
    unit: {
      weight: 2,
      threshold: 80
    },
    postcode: {
      weight: 30,
      threshold: 100
    },
    city: {
      weight: 8,
      threshold: 80
    },
    house_number: {
      weight: 30,
      threshold: 100
    }
  }
end
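This README does not spell out how weights and thresholds are combined, so the following is only one plausible reading, sketched for intuition. It assumes each address component earns its full weight when its similarity (0 to 100) meets its threshold, and nothing otherwise. The names WEIGHTS and weighted_score are hypothetical and not part of match-mate's API; the gem may compute its score differently.

```ruby
# Hypothetical weights in the same shape as config.address_weights.
WEIGHTS = {
  road:         { weight: 30, threshold: 80 },
  unit:         { weight: 2,  threshold: 80 },
  postcode:     { weight: 30, threshold: 100 },
  city:         { weight: 8,  threshold: 80 },
  house_number: { weight: 30, threshold: 100 }
}.freeze

# ASSUMPTION: a component earns its full weight when its similarity meets its
# threshold and nothing otherwise. The result is the earned share of the total
# weight, scaled to 0-100. match-mate may combine these values differently.
def weighted_score(similarities, weights = WEIGHTS)
  total = weights.values.sum { |rule| rule[:weight] }
  return 0.0 if total.zero?

  earned = weights.sum do |component, rule|
    # Components missing from +similarities+ count as a similarity of 0.
    similarities.fetch(component, 0) >= rule[:threshold] ? rule[:weight] : 0
  end
  (earned.to_f / total * 100).round(1)
end
```

Under this reading, a threshold of 100 for postcode and house_number means those components contribute only on an exact match, while road and city tolerate small differences such as abbreviations or typos.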
Usage
Address Matcher
Compare two addresses and get a similarity score:
address = MatchMate::Address.new "742 Evergreen Terrace Springfield IL 62704"
other_address = MatchMate::Address.new "742 Evergreen Springfield IL 62704"
matcher = MatchMate::AddressMatcher.new(address, other_address)
matcher.score