Proxihash
A GeoHash implementation geared toward indexing data for performing
proximity searches at different radii.
The algorithm underlying Proxihash is the GeoHash algorithm, however the hashes
are returned offer the bit string as a numeric value, rather than a base-32
representation. This allows bit-precision, rather than 5-bit precision, which
comes in handy for variable-distance proximity searches.
Proxihash computes for you the tileset required for a given radius search from a
given point, within the specified bit precision range. This range lets you limit
your index to certain precisions, which reflects the search distances you need
to support.
Usage
Create a proxihash with 31 bits of precision:
proxihash = Proxihash.encode(42.6, -5.6, 31)
Get the numeric value, for storage:
proxihash.value # 939008154
proxihash.num_bits # 31
Compute the tile (bounding box) represented by the proxihash:
proxihash.tile # [42.5994873046875, 42.60498046875, -5.60302734375, -5.5975341796875]
Find neighbors at the same precision:
proxihash.neighbor(-1, 1) # tile to the south-east
proxihash.neighbor( 1, -1) # tile to the north-west
# Alternatively:
proxihash.north # same as proxihash.neighbor( 1, 0)
proxihash.east # same as proxihash.neighbor( 0, 1)
proxihash.west # same as proxihash.neighbor( 0, -1)
proxihash.south # same as proxihash.neighbor(-1, 0)
Find the midpoint of the tile:
proxihash.decode # [42.60223388671875, -5.60028076171875]
Note that roundtripping through encode
then decode
may not give you the same
initial point, as decoding gives you the center of the tile. It will give you
a nearby point, however. Roundtripping through decode
then encode
will give
you back the same tile.
Indexing
The idea is to index your data with proxihashes at the precisions you care
about. These precisions depend on the search distances you need to support.
Consider that the radius of the earth is about 40,036 km, (24,873 miles). This
means 16 bits of longitude (16 + 15 = 31 bit proxihash) will give you
sub-kilometer precision. Sub-mile precision requires 29 bits. These fit nicely
into a 32-bit integer.
Searching
Once you've indexed all your data by their proxihash at the precisions you care
about, you probably want to answer a query like "find all users within 25 miles
of (lat,lng)
". To do this, you have to find the tiles to search for:
Proxihash.radius = :earth_in_miles
Proxihash.search_tiles(lat, lng, 25, min_bits: 11, max_bits: 31)
The first line sets the units to use (default is kilometers).
The second returns up to 4 tiles (proxihashes): the tile that (lat,lng)
falls
on, plus the 3 neighbors that need to be searched. The 3 neighbors depend on
which quadrant of the center tile (lat,lng)
falls on. The tile size is the
smallest that will cover the 25 mile search circle.
The precision of the tiles (proxihashes) computed depends on two things: the
distance, and the latitude.
Larger tiles (shorter proxihashes) are required for larger distances and
latitudes nearer the poles; smaller tiles (longer proxihashes) for shorter
distances and latitudes nearer the equator. The latitude makes a difference
because tiles nearer the poles are smaller, and so a circle of a given radius
can cover more tiles of the same angular size near the poles than near the
equator.
There is one caveat that arises from this: Finding neighboring tiles degenerates
toward the poles. If the search circle overlaps a pole, or finding a neighboring
tile requires crossing a pole, a PoleWrapException
is raised.
The :max_bits
option prevents you from computing an arbitrarily precise
proxihash if you take the distance from user input. It will simply cap the
precision of the proxihashes returned. The :min_bits
option makes
search_tiles
return nil if a larger tile is required. search_tiles
currently
does not return more than 4 tiles to accomodate a smaller tileset.
Contributing
- Bug reports
- Source
- Patches: Fork on Github, send pull request.
- Include tests where practical.
- Leave the version alone, or bump it in a separate commit.
Copyright
Copyright (c) George Ogata. See LICENSE for details.