perdido

PERDIDO Geoparser python library

  • 0.1.50
  • PyPI

Perdido Geoparser Python library

Installation

To install the latest stable version, you can use:

pip install --upgrade perdido

Quick start

Geoparsing

Import
from perdido.geoparser import Geoparser
Run geoparser
text = "J'ai rendez-vous proche de la place Bellecour, de la place des Célestins, au sud de la fontaine des Jacobins et près du pont Bonaparte."
geoparser = Geoparser()
doc = geoparser(text)

Some parameters can be set when initializing the Geoparser object:

  • version: Standard (default), Encyclopedie
  • pos_tagger: spacy (default), stanza, treetagger
Get tokens
  • Access token attributes (text, lemma and UPOS part-of-speech tag):
for token in doc:
    print(f'{token.text}\tlemma: {token.lemma}\tpos: {token.pos}')
  • Get the IOB format:
for token in doc:
    print(token.iob_format())
  • Get a TSV-IOB format:
for token in doc:
    print(token.tsv_format())
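
IOB tags mark the beginning (B), inside (I), and outside (O) of entity spans. As a minimal sketch of how such output could be regrouped into entities, the snippet below works on hand-written (token, tag) pairs. The tag labels such as B-place and the helper group_iob are assumptions made for illustration; they are not taken from Perdido's documented output.

```python
def group_iob(pairs):
    """Group (token, tag) pairs tagged in IOB style into (text, label) spans."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in pairs:
        prefix, _, label = tag.partition('-')
        if prefix == 'B' or (prefix == 'I' and label != current_label):
            # A new entity starts: flush the previous one, if any.
            if current_tokens:
                spans.append((' '.join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == 'I':
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # An 'O' tag closes any open entity.
            if current_tokens:
                spans.append((' '.join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        spans.append((' '.join(current_tokens), current_label))
    return spans

# Hand-written sample; the tag labels are hypothetical.
sample = [('la', 'O'), ('place', 'B-place'), ('Bellecour', 'I-place'),
          ('et', 'O'), ('Lyon', 'B-place')]
entities = group_iob(sample)
print(entities)
```

An I tag whose label differs from the open entity is treated as the start of a new entity, which is a lenient reading of malformed sequences rather than a strict validation.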
Print the XML-TEI output
print(doc.tei)
Print the XML-TEI output with XML syntax highlighting
from display_xml import XML
XML(doc.tei, style='lovelace')
Print the GeoJSON output
print(doc.geojson)
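
GeoJSON stores positions as [longitude, latitude], which is easy to swap by mistake. The sketch below reads point features from a FeatureCollection using only the standard library. It assumes the output is a standard FeatureCollection, given either as a string or as a dict; the sample data, the name property, and the helper point_features are hypothetical and do not reproduce Perdido's actual output.

```python
import json

# Hand-written sample standing in for a GeoJSON result (approximate coordinates).
sample_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [4.832, 45.7578]},
         "properties": {"name": "place Bellecour"}},
        {"type": "Feature",
         "geometry": {"type": "LineString",
                      "coordinates": [[4.83, 45.75], [4.84, 45.76]]},
         "properties": {"name": "hypothetical path"}},
    ],
})

def point_features(geojson):
    """Return (name, lat, lng) for each Point feature in a FeatureCollection."""
    data = json.loads(geojson) if isinstance(geojson, str) else geojson
    rows = []
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # skip lines, polygons, and features without geometry
        lng, lat = geometry["coordinates"][:2]  # GeoJSON order is [lng, lat]
        name = (feature.get("properties") or {}).get("name")
        rows.append((name, lat, lng))
    return rows

rows = point_features(sample_geojson)
print(rows)
```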
Get the list of named entities
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
Get the list of nested named entities
for nested_entity in doc.nested_named_entities:
    print(f'entity: {nested_entity.text}\ttag: {nested_entity.tag}')
    if nested_entity.tag == 'place':
        for t in nested_entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
Get the list of spatial relations
for sp_relation in doc.sp_relations:
    print(f'spatial relation: {sp_relation.text}\ttag: {sp_relation.tag}')
Show named entities and nested named entities using the displacy library from spaCy
from spacy import displacy
displacy.render(doc.to_spacy_doc(), style="ent", jupyter=True)
displacy.render(doc.to_spacy_doc(), style="span", jupyter=True)
Display the map (using folium library)
doc.get_folium_map()
Saving results
doc.to_xml('filename.xml')
doc.to_geojson('filename.geojson')
doc.to_iob('filename.tsv')
doc.to_csv('filename.csv')

Geocoding

Import
from perdido.geocoder import Geocoder
Geocode a single place name
geocoder = Geocoder()
doc = geocoder('Lyon')

Some parameters can be set when initializing the Geocoder object:

  • sources:
  • max_rows:
  • country_code:
  • bbox:
Geocode a list of place names
geocoder = Geocoder()
doc = geocoder(['Lyon', 'la place des Célestins', 'la fontaine des Jacobins'])
Get the GeoJSON result
print(doc.geojson)
Get the list of toponym candidates
for t in doc.toponyms:
    print(f'lat: {t.lat}\tlng: {t.lng}\tsource: {t.source}\tsourceName: {t.source_name}')
Get the toponym candidates as a GeoDataFrame
print(doc.to_geodataframe())

Perdido Geoparser REST APIs

http://choucas.univ-pau.fr/docs#

Example: calling the REST API in Python

import requests

url = 'http://choucas.univ-pau.fr/PERDIDO/api/'
service = 'geoparsing'
data = {'content': 'Je visite la ville de Lyon, Annecy et le Mont-Blanc.'}
parameters = {'api_key': 'demo'}

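# Send the text as a JSON body; the API key goes in the query string.
r = requests.post(url + service, params=parameters, json=data, timeout=30)
r.raise_for_status()  # raise an exception on HTTP error responses

print(r.text)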
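
To make explicit what the call above sends, here is a rough standard-library sketch of the same request: the API key travels as a query parameter and the text as a JSON body. The helper build_request is hypothetical, and nothing is sent to the server until the request is passed to urlopen.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_request(service, content, api_key='demo',
                  base_url='http://choucas.univ-pau.fr/PERDIDO/api/'):
    """Build (but do not send) a POST request for a Perdido REST service."""
    query = urlencode({'api_key': api_key})
    body = json.dumps({'content': content}).encode('utf-8')
    return Request(f'{base_url}{service}?{query}', data=body,
                   headers={'Content-Type': 'application/json'},
                   method='POST')

req = build_request('geoparsing', 'Je visite la ville de Lyon.')
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=30)
```

Separating construction from sending makes it possible to inspect the URL and payload without network access.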

Tutorials

Cite this work

Moncla, L. and Gaio, M. (2023). Perdido: Python library for geoparsing and geocoding French texts. In Proceedings of the First International Workshop on Geographic Information Extraction from Texts (GeoExT'23), ECIR Conference, Dublin, Ireland.

Acknowledgements

Perdido is an active project still under development.

This work was partially supported by the following projects:
