GreynirEngine
A fast, efficient natural language processing engine for Icelandic
Overview
Greynir is a Python 3 (>=3.9) package,
published by Miðeind ehf., for
working with Icelandic natural language text.
Greynir can parse text into sentence trees, find lemmas,
inflect noun phrases, assign part-of-speech tags and much more.
Greynir's sentence trees can inter alia be used to extract
information from text, for instance about people, titles, entities, facts,
actions and opinions.
Full documentation for Greynir is available here.
Greynir is the engine of Greynir.is,
a natural-language front end for a database of over 10 million
sentences parsed from Icelandic news articles, and
Embla, a voice-driven virtual assistant app
for smart devices such as iOS and Android phones.
Greynir includes a hand-written
context-free grammar
for the Icelandic language, consisting of over 7,000 lines of grammatical
productions in extended Backus-Naur format.
Its fast C++ parser core is able to cope with long and ambiguous sentences,
using an Earley-type parser
as enhanced by Scott and Johnstone.
Greynir employs the Tokenizer package,
by the same authors, to tokenize text, and
uses BinPackage as its database of
Icelandic vocabulary and morphology.
Examples
Use Greynir to easily inflect noun phrases
from reynir import NounPhrase as Nl
karfa = Nl("þrír lúxus-miðar á Star Wars og tveir brimsaltir pokar af poppi")
print(f"Þú keyptir {karfa:þf}.")
print(f"Hér er kvittunin þín fyrir {karfa:þgf}.")
The program outputs the following text, correctly inflected:
Þú keyptir þrjá lúxus-miða á Star Wars og tvo brimsalta poka af poppi.
Hér er kvittunin þín fyrir þremur lúxus-miðum á Star Wars og tveimur brimsöltum pokum af poppi.
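The `{karfa:þf}` syntax works because Python passes whatever follows the colon in an f-string to the object's `__format__` method. The sketch below illustrates only that mechanism. `FixedPhrase` is a hypothetical toy class, not part of Greynir: it looks up pre-written case forms (copied from the output above) where the real `NounPhrase` parses and inflects the phrase.

```python
class FixedPhrase:
    """Toy stand-in (hypothetical): returns pre-written case forms."""

    def __init__(self, forms):
        # Maps a case specifier ("nf", "þf", "þgf") to the inflected text
        self._forms = dict(forms)

    def __format__(self, spec):
        # Python calls this with the text after ':' in an f-string;
        # an empty spec falls back to the nominative ("nf") form
        key = spec or "nf"
        try:
            return self._forms[key]
        except KeyError:
            raise ValueError(f"Unknown case specifier: {spec!r}") from None


karfa = FixedPhrase({
    "nf": "tveir brimsaltir pokar af poppi",      # nominative
    "þf": "tvo brimsalta poka af poppi",          # accusative
    "þgf": "tveimur brimsöltum pokum af poppi",   # dative
})
print(f"Þú keyptir {karfa:þf}.")
print(f"Hér er kvittunin þín fyrir {karfa:þgf}.")
```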
Use Greynir to parse a sentence
>>> from reynir import Greynir
>>> g = Greynir()
>>> sent = g.parse_single("Ása sá sól.")
>>> print(sent.tree.view)
P
+-S-MAIN
   +-IP
      +-NP-SUBJ
         +-no_et_nf_kvk: 'Ása'
      +-VP
         +-VP
            +-so_1_þf_et_p3: 'sá'
         +-NP-OBJ
            +-no_et_þf_kvk: 'sól'
+-'.'
>>> sent.tree.nouns
['Ása', 'sól']
>>> sent.tree.verbs
['sjá']
>>> sent.tree.flat
'P S-MAIN IP NP-SUBJ no_et_nf_kvk /NP-SUBJ VP so_1_þf_et_p3
NP-OBJ no_et_þf_kvk /NP-OBJ /VP /IP /S-MAIN p /P'
>>>
>>> sent.tree.S.IP.NP_SUBJ.lemmas
['Ása']
>>>
>>> sent.tree.S.IP.VP.lemmas
['sjá', 'sól']
>>>
>>> sent.tree.S.IP.VP.NP_OBJ.lemmas
['sól']
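The string returned by `sent.tree.flat` can be turned back into a nested structure with a small stack-based reader. This is a minimal sketch that needs only the standard library. It assumes, based on the sample output above, that a token starting with `/` closes a nonterminal, a token starting with an uppercase letter opens one, and anything else is a leaf. `parse_flat` is a hypothetical helper, not a Greynir function.

```python
def parse_flat(flat):
    """Convert a flat tree string into nested (name, children) tuples."""
    root = ("ROOT", [])
    stack = [root]
    for token in flat.split():
        if token.startswith("/"):
            # Closing token: it must match the innermost open nonterminal
            if len(stack) < 2 or stack[-1][0] != token[1:]:
                raise ValueError(f"Unbalanced closing token: {token}")
            stack.pop()
        elif token[0].isupper():
            # Opening token: attach a new nonterminal node and descend into it
            node = (token, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            # Terminal: a leaf of the innermost open nonterminal
            stack[-1][1].append(token)
    if len(stack) != 1 or len(root[1]) != 1:
        raise ValueError("Expected exactly one complete top-level node")
    return root[1][0]


FLAT = (
    "P S-MAIN IP NP-SUBJ no_et_nf_kvk /NP-SUBJ VP so_1_þf_et_p3 "
    "NP-OBJ no_et_þf_kvk /NP-OBJ /VP /IP /S-MAIN p /P"
)
tree = parse_flat(FLAT)
print(tree)
```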
Prerequisites
This package runs on CPython 3.9 or newer, and on PyPy 3.9 or newer.
To find out which version of Python you have, enter:
python --version
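The same check can be made from inside Python, which is handy when several interpreters are installed. This is a small sketch; `meets_requirement` is a hypothetical helper name, and the 3.9 minimum is taken from the requirement stated above.

```python
import platform
import sys

MIN_VERSION = (3, 9)  # minimum version stated in this README


def meets_requirement(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


# Report the interpreter (e.g. CPython or PyPy) and its Python version
print(platform.python_implementation(), platform.python_version())
print("Supported" if meets_requirement() else "Too old for Greynir")
```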
If a binary wheel package isn't available on PyPI
for your system, you may need to install the python3-dev package
(or its Windows equivalent) to set up Greynir successfully,
because installing from a source distribution requires a C++ compiler and linker:
sudo apt-get install python3-dev
Depending on your system, you may also need to install libffi-dev:
sudo apt-get install libffi-dev
Installation
To install this package, assuming Python 3 is your default Python:
pip install reynir
If you have git installed and want to be able to edit
the source, clone the repository and install it in editable mode:
git clone https://github.com/mideind/GreynirEngine
cd GreynirEngine
pip install -e .
The package source code is in GreynirEngine/src/reynir.
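To confirm which version of the package ended up in the active environment, the standard library's `importlib.metadata` can be queried. This is a sketch: `installed_version` is a hypothetical helper, and the distribution name `reynir` is the one used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("reynir")
if found is None:
    print("reynir is not installed in this environment")
else:
    print(f"reynir {found} is installed")
```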
Tests
To run the built-in tests, install pytest, cd to your GreynirEngine
subdirectory (and optionally activate your virtualenv), then run:
python -m pytest
Evaluation
A parsing test pipeline for different parsing schemas, including the Greynir schema,
has been developed. It is available here.
Documentation
Please consult Greynir's documentation for detailed
installation instructions,
a quickstart guide,
and reference information,
as well as important information about
copyright and licensing.
Troubleshooting
If parsing seems to hang, it is possible that a lock file that GreynirEngine
uses has been left locked. This can happen if a Python process that uses
GreynirEngine is killed abruptly. The solution is to delete the lock file
and try again:
On Linux and macOS:
rm /tmp/greynir-grammar
On Windows:
del %TEMP%\greynir-grammar
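For scripts that need to recover on their own, the same cleanup can be done from Python. This is a sketch under assumptions: the file name and locations are taken from the commands above, and the function names are hypothetical. On POSIX systems the path is hard-coded to /tmp to match the `rm` command, since `tempfile.gettempdir()` may point elsewhere (for example on macOS). Only remove the file when no other process is using GreynirEngine.

```python
import os
import sys
import tempfile
from pathlib import Path

LOCK_NAME = "greynir-grammar"  # file name from the commands above


def default_lock_path():
    """Return the lock file location used by the shell commands above."""
    if sys.platform == "win32":
        # Mirrors %TEMP%; falls back to Python's notion of the temp directory
        return Path(os.environ.get("TEMP", tempfile.gettempdir())) / LOCK_NAME
    return Path("/tmp") / LOCK_NAME


def remove_stale_lock(path=None):
    """Delete the lock file; return True if a file was actually removed."""
    target = Path(path) if path is not None else default_lock_path()
    try:
        target.unlink()
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    removed = remove_stale_lock()
    print("Lock file removed" if removed else "No lock file found")
```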
Copyright and licensing
Greynir is Copyright © 2016-2024 by Miðeind ehf.
The original author of this software is Vilhjálmur Þorsteinsson.
This software is licensed under the MIT License:
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
If you would like to use this software in ways that are incompatible
with the standard MIT license, contact Miðeind ehf.
to negotiate custom arrangements.
GreynirEngine indirectly embeds the Database of Icelandic Morphology
(Beygingarlýsing íslensks nútímamáls), abbreviated BÍN.
GreynirEngine does not claim any endorsement by the BÍN authors or copyright holders.
The BÍN source data are publicly available under the
CC BY-SA 4.0 license, as further
detailed here in English
and here in Icelandic.
In accordance with the BÍN license terms, credit is hereby given as follows:
Beygingarlýsing íslensks nútímamáls. Stofnun Árna Magnússonar í íslenskum fræðum.
Höfundur og ritstjóri Kristín Bjarnadóttir.