
Python package for:
Python GateNLP is a natural language processing (NLP) and text processing framework implemented in Python.
It provides very flexible representations of documents, stand-off annotations with arbitrary types and features grouped into arbitrary annotation sets, spans, corpora, annotators, pipelines, and more. Documents, annotations, and corpora can be visualized interactively in notebooks. Existing NLP tools can be used to annotate documents out of the box: spaCy and Stanza, as well as online services such as Gate Cloud, ELG, Google NLP, IBM Watson, and others. The results of these tools are represented as GateNLP annotations, which makes it easy to write code that works with all of them in the same way, or that compares or combines their results.
In addition, GateNLP provides its own annotator tools: string-based and token-based gazetteers, regular-expression-based annotators, and a very powerful and flexible rule-based annotator (PAMPAC) that matches complex patterns of annotations and text.
If you find bugs or want to request a feature or change, please use the issue tracker.
For more general discussions about the package and communication among current and future users, please use the Discussions.
Python GateNLP is an NLP and text processing framework implemented in Python.
Python GateNLP represents documents and stand-off annotations very similarly to the Java GATE framework: annotations describe arbitrary character ranges in the text, and each annotation can have an arbitrary number of features. Documents can have arbitrary features and an arbitrary number of named annotation sets, where each annotation set can hold an arbitrary number of annotations that can overlap in any way. Python GateNLP documents can be exchanged with Java GATE by using the bdocjs/bdocym/bdocmp formats, which are supported in Java GATE via the Format Bdoc Plugin.
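The stand-off model described above can be illustrated with a minimal sketch in plain Python. This is not the GateNLP API: the class names `Annotation` and `StandoffDocument` and their methods are hypothetical stand-ins chosen for illustration. The point is only that annotations refer to character offsets, carry their own features, live in named sets, and may overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a typed character range with features."""
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


@dataclass
class StandoffDocument:
    """Hypothetical document: text, document features, named annotation sets."""
    text: str
    features: dict = field(default_factory=dict)
    annotation_sets: dict = field(default_factory=dict)

    def add(self, set_name, start, end, anntype, **features):
        # The text itself is never modified; only offsets are stored.
        ann = Annotation(start, end, anntype, dict(features))
        self.annotation_sets.setdefault(set_name, []).append(ann)
        return ann

    def covered_text(self, ann):
        return self.text[ann.start:ann.end]


doc = StandoffDocument("Barack Obama visited New York.", features={"lang": "en"})
person = doc.add("entities", 0, 12, "Person", gender="male")
first_name = doc.add("entities", 0, 6, "FirstName")  # overlaps with `person`
city = doc.add("entities", 21, 29, "Location", kind="city")

print(doc.covered_text(person), "|", doc.covered_text(city))
```

Because the annotations are kept apart from the text, overlapping or nested ranges such as `person` and `first_name` need no special handling.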
Unlike many other Python NLP tools, GateNLP does not require text to be split into tokens in any specific way: tokens can be represented by annotations in any way, and a document can have several different tokenizations simultaneously, if needed. Similarly, entities can be represented by annotations without restriction: they do not need to start or end at token boundaries and can overlap arbitrarily.
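As a rough illustration of coexisting tokenizations, the following plain-Python sketch (again not the GateNLP API; the set names and the `spans` helper are assumptions made for this example) stores two tokenizations of the same text as separate lists of offsets and adds an entity span that only lines up with one of them.

```python
import re

text = "state-of-the-art NLP"


def spans(pattern, text):
    """Return (start, end) offsets for every match of `pattern` in `text`."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


# Two tokenizations of the same text, kept side by side in named sets.
annotation_sets = {
    "whitespace_tokens": spans(r"\S+", text),
    "word_tokens": spans(r"\w+", text),
}

# An entity span that starts in the middle of a whitespace token.
entity = (6, 16)

whitespace_starts = {start for start, _ in annotation_sets["whitespace_tokens"]}
word_starts = {start for start, _ in annotation_sets["word_tokens"]}

print(text[entity[0]:entity[1]])
print(entity[0] in whitespace_starts, entity[0] in word_starts)
```

The entity is valid regardless of which tokenization a later processing step chooses to rely on.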
GateNLP provides ways to process text and create annotations using annotating pipelines, which are sequences of one or more annotators. There are annotators for matching text against gazetteer lists and annotators for complex matching of annotation and text sequences (see PAMPAC).
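The idea of a pipeline as a sequence of annotators can be sketched in plain Python as follows. The function names `gazetteer_annotator`, `regex_annotator`, and `run_pipeline` are hypothetical and do not correspond to GateNLP classes; the sketch only shows how each annotator adds annotations to a shared list in turn.

```python
import re


def gazetteer_annotator(entries, anntype):
    """Build an annotator that marks every occurrence of a listed string."""
    def annotate(text, annotations):
        for entry in entries:
            for m in re.finditer(re.escape(entry), text):
                annotations.append((m.start(), m.end(), anntype))
        return annotations
    return annotate


def regex_annotator(pattern, anntype):
    """Build an annotator that marks every match of a regular expression."""
    compiled = re.compile(pattern)

    def annotate(text, annotations):
        for m in compiled.finditer(text):
            annotations.append((m.start(), m.end(), anntype))
        return annotations
    return annotate


def run_pipeline(annotators, text):
    """Run the annotators in order; each one sees the annotations so far."""
    annotations = []
    for annotator in annotators:
        annotations = annotator(text, annotations)
    return sorted(annotations)


pipeline = [
    gazetteer_annotator(["Sheffield", "London"], "Location"),
    regex_annotator(r"\b\d{4}\b", "Year"),
]
result = run_pipeline(pipeline, "She moved from London to Sheffield in 2019.")
print(result)
```

Passing the accumulated annotations to each step is what would let a later annotator match patterns over earlier annotations rather than over raw text alone.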
There is also support for creating GateNLP annotations with other NLP packages such as spaCy or Stanford Stanza.
The GateNLP document representation also optionally allows tracking all changes made to the document in a "change log" (a gatenlp.ChangeLog instance). Such changes can later be applied to other Python GateNLP documents or to Java GATE documents.
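To make the change-log idea concrete, here is a minimal sketch in plain Python. It does not use gatenlp.ChangeLog or reproduce its format: the class `LoggedDocument`, the function `apply_changes`, and the dictionary keys are all assumptions made for illustration. It records each modification as a small record and replays the records on a second document.

```python
class LoggedDocument:
    """Hypothetical document that optionally records every change it receives."""

    def __init__(self, text, changelog=None):
        self.text = text
        self.annotations = {}       # annotation id -> (start, end, type)
        self.changelog = changelog  # a list to append to, or None to disable
        self._next_id = 0

    def add_annotation(self, start, end, anntype):
        annid = self._next_id
        self._next_id += 1
        self.annotations[annid] = (start, end, anntype)
        if self.changelog is not None:
            self.changelog.append(
                {"command": "add", "id": annid,
                 "start": start, "end": end, "type": anntype}
            )
        return annid

    def remove_annotation(self, annid):
        del self.annotations[annid]
        if self.changelog is not None:
            self.changelog.append({"command": "remove", "id": annid})


def apply_changes(doc, changes):
    """Replay recorded changes, in order, on another document."""
    for change in changes:
        if change["command"] == "add":
            doc.annotations[change["id"]] = (
                change["start"], change["end"], change["type"]
            )
            doc._next_id = max(doc._next_id, change["id"] + 1)
        elif change["command"] == "remove":
            doc.annotations.pop(change["id"], None)


log = []
source = LoggedDocument("Hello world", changelog=log)
first = source.add_annotation(0, 5, "Token")
second = source.add_annotation(6, 11, "Token")
source.remove_annotation(first)

target = LoggedDocument("Hello world")  # no logging on the receiving side
apply_changes(target, log)
print(target.annotations)
```

Recording operations rather than whole documents keeps the log small and lets the same changes be sent to a document held by another process.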
This library also implements the functionality for the interaction with a Java GATE process in two different ways:
If you have a cloned copy of the repository, you need to rename the branch from master to main in your local copy as well:
git branch -m master main
git fetch origin
git branch -u origin/main main
NOTE: The previous PyPI project "gatenlp" has moved to gatenlphiltlab.