CoreNLP for NodeJS
This library helps you build NodeJS/Web applications on top of Stanford CoreNLP, a state-of-the-art toolkit for Natural Language Processing.
It is compatible with the latest release of CoreNLP 3.9.0.
![Try corenlp on RunKit](https://badge.runkitcdn.com/corenlp.svg)
![NPM package](https://nodei.co/npm/corenlp.png)
This project is under active development, so please stay tuned for updates. More documentation and examples are coming.
Example
The following example assumes that StanfordCoreNLPServer is running on http://localhost:9000:
import CoreNLP, { Properties, Pipeline } from 'corenlp';
const props = new Properties({
annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English');
const sent = new CoreNLP.simple.Sentence('The little dog runs so fast.');
pipeline.annotate(sent)
.then(sent => {
console.log('parse', sent.parse());
console.log(CoreNLP.util.Tree.fromSentence(sent).dump());
})
.catch(err => {
console.log('err', err);
});
API
Read the full API documentation.
Setup
1. Install the package:
npm i --save corenlp
2. Download Stanford CoreNLP
2.1. Shortcut (recommended to give this library a first try)
Via npm, run this command from your own project after having installed this library:
npm explore corenlp -- npm run corenlp:download
Once downloaded, you can start the server by running:
npm explore corenlp -- npm run corenlp:server
Alternatively, you can manually download the project from Stanford's CoreNLP download section at: https://stanfordnlp.github.io/CoreNLP/download.html
Apart from the full package, you may also want to download other language models (see more on that page).
2.2. Via sources
For advanced projects, when you need to customize the library further, we highly recommend downloading StanfordCoreNLP from the original repository and compiling the source code with ant jar.
NOTE: Some functionality included in this library, for TokensRegex, Semgrex and Tregex, requires the latest version from that repository, which contains fixes needed by this library that are not included in the latest stable release.
3. Configure Stanford CoreNLP
There are two methods to connect your NodeJS application to Stanford CoreNLP:
- Via HTTP (recommended). This is the preferred method: CoreNLP initializes just once to serve many requests, and it avoids the extra I/O of the CLI method, which needs to write temporary files to run.
- Via the Command Line Interface (CLI), that is, by spawning processes from your app.
3.1. Using StanfordCoreNLPServer
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
CoreNLP connects by default via StanfordCoreNLPServer, using port 9000. You can also opt to set up the connection differently:
import CoreNLP, { Properties, Pipeline, ConnectorServer } from 'corenlp';
const connector = new ConnectorServer({ dsn: 'http://localhost:9000' });
const props = new Properties({
annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English', connector);
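For orientation, the server started above is a plain HTTP endpoint: it takes the text as the POST body and the annotator configuration as a JSON-encoded `properties` query parameter. The sketch below builds such a request by hand, assuming Node 18+ (for the global `fetch`). `buildAnnotateUrl` and `annotateRaw` are hypothetical helper names, not part of this library, and `ConnectorServer` may build its requests differently.

```javascript
// Hypothetical helpers, not part of the corenlp package.

// Builds the URL for an annotate request against StanfordCoreNLPServer.
function buildAnnotateUrl(dsn, annotators) {
  const url = new URL(dsn);
  // The server reads its configuration from a JSON-encoded `properties` parameter.
  url.searchParams.set('properties', JSON.stringify({
    annotators,
    outputFormat: 'json',
  }));
  return url.toString();
}

// Sends raw text to the server and resolves with the parsed JSON response.
// Needs a running server and Node 18+ (global fetch).
async function annotateRaw(text, dsn = 'http://localhost:9000') {
  const response = await fetch(buildAnnotateUrl(dsn, 'tokenize,ssplit,pos'), {
    method: 'POST',
    body: text,
  });
  if (!response.ok) {
    throw new Error(`CoreNLP server responded with HTTP ${response.status}`);
  }
  return response.json();
}
```

With the server running, `annotateRaw('The little dog runs so fast.')` would resolve with the raw JSON that the library otherwise wraps in its `Sentence` and `Token` objects.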
3.2. Use CoreNLP via CLI
By default, CoreNLP expects the StanfordCoreNLP package to be placed (unzipped) inside the path ${YOUR_NPM_PROJECT_ROOT}/corenlp/. You can also opt to set up the CLI interface differently:
import CoreNLP, { Properties, Pipeline, ConnectorCli } from 'corenlp';
const connector = new ConnectorCli({
classPath: 'corenlp/stanford-corenlp-full-2017-06-09/*',
mainClass: 'edu.stanford.nlp.pipeline.StanfordCoreNLP',
props: 'StanfordCoreNLP-spanish.properties',
});
const props = new Properties({
annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English', connector);
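To make the CLI method more concrete, the sketch below assembles the kind of `java` invocation that the connector options describe, using the standard StanfordCoreNLP flags `-props`, `-annotators`, `-file` and `-outputFormat`. `buildCliArgs` and the `input.txt` file name are hypothetical and only illustrate the idea; the actual command spawned by `ConnectorCli` may differ.

```javascript
// Hypothetical helper, not part of the corenlp package.
// Returns the argument list for `java`, e.g. for child_process.spawn('java', args).
function buildCliArgs({ classPath, mainClass, props, annotators, inputFile }) {
  const args = ['-cp', classPath, mainClass];
  if (props) {
    args.push('-props', props); // optional language-specific properties file
  }
  args.push('-annotators', annotators);
  args.push('-file', inputFile); // the text is read from a file on disk
  args.push('-outputFormat', 'json');
  return args;
}

const args = buildCliArgs({
  classPath: 'corenlp/stanford-corenlp-full-2017-06-09/*',
  mainClass: 'edu.stanford.nlp.pipeline.StanfordCoreNLP',
  props: 'StanfordCoreNLP-spanish.properties',
  annotators: 'tokenize,ssplit,pos',
  inputFile: 'input.txt', // hypothetical temporary file holding the text
});
console.log(['java', ...args].join(' '));
```

Because every call starts a new JVM and goes through a file, this path carries the start-up and I/O cost that the HTTP method avoids.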
4. Usage
4.1 Pipeline
// pos is listed before lemma because the lemma annotator depends on POS tags
const props = new Properties({ annotators: 'tokenize,ssplit,pos,lemma,ner' });
const pipeline = new Pipeline(props, 'English', connector);
const sent = new CoreNLP.simple.Sentence('Hello world');
pipeline.annotate(sent)
.then(sent => {
console.log(sent.words());
console.log(sent.nerTags());
})
.catch(err => {
console.log('err', err);
});
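Under the hood, accessors such as `sent.words()` and `sent.nerTags()` expose per-token fields of the annotation result. As a rough illustration, the sketch below reads the same fields from a response in CoreNLP's JSON output format (a `sentences` array whose items hold a `tokens` array). The sample object is hand-written, and `tokenField` is a hypothetical helper, not part of this library.

```javascript
// Hand-written sample in the shape of CoreNLP's JSON output format;
// it is illustrative, not captured from a real run.
const sampleResponse = {
  sentences: [
    {
      index: 0,
      tokens: [
        { index: 1, word: 'Hello', lemma: 'hello', pos: 'UH', ner: 'O' },
        { index: 2, word: 'world', lemma: 'world', pos: 'NN', ner: 'O' },
      ],
    },
  ],
};

// Hypothetical helper: collects one token field, grouped by sentence.
function tokenField(response, field) {
  return response.sentences.map(sentence =>
    sentence.tokens.map(token => token[field]));
}

console.log(tokenField(sampleResponse, 'word')); // [ [ 'Hello', 'world' ] ]
console.log(tokenField(sampleResponse, 'ner')); // [ [ 'O', 'O' ] ]
```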
4.2 Penn TreeBank traversing
const props = new Properties();
props.setProperty('annotators', 'tokenize,ssplit,pos,lemma,ner,parse');
const pipeline = new Pipeline(props, 'Spanish');
const sent = new CoreNLP.simple.Sentence('Jorge quiere cinco empanadas de queso y carne.');
pipeline.annotate(sent)
.then(sent => {
console.log('parse', sent.parse());
const tree = CoreNLP.util.Tree.fromSentence(sent);
tree.visitLeaves(node =>
console.log(node.word(), node.pos(), node.token().ner()));
console.log(tree.dump());
})
.catch(err => {
console.log('err', err);
});
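The parse annotator returns the tree as a bracketed Penn Treebank string, which is what `sent.parse()` prints. To show what traversing such a tree involves, here is a minimal, self-contained sketch that parses a bracketed string and visits its leaves. It is independent of `CoreNLP.util.Tree`: `parseBracketed` and this `visitLeaves` are hypothetical stand-ins, and the sample string is written by hand.

```javascript
// Hypothetical helper, independent of CoreNLP.util.Tree.
// Parses a bracketed string such as "(NP (DT The) (NN dog))" into
// nested { label, children, word } nodes.
function parseBracketed(text) {
  // Tokens are "(", ")" or any run of other non-space characters.
  const tokens = text.match(/\(|\)|[^\s()]+/g) || [];
  let position = 0;

  function readNode() {
    if (tokens[position] !== '(') {
      throw new Error(`Expected "(" at token ${position}`);
    }
    position += 1; // consume "("
    const node = { label: tokens[position], children: [], word: null };
    position += 1; // consume the label
    while (position < tokens.length && tokens[position] !== ')') {
      if (tokens[position] === '(') {
        node.children.push(readNode());
      } else {
        node.word = tokens[position]; // leaf: the token's surface form
        position += 1;
      }
    }
    if (tokens[position] !== ')') {
      throw new Error('Unbalanced brackets');
    }
    position += 1; // consume ")"
    return node;
  }

  return readNode();
}

// Depth-first, left-to-right visit of the nodes that have no children.
function visitLeaves(node, callback) {
  if (node.children.length === 0) {
    callback(node);
  } else {
    node.children.forEach(child => visitLeaves(child, callback));
  }
}

// Illustrative parse string, written by hand rather than taken from a real run.
const tree = parseBracketed(
  '(ROOT (S (NP (DT The) (JJ little) (NN dog)) (VP (VBZ runs) (ADVP (RB so) (RB fast))) (. .)))');
visitLeaves(tree, leaf => console.log(leaf.word, leaf.label));
```

Each leaf pairs a word with its part-of-speech label, which mirrors the `node.word()` and `node.pos()` calls in the example above.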
4.3 TokensRegex, Tregex and Semgrex
const props = new Properties();
props.setProperty('annotators', 'tokenize,ssplit,regexner,depparse');
const expression = new CoreNLP.simple.Expression(
'John Snow eats snow.',
'{ner:PERSON}=who <nsubj ({pos:VBZ}=action >dobj {}=what)');
const pipeline = new Pipeline(props, 'English');
pipeline.annotateSemgrex(expression, true)
.then(expression => expression.sentence(0).matches().map(match => {
console.log('match', match.group('who'), match.group('action'), match.group('what'));
}))
.catch(err => {
console.log('err', err);
});
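When talking to StanfordCoreNLPServer, pattern queries go to dedicated endpoints (`/tokensregex`, `/semgrex` and `/tregex`) that take the expression in a `pattern` query parameter and the text as the POST body. The sketch below only builds such a URL. `buildPatternUrl` is a hypothetical helper, and it is an assumption that this library's `annotateSemgrex` issues an equivalent request.

```javascript
// Hypothetical helper, not part of the corenlp package.
const PATTERN_ENDPOINTS = ['tokensregex', 'semgrex', 'tregex'];

// Builds the URL for a pattern query; the text itself goes in the POST body.
function buildPatternUrl(dsn, endpoint, pattern) {
  if (!PATTERN_ENDPOINTS.includes(endpoint)) {
    throw new Error(`Unknown endpoint: ${endpoint}`);
  }
  const url = new URL(`/${endpoint}`, dsn);
  // URLSearchParams takes care of percent-encoding braces, spaces and "=".
  url.searchParams.set('pattern', pattern);
  return url.toString();
}

console.log(buildPatternUrl(
  'http://localhost:9000',
  'semgrex',
  '{ner:PERSON}=who <nsubj ({pos:VBZ}=action >dobj {}=what)'));
```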
5. Client Side
This library is isomorphic, which means that it works in a browser as well. The API is exactly the same, and you can use it directly by loading it via a <script> tag, using AMD (RequireJS), or within your app bundle.
The browser-ready version of corenlp can be found at dist/index.browser.min.js once built (npm run build).
See the examples folder for more details.
6. External Documentation
- Properties
- Pipeline
- Service
- ConnectorServer
- ConnectorCli
- CoreNLP
  - simple
    - Annotable
    - Annotator
    - Document
    - Sentence
    - Token
    - annotator
      - TokenizerAnnotator
      - WordsToSentenceAnnotator
      - POSTaggerAnnotator
      - MorphaAnnotator
      - NERClassifierCombiner
      - ParserAnnotator
      - DependencyParseAnnotator
      - RelationExtractorAnnotator
      - CorefAnnotator
      - SentimentAnnotator
      - NaturalLogicAnnotator
      - QuoteAnnotator
  - util
    - Tree
7. References
This library is not maintained by StanfordNLP. However, it's based on and depends on StanfordNLP/CoreNLP to function.
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.