Security News
tea.xyz Spam Plagues npm and RubyGems Package Registries
Tea.xyz, a crypto project aimed at rewarding open source contributions, is once again facing backlash due to an influx of spam packages flooding public package registries.
spacy-nlp
Advanced tools
Readme
Expose Spacy nlp text parsing to Nodejs (and other languages) via socketIO
# install spacy in python3
python3 -m pip install -U socketIO-client
python3 -m pip install -U spacy==2.1.3
python3 -m spacy download en_core_web_md
# install this npm package
npm i --save spacy-nlp
const spacyNLP = require("spacy-nlp");
// default port 6466
// start the server with the python client that exposes spacyIO (or use an existing socketIO server at IOPORT)
var serverPromise = spacyNLP.server({ port: process.env.IOPORT });
// Loading spacy may take up to 15s
Note that python3
is preferred. If you use python2
, at each run set the env var USE_PY2=true
.
You'll see log like:
[Sun Oct 09 2016 16:53:33 GMT-0400 (EDT)] INFO Starting poly-socketio server on port: 6466, expecting 1 IO clients
[Sun Oct 09 2016 16:53:33 GMT-0400 (EDT)] INFO Starting socketIO client for python3 at 6466
[Sun Oct 09 2016 16:53:44 GMT-0400 (EDT)] DEBUG cgkb-py mXjDqupv852zUeMPAAAA joined, 0 remains
[Sun Oct 09 2016 16:53:44 GMT-0400 (EDT)] INFO All 1 IO clients have joined
Since it uses poly-socketio
, there'll be one IO server, and one global.client
(internal to this module) in the same process, no matter how many times poly-socketio
is called. This resolves conflicts for cross-project usage.
E.g. AIVA
uses poly-socketio
to start a server for its internal cross-language communication, and uses spacy-nlp
too. spacy-nlp
will automatically use the IO server and the global.client
from AIVA
.
Once it is ready, i.e. you can use the nodejs client nlp
to parse texts:
const spacyNLP = require("spacy-nlp");
const nlp = spacyNLP.nlp;
// Note you can pass multiple sentences concat in one string.
nlp.parse("Bob Brought the pizza to Alice.").then(output => {
console.log(output);
console.log(JSON.stringify(output[0].parse_tree, null, 2));
});
// Store output into variable
const result = await nlp.parse("Bob Brought the pizza to Alice.");
And the output is the syntax parse tree with POS tagging. For the parse_tree
, NE
means Named Entity
for NER; arc
of an object is incident on it. An arc points from head
word to modifier
word. See the explanation on Tensorflow/syntaxnet.
[ { text: 'Bob Brought the pizza to Alice.',
len: 7,
tokens: [ 'Bob', 'Brought', 'the', 'pizza', 'to', 'Alice', '.' ],
noun_phrases: [ 'Bob', 'the pizza', 'Alice' ],
parse_tree:
[ { word: 'Brought',
lemma: 'bring',
NE: '',
POS_fine: 'VBD',
POS_coarse: 'VERB',
arc: 'ROOT',
modifiers:
[ { word: 'Bob',
lemma: 'Bob',
NE: 'PERSON',
POS_fine: 'NNP',
POS_coarse: 'PROPN',
arc: 'nsubj',
modifiers: [] },
{ word: 'pizza',
lemma: 'pizza',
NE: '',
POS_fine: 'NN',
POS_coarse: 'NOUN',
arc: 'dobj',
modifiers:
[ { word: 'the',
lemma: 'the',
NE: '',
POS_fine: 'DT',
POS_coarse: 'DET',
arc: 'det',
modifiers: [] } ] },
{ word: 'to',
lemma: 'to',
NE: '',
POS_fine: 'IN',
POS_coarse: 'ADP',
arc: 'prep',
modifiers:
[ { word: 'Alice',
lemma: 'Alice',
NE: 'PERSON',
POS_fine: 'NNP',
POS_coarse: 'PROPN',
arc: 'pobj',
modifiers: [] } ] },
{ word: '.',
lemma: '.',
NE: '',
POS_fine: '.',
POS_coarse: 'PUNCT',
arc: 'punct',
modifiers: [] } ] } ],
parse_list:
[ { word: 'Bob',
lemma: 'Bob',
NE: 'PERSON',
POS_fine: 'NNP',
POS_coarse: 'PROPN' },
{ word: 'Brought',
lemma: 'bring',
NE: '',
POS_fine: 'VBD',
POS_coarse: 'VERB' },
{ word: 'the',
lemma: 'the',
NE: '',
POS_fine: 'DT',
POS_coarse: 'DET' },
{ word: 'pizza',
lemma: 'pizza',
NE: '',
POS_fine: 'NN',
POS_coarse: 'NOUN' },
{ word: 'to',
lemma: 'to',
NE: '',
POS_fine: 'IN',
POS_coarse: 'ADP' },
{ word: 'Alice',
lemma: 'Alice',
NE: 'PERSON',
POS_fine: 'NNP',
POS_coarse: 'PROPN' },
{ word: '.',
lemma: '.',
NE: '',
POS_fine: '.',
POS_coarse: 'PUNCT' } ] } ]
// Available options are count (returns the total count) and text (returns the parsed strings) You can specify one or both.
const options = ["count"];
// Note you can pass multiple sentences concat in one string.
nlp
.parse_nouns(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
)
.then(output => {
console.log(output);
});
// Store output into variable
const result = await nlp.parse_nouns(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
);
// 19
// Available options are count (returns the total count) and text (returns the parsed strings) You can specify one or both.
const options = ["count"];
// Note you can pass multiple sentences concat in one string.
nlp
.parse_verbs(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
)
.then(output => {
console.log(output);
});
// Store output into variable
const result = await nlp.parse_verbs(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
);
// 7
// Available options are count (returns the total count) and text (returns the parsed strings) You can specify one or both.
const options = ["count"];
// Note you can pass multiple sentences concat in one string.
nlp
.parse_adj(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
)
.then(output => {
console.log(output);
});
// Store output into variable
const result = await nlp.parse_adj(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
);
// 8
// Available options are count (returns the total count) and text (returns the parsed strings) You can specify one or both.
const options = ["count"];
// Note you can pass multiple sentences concat in one string.
nlp
.parse_named_entities(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
)
.then(output => {
console.log(output);
});
// Store output into variable
const result = await nlp.parse_named_entities(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
);
// 8
// Available options are count (returns the total count) and text (returns the parsed strings) You can specify one or both.
const options = ["text"];
// Note you can pass multiple sentences concat in one string.
nlp
.parse_date(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
)
.then(output => {
console.log(output);
});
// Store output into variable
const result = await nlp.parse_date(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
);
// ['22 June 1941', 'from 1939 to 1945']
// Available options are count (returns the total count) and text (returns the parsed strings) You can specify one or both.
const options = ["count"];
// Note you can pass multiple sentences concat in one string.
nlp
.parse_time(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
)
.then(output => {
console.log(output);
});
// Store output into variable
const result = await nlp.parse_time(
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
options
);
// 0
The following helper functions are not asynchronous and will not return a promise.
If you have very large text to process, it's best to split the text as Spacy has a max_length limit of 1,000,000 characters.
const text =
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";
const textArray = nlp.split_text(text);
// ["On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of", "war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often", "abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945."]
If you want to return an array of words, the result will include duplicate strings. To remove duplicates you can use nlp.remove_duplicates
.
const text =
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";
const verbArray = await nlp.parse_verbs(text);
// ["war", "war", "war", "war", "war", "world", "world", "world", "axis", "axis", "ww2", "wwii", "land", "wehrmacht", "union", "powers", "attrition"]
const result = nlp.remove_duplicates(verbArray);
// ["war", "world", "axis", "ww2", "wwii", "land", "wehrmacht", "union", "powers", "attrition"]
const text =
"On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";
// Arguments are text and cutoff (Top n Words). Returns an array of objects.
const result = nlp.top_words(text, 5);
[
{ word: "the", count: 6 },
{ word: "of", count: 3 },
{ word: "war", count: 3 },
{ word: "Axis", count: 2 },
{ word: "a", count: 2 }
];
FAQs
Expose Spacy nlp text parsing to Nodejs (and other languages) via socketIO
The npm package spacy-nlp receives a total of 55 weekly downloads. As such, spacy-nlp popularity was classified as not popular.
We found that spacy-nlp demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Tea.xyz, a crypto project aimed at rewarding open source contributions, is once again facing backlash due to an influx of spam packages flooding public package registries.
Security News
As cyber threats become more autonomous, AI-powered defenses are crucial for businesses to stay ahead of attackers who can exploit software vulnerabilities at scale.
Security News
UnitedHealth Group disclosed that the ransomware attack on Change Healthcare compromised protected health information for millions in the U.S., with estimated costs to the company expected to reach $1 billion.