JavaScript interface for accessing linguistic annotations provided by spaCy. This project is mostly experimental and was developed for fun to play around with different ways of mimicking spaCy's Python API.
The results will still be computed in Python and made available via a REST API. The JavaScript API resembles spaCy's Python API as closely as possible (with a few exceptions, as the values are all pre-computed and it's tricky to express complex recursive relationships).
const spacy = require('spacy');

(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('This is a text about Facebook.');
    for (let ent of doc.ents) {
        console.log(ent.text, ent.label);
    }
    for (let token of doc) {
        console.log(token.text, token.pos, token.head.text);
    }
})();
You can install the JavaScript package via npm:

npm install spacy

Alternatively, you can also include the .js file directly:

<script src=""></script>
First, clone this repo and install the requirements. If you've installed the package via npm, you can also use the app.py and requirements.txt in your ./node_modules/spacy directory. It's recommended to use a virtual environment.

pip install -r requirements.txt

If you like, you can install more models and add them to the list of models in app.py. You can then run the REST API. By default, this will serve the API via localhost:8080:

python app.py
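If you want to add more models as mentioned above, you can download them via spaCy's command line and then add their names to the models listed in app.py (the exact variable name there may differ). For example:

python -m spacy download en_core_web_md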
spacy.load

"Load" a spaCy model. This method mostly exists for consistency with the Python API. It sets up the connection to the REST API and the nlp object, but doesn't actually load anything, since the models are already available via the REST API.

const nlp = spacy.load('en_core_web_sm');
Argument | Type | Description |
---|---|---|
model | String | Name of model to load, e.g. 'en_core_web_sm' . Needs to be available via the REST API. |
api | String | Alternative URL of REST API. Defaults to http://localhost:8080 . |
RETURNS | Language | The nlp object. |
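If the Python server runs somewhere other than the default localhost:8080, the api argument lets you point the client at it. A minimal sketch, assuming both arguments are positional as listed in the table above (the port here is just a placeholder):

// Hypothetical alternative API location; spacy.load is assumed to take the
// API URL as its second argument, per the `api` row in the table above.
const nlp = spacy.load('en_core_web_sm', 'http://localhost:9000');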
nlp (async)

The nlp object created by spacy.load can be called on a string of text and makes a request to the REST API. The easiest way to use it is to wrap the call in an async function and use await:

(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('This is a text.');
})();
Argument | Type | Description |
---|---|---|
text | String | The text to process. |
RETURNS | Doc | The processed Doc . |
Doc

Just like in the original API, the Doc object can be constructed with an array of words and spaces. It also takes an additional attrs object, which corresponds to the JSON-serialized linguistic annotations created in doc2json in app.py.

The Doc behaves just like the regular spaCy Doc – you can iterate over its tokens, index into individual tokens, access the Doc attributes and properties and also use native JavaScript methods like map and slice (since there's no real way to make Python's slice notation like doc[2:4] work).
import { Doc } from 'spacy/tokens';

const words = ['Hello', 'world', '!'];
const spaces = [true, false, false];
const doc = Doc(words, spaces);
console.log(doc.text); // 'Hello world!'
Argument | Type | Description |
---|---|---|
words | Array | The individual token texts. |
spaces | Array | Whether the token at this position is followed by a space or not. |
attrs | Object | JSON-serialized attributes, see doc2json . |
RETURNS | Doc | The newly constructed Doc . |
(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('Hello world');
    for (let token of doc) {
        console.log(token.text);
    }
    // Hello
    // world
    const token1 = doc[0];
    console.log(token1.text);
    // Hello
})();
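Since Python-style slicing isn't available, native array methods are the substitute. A rough sketch of how map and slice might be used; whether they return plain arrays or wrapped objects isn't documented here, so treat those details as an assumption:

(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('This is a text about Facebook.');
    // Collect all token texts with a native array method.
    const texts = doc.map(token => token.text);
    console.log(texts);
    // Rough equivalent of Python's doc[2:4]; assumed to return an array of tokens.
    const middle = doc.slice(2, 4);
    console.log(middle.map(token => token.text));
})();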
Name | Type | Description |
---|---|---|
text | String | The Doc text. |
length | Number | The number of tokens in the Doc . |
ents | Array | A list of Span objects, describing the named entities in the Doc . |
sents | Array | A list of Span objects, describing the sentences in the Doc . |
nounChunks | Array | A list of Span objects, describing the base noun phrases in the Doc . |
cats | Object | The document categories predicted by the text classifier, if available in the model. |
isTagged | Boolean | Whether the part-of-speech tagger has been applied to the Doc . |
isParsed | Boolean | Whether the dependency parser has been applied to the Doc . |
isSentenced | Boolean | Whether the sentence boundary detector has been applied to the Doc . |
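For illustration, a short sketch touching a few of these properties; the spans and flags you get back depend on the loaded model and the processed text:

(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('Apple is looking at buying U.K. startup for $1 billion.');
    console.log(doc.length);        // number of tokens
    for (let sent of doc.sents) {
        console.log(sent.text);     // each sentence is a Span
    }
    for (let chunk of doc.nounChunks) {
        console.log(chunk.text);    // base noun phrases
    }
    console.log(doc.isTagged, doc.isParsed, doc.isSentenced);
})();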
Span

A Span object is a slice of a Doc and consists of one or more tokens. Just like in the original API, it can be constructed from a Doc, a start and end index and an optional label, or by slicing a Doc.

import { Doc, Span } from 'spacy/tokens';

const doc = Doc(['Hello', 'world', '!'], [true, false, false]);
const span = Span(doc, 1, 3);
console.log(span.text); // 'world!'
Argument | Type | Description |
---|---|---|
doc | Doc | The reference document. |
start | Number | The start token index. |
end | Number | The end token index. This is exclusive, i.e. "up to token X". |
label | String | Optional label. |
RETURNS | Span | The newly constructed Span . |
Name | Type | Description |
---|---|---|
text | String | The Span text. |
length | Number | The number of tokens in the Span . |
doc | Doc | The parent Doc . |
start | Number | The Span 's start index in the parent document. |
end | Number | The Span 's end index in the parent document. |
label | String | The Span 's label, if available. |
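The named entities on a Doc are Spans, so (assuming the model finds entities in the text) their label and token offsets can be read directly. A small sketch:

(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('This is a text about Facebook.');
    for (let ent of doc.ents) {
        // Each entity is a Span: text, label and start/end token indices
        // relative to the parent Doc.
        console.log(ent.text, ent.label, ent.start, ent.end);
    }
})();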
Token

For token attributes that exist as string and ID versions (e.g. Token.pos vs. Token.pos_), only the string versions are exposed.

(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('Hello world');
    for (let token of doc) {
        console.log(token.text, token.pos, token.isLower);
    }
    // Hello INTJ false
    // world NOUN true
})();
Name | Type | Description |
---|---|---|
text | String | The token text. |
whitespace | String | Whitespace character following the token, if available. |
textWithWs | String | Token text with trailing whitespace. |
orth | Number | ID of the token text. |
doc | Doc | The parent Doc . |
head | Token | The syntactic parent, or "governor", of this token. |
i | Number | Index of the token in the parent document. |
entType | String | The token's named entity type. |
entIob | String | IOB code of the token's named entity tag. |
lemma | String | The token's lemma, i.e. the base form. |
norm | String | The normalised form of the token. |
lower | String | The lowercase form of the token. |
shape | String | Transform of the token's string, to show orthographic features. For example, "Xxxx" or "dd". |
prefix | String | A length-N substring from the start of the token. Defaults to N=1 . |
suffix | String | Length-N substring from the end of the token. Defaults to N=3 . |
pos | String | The token's coarse-grained part-of-speech tag. |
tag | String | The token's fine-grained part-of-speech tag. |
isAlpha | Boolean | Does the token consist of alphabetic characters? |
isAscii | Boolean | Does the token consist of ASCII characters? |
isDigit | Boolean | Does the token consist of digits? |
isLower | Boolean | Is the token lowercase? |
isUpper | Boolean | Is the token uppercase? |
isTitle | Boolean | Is the token titlecase? |
isPunct | Boolean | Is the token punctuation? |
isLeftPunct | Boolean | Is the token left punctuation? |
isRightPunct | Boolean | Is the token right punctuation? |
isSpace | Boolean | Is the token a whitespace character? |
isBracket | Boolean | Is the token a bracket? |
isCurrency | Boolean | Is the token a currency symbol? |
likeUrl | Boolean | Does the token resemble a URL? |
likeNum | Boolean | Does the token resemble a number? |
likeEmail | Boolean | Does the token resemble an email address? |
isOov | Boolean | Is the token out-of-vocabulary? |
isStop | Boolean | Is the token a stop word? |
isSentStart | Boolean | Does the token start a sentence? |
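A short sketch exercising a few more of these attributes; the concrete values depend on the model's predictions:

(async function() {
    const nlp = spacy.load('en_core_web_sm');
    const doc = await nlp('Apple is looking at buying U.K. startup for $1 billion.');
    for (let token of doc) {
        // Index, text, lemma, coarse and fine part-of-speech tags, syntactic head
        console.log(token.i, token.text, token.lemma, token.pos, token.tag, token.head.text);
        // Entity annotations and a few orthographic flags
        console.log(token.entType, token.shape, token.likeNum, token.isStop);
    }
})();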