
@geoffcox/pretty-good-nlp
A simple natural language processing (NLP) recognizer you can use in minutes.
Pretty-good-nlp is a deterministic, match-based recognizer for natural language processing (NLP) scenarios.
This readme covers installation and usage. You can read about the NLP concepts in the repository readme.
npm install @geoffcox/pretty-good-nlp
An intent is made up of a set of examples.
const intent : Intent = {
name: 'Turn on oven',
examples: [],
};
Each example consists of an ordered set of parts to match.
const intent : Intent = {
name: 'Turn on oven',
examples: [
{
name: "Turn the oven on to 450 degrees for 2 hours",
parts: [],
},
//...
],
};
Add the parts in the order you expect them to be in the example. Each part can have literal phrases, patterns, and/or regular expressions.
const intent : Intent = {
name: 'Turn on oven',
examples: [
{
name: "Turn the oven on to 450 degrees for 2 hours",
parts: [
{ phrases: ["Turn on the oven to"] },
{ patterns: ["###"] },
{ phrases: ["degrees"] },
{ phrases: ["for"] },
{ regularExpressions: ["\\d+"] },
{ phrases: ["hours"] },
],
},
//...
],
};
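The pattern syntax is not spelled out at this point. Going by `###` standing in for "450" and shared patterns such as `####-##-##`, a reasonable reading is that `#` stands for one digit. The sketch below assumes exactly that and turns a pattern into an anchored regular expression; `patternToRegExp` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper. Assumption: '#' in a pattern means exactly one digit,
// and every other character is matched literally.
function patternToRegExp(pattern: string): RegExp {
  const source = pattern
    .split("")
    .map((ch) => (ch === "#" ? "\\d" : ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")))
    .join("");
  // Anchored so the pattern has to cover the whole token.
  return new RegExp(`^${source}$`);
}

const temperature = patternToRegExp("###");
const isoDate = patternToRegExp("####-##-##");

console.log(temperature.test("450")); // true
console.log(temperature.test("45")); // false
console.log(isoDate.test("2021-03-14")); // true
```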
Of course, you would have more than one phrase in most parts. As you think of variations that fit within the example format, add them.
phrases: ["Turn on the oven", "Turn the oven on", "Bake at", "Broil at"]
If you find a variation that doesn't fit, it might mean you need to define a new example. If you find that you are covering too many permutations of a phrase, you might need to break it up into more parts.
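Splitting helps because the phrases of consecutive parts multiply. The hypothetical `expand` helper below enumerates every sentence that a list of phrase-only parts covers (patterns and regular expressions are left out): three parts with two phrases each, six phrases in total, already cover eight sentences.

```typescript
// Hypothetical helper: lists every sentence an ordered list of
// phrase-only parts can cover, to show how variations multiply.
function expand(parts: string[][]): string[] {
  return parts.reduce<string[]>(
    (prefixes, phrases) =>
      prefixes.flatMap((prefix) =>
        phrases.map((phrase) => (prefix ? `${prefix} ${phrase}` : phrase)),
      ),
    [""], // start from one empty prefix
  );
}

const sentences = expand([
  ["Turn on the oven", "Preheat the oven"],
  ["to 450 degrees", "to 450 fahrenheit"],
  ["for 2 hours", "for two hours"],
]);

console.log(sentences.length); // 8
console.log(sentences[0]); // Turn on the oven to 450 degrees for 2 hours
```

Covering the same eight sentences with a single part would need all eight full phrases, and that gap widens with every phrase you add.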
When a part with a variable name matches, the matched text is extracted and returned as the value for that variable.
const intent : Intent = {
name: 'Turn on oven',
examples: [
{
name: "Turn the oven on to 450 degrees for 2 hours",
parts: [
{ phrases: ["Turn on the oven to"] },
{ patterns: ["###"], variable: "temperature" },
{ phrases: ["degrees"], variable: "temperatureUnit" },
{ phrases: ["for"] },
{ regularExpressions: ["\\d+"], variable: "duration" },
{ phrases: ["hours"], variable: "durationUnit" },
],
},
//...
],
};
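Extracted values come back in `variableValues`, typed `Record<string, string[]>` in `IntentRecognition`. A minimal sketch of building that shape from matched parts follows. `PartMatch` and `collectVariables` are hypothetical names, and keeping an array per variable so that a repeated match keeps every value is this sketch's reading of the type, not documented behavior.

```typescript
// Hypothetical shape of one matched part; not the package's type.
type PartMatch = {
  variable?: string; // the part's variable name, if it has one
  text: string; // the text the part matched
};

// Groups matched text by variable name. Parts without a variable are skipped.
function collectVariables(matches: PartMatch[]): Record<string, string[]> {
  const values: Record<string, string[]> = {};
  for (const match of matches) {
    if (!match.variable) continue;
    const list = values[match.variable] ?? [];
    list.push(match.text);
    values[match.variable] = list;
  }
  return values;
}

const values = collectVariables([
  { text: "Turn on the oven to" },
  { variable: "temperature", text: "450" },
  { variable: "temperatureUnit", text: "degrees" },
  { text: "for" },
  { variable: "duration", text: "2" },
  { variable: "durationUnit", text: "hours" },
]);

console.log(values.temperature); // [ '450' ]
console.log(values.duration); // [ '2' ]
```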
The recognize method takes the text to recognize, the intent you created, and some options. The options are covered later in the advanced usage section.
function recognize(
text: string,
intent: Intent,
options?: RecognizeOptions
): IntentRecognition;
recognize returns an IntentRecognition. It has the name of the intent, a recognition score, and a dictionary of extracted variable names and values. There is also a details object that contains more information specific to this recognizer.
export type IntentRecognition = {
name: string;
score: number;
variableValues: Record<string, string[]>;
details: {
examples: ExampleRecognition[];
textTokenMap: TokenMap;
};
};
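Because `recognize` scores a single intent, an app with several intents would typically call it once per intent and keep the best result. The sketch below uses only the `name` and `score` fields. `pickBestIntent`, the minimum-score cutoff, and the scores in the example are made up for illustration.

```typescript
// Only the fields this sketch needs, mirrored from IntentRecognition.
type ScoredIntent = { name: string; score: number };

// Hypothetical helper: returns the highest-scoring intent, or undefined
// when nothing reaches the caller-chosen minimum score.
function pickBestIntent(
  recognitions: ScoredIntent[],
  minScore: number,
): ScoredIntent | undefined {
  let best: ScoredIntent | undefined;
  for (const r of recognitions) {
    if (r.score < minScore) continue;
    if (!best || r.score > best.score) best = r;
  }
  return best;
}

// Illustrative scores, as if recognize() had been called once per intent.
const best = pickBestIntent(
  [
    { name: "Turn on oven", score: 0.8 },
    { name: "Turn off oven", score: 0 },
  ],
  0.5,
);
console.log(best?.name); // Turn on oven
```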
About variable values:
About details:
Some words indicate the opposite of an intent. You can handle these cases by adding parts to the example's neverParts property. If any of these parts match, the example gets a score of 0.
const intent : Intent = {
name: 'Turn on oven',
examples: [
{
name: "Turn the oven on to 450 degrees for 2 hours",
parts: [],
neverParts: [
{ phrases: ["Don't", "Do not", "Cancel", "Stop", "Off"]}
],
},
//...
],
};
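The never-part rule itself is simple: if any never part matches, the score becomes 0. The sketch below applies it with a case-insensitive, whole-word phrase check. `containsPhrase` and `applyNeverParts` are hypothetical, and the package's own matcher works on tokens, so it will differ in detail.

```typescript
// Simplified, hypothetical part shape: phrases only.
type PhrasePart = { phrases: string[] };

// Case-insensitive whole-word check (escaping regex metacharacters).
function containsPhrase(text: string, phrase: string): boolean {
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}

// The rule from the README: any matching never part forces a score of 0.
function applyNeverParts(text: string, score: number, neverParts: PhrasePart[]): number {
  const vetoed = neverParts.some((part) =>
    part.phrases.some((phrase) => containsPhrase(text, phrase)),
  );
  return vetoed ? 0 : score;
}

const neverParts: PhrasePart[] = [
  { phrases: ["Don't", "Do not", "Cancel", "Stop", "Off"] },
];

console.log(applyNeverParts("Turn the oven on to 450 degrees", 0.9, neverParts)); // 0.9
console.log(applyNeverParts("Please do not turn the oven on", 0.9, neverParts)); // 0
```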
You can set options on an example part to indicate whether it is more or less important than other parts.
You can weight a part relative to other parts. A weight of zero indicates an optional part. The default weight is 1.
You can make a part required. If a required part is not found, the entire example gets a score of 0.
You can indicate that a part can appear in any order within the example.
const intent : Intent = {
name: 'Turn on oven',
examples: [
{
name: "Turn the oven on to 450 degrees for 2 hours",
parts: [
{ phrases: ["Turn on the oven to"] },
{ patterns: ["###"], variable: "temperature", weight: 4 },
{ phrases: ["degrees"], variable: "temperatureUnit" },
{ phrases: ["for"], weight: 0 },
{ regularExpressions: ["\\d+"], variable: "duration", weight: 2 },
{ phrases: ["hours"], variable: "durationUnit" },
],
neverParts: [],
},
//...
],
};
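This README does not give the scoring formula, so the following is only a plausible model of how these options interact, assuming the score is matched weight divided by total weight. Under that assumption a weight of 0 never changes the score, and a missing required part returns 0. The `required` property name is a guess; `PartResult` and `weightedScore` are hypothetical.

```typescript
// Hypothetical summary of one part after matching.
type PartResult = {
  matched: boolean;
  weight?: number; // README: defaults to 1; 0 marks an optional part
  required?: boolean; // assumed property name
};

// Assumed model: score = matched weight / total weight.
// A required part that did not match zeroes the whole example.
function weightedScore(parts: PartResult[]): number {
  let matchedWeight = 0;
  let totalWeight = 0;
  for (const part of parts) {
    if (part.required && !part.matched) return 0;
    const weight = part.weight ?? 1;
    totalWeight += weight;
    if (part.matched) matchedWeight += weight;
  }
  return totalWeight === 0 ? 0 : matchedWeight / totalWeight;
}

// Weights from the example above, for the input "Turn on the oven to 450 degrees".
const noDuration: PartResult[] = [
  { matched: true }, // "Turn on the oven to"
  { matched: true, weight: 4 }, // temperature
  { matched: true }, // temperatureUnit
  { matched: false, weight: 0 }, // "for"
  { matched: false, weight: 2 }, // duration
  { matched: false }, // durationUnit
];
console.log(weightedScore(noDuration).toFixed(2)); // 0.67
```

Under this model, (1 + 4 + 1) / 9 gives 0.67, and a full match that only skips the optional "for" part would still score 1.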
Parts are expected to appear in order, and the score is reduced for out-of-order parts. To allow a part to appear anywhere within the example, set its ignoreOrder property.
//...
parts: [
{ phrases: ["Please"], ignoreOrder: true },
//...
],
//...
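One way to detect out-of-order parts is to walk the parts in declared order and flag any part found earlier in the text than a part declared before it. The hypothetical `countOutOfOrder` below does this over token positions and skips `ignoreOrder` parts. It illustrates the idea and is not the package's algorithm.

```typescript
// Hypothetical: the token index where a declared part matched,
// or undefined if the part was not found.
type LocatedPart = { position?: number; ignoreOrder?: boolean };

// Counts parts found earlier in the text than a part declared before them.
// Unmatched parts and ignoreOrder parts are skipped.
function countOutOfOrder(parts: LocatedPart[]): number {
  let furthest = -1;
  let outOfOrder = 0;
  for (const part of parts) {
    if (part.ignoreOrder || part.position === undefined) continue;
    if (part.position < furthest) {
      outOfOrder++;
    } else {
      furthest = part.position;
    }
  }
  return outOfOrder;
}

// "Turn on the oven to 450 degrees please": "please" is token 7, but its part
// is declared first, ahead of the phrase (token 0), 450 (5), and degrees (6).
const withIgnore: LocatedPart[] = [
  { position: 7, ignoreOrder: true },
  { position: 0 },
  { position: 5 },
  { position: 6 },
];
const withoutIgnore = withIgnore.map(({ position }) => ({ position }));

console.log(countOutOfOrder(withIgnore)); // 0
console.log(countOutOfOrder(withoutIgnore)); // 3
```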
Use the shared property to specify named sets of phrases, patterns, and regular expressions to use across intents and examples.
const options = {
shared: {
temperatureUnits: ['fahrenheit', 'celsius', 'kelvin'],
timeDurations: ['hours', 'minutes'],
datePatterns: ['####-##-##','##/##/##','##-####'],
timeRegexs: ['\\d\\d:\\d\\d', '\\d+']
}
};
You can then include them in example parts by reference.
const part : ExamplePart = {
// This resolves to ['degrees', 'fahrenheit', 'celsius', 'kelvin']
phrases: ['degrees', '$ref=temperatureUnits']
}
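A minimal sketch of that `$ref=` resolution, using a hypothetical `resolveRefs` helper. It expands each reference into the named set's entries. Throwing on an unknown name is a choice made for this sketch, since the README does not say what happens in that case.

```typescript
// Named sets, shaped like the shared option above.
type Shared = Record<string, string[]>;

const REF_PREFIX = "$ref=";

// Replaces each '$ref=name' entry with the entries of the named set.
function resolveRefs(values: string[], shared: Shared): string[] {
  return values.flatMap((value) => {
    if (!value.startsWith(REF_PREFIX)) return [value];
    const name = value.slice(REF_PREFIX.length);
    const set = shared[name];
    if (!set) throw new Error(`Unknown shared set: ${name}`); // sketch's choice
    return set;
  });
}

const shared: Shared = {
  temperatureUnits: ["fahrenheit", "celsius", "kelvin"],
};

console.log(resolveRefs(["degrees", "$ref=temperatureUnits"], shared));
// [ 'degrees', 'fahrenheit', 'celsius', 'kelvin' ]
```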
Set the maxOutOfOrderPenalty or maxNoisePenalty to control how severe the penalties are when scoring examples. Values must be between 0 and 1 inclusive.
const options = {
maxOutOfOrderPenalty: 0.2,
maxNoisePenalty: 0
};
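The 0 to 1 constraint is easy to check up front, as `validatePenalties` does below. How the package turns these maximums into actual penalties is not documented here, so `applyOutOfOrderPenalty` assumes a simple model: the penalty scales with the fraction of out-of-order parts and tops out at `maxOutOfOrderPenalty`. Both helpers are hypothetical.

```typescript
type PenaltyOptions = {
  maxOutOfOrderPenalty?: number;
  maxNoisePenalty?: number;
};

// Enforces the documented rule: each maximum must be between 0 and 1 inclusive.
function validatePenalties(options: PenaltyOptions): void {
  const entries: [string, number | undefined][] = [
    ["maxOutOfOrderPenalty", options.maxOutOfOrderPenalty],
    ["maxNoisePenalty", options.maxNoisePenalty],
  ];
  for (const [key, value] of entries) {
    // The negated range check also rejects NaN.
    if (value !== undefined && !(value >= 0 && value <= 1)) {
      throw new RangeError(`${key} must be between 0 and 1 inclusive, got ${value}`);
    }
  }
}

// Assumed model, for illustration only: the penalty grows linearly with the
// share of out-of-order parts and never exceeds the configured maximum.
function applyOutOfOrderPenalty(
  score: number,
  outOfOrderParts: number,
  totalParts: number,
  maxOutOfOrderPenalty: number,
): number {
  if (totalParts === 0) return score;
  const share = Math.min(outOfOrderParts / totalParts, 1);
  return score * (1 - maxOutOfOrderPenalty * share);
}

const options: PenaltyOptions = { maxOutOfOrderPenalty: 0.2, maxNoisePenalty: 0 };
validatePenalties(options); // passes
// One of four parts out of order costs a quarter of the 0.2 maximum.
console.log(applyOutOfOrderPenalty(1, 1, 4, options.maxOutOfOrderPenalty ?? 0).toFixed(2)); // 0.95
```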
The tokenize method takes a string of text and returns a TokenMap. A TokenMap is the original text and an array of character ranges with one range per token.
export type Tokenizer = (text: string) => TokenMap;
You can implement a tokenizer to separate words for a particular language, or to break on different delimiters. The default tokenizer splits text on ' .,:;?!' (i.e. space, period, comma, colon, semicolon, question mark, and exclamation point). Pass your tokenizer in the options.
const options = {
tokenizer: myTokenizer,
};
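A sketch of a custom tokenizer. The README only says a TokenMap holds the original text and one character range per token, so the `text`, `tokens`, `start`, and `end` field names below are assumptions; check the package's exported `TokenMap` type before relying on them. `makeTokenizer` is a hypothetical helper, used here to split on the default delimiters plus tab and newline.

```typescript
// Assumed TokenMap shape (field names are guesses, not the package's).
type TokenRange = { start: number; end: number }; // end is exclusive
type TokenMap = { text: string; tokens: TokenRange[] };
type Tokenizer = (text: string) => TokenMap;

// Hypothetical factory: builds a tokenizer that splits on any character in `delimiters`.
function makeTokenizer(delimiters: string): Tokenizer {
  const isDelimiter = (ch: string) => delimiters.includes(ch);
  return (text: string): TokenMap => {
    const tokens: TokenRange[] = [];
    let start = -1; // -1 means "not inside a token"
    // Loop one past the end so a trailing token is closed.
    for (let i = 0; i <= text.length; i++) {
      const atBoundary = i === text.length || isDelimiter(text[i]);
      if (atBoundary) {
        if (start !== -1) {
          tokens.push({ start, end: i });
          start = -1;
        }
      } else if (start === -1) {
        start = i;
      }
    }
    return { text, tokens };
  };
}

// The default delimiters from the README, plus tab and newline.
const myTokenizer = makeTokenizer(" .,:;?!\t\n");

const map = myTokenizer("Bake at 350,\tplease!");
console.log(map.tokens.map(({ start, end }) => map.text.slice(start, end)));
// [ 'Bake', 'at', '350', 'please' ]
```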