morpheme-match
morpheme-match provides a match function that checks whether a sentence, given as tokens from morphological analysis, contains tokens matching a specified pattern.
Install with npm:
npm install morpheme-match
createTokenMatcher()
Takes an array of expected tokens and returns function(token): { match: boolean, tokens?: Array, skipped?: Array }.
For example, to check whether "名詞かもしれない" contains the "かも" token, write the following:
See the example code at azu.github.io/morpheme-match/#名詞(かも)しれない.
import {createTokenMatcher} from "morpheme-match";
const matchToken = createTokenMatcher([
{
"surface_form": "かも",
"pos": "助詞",
"pos_detail_1": "副助詞",
"pos_detail_2": "*",
"pos_detail_3": "*",
"conjugated_type": "*",
"conjugated_form": "*",
"basic_form": "かも",
"reading": "カモ",
"pronunciation": "カモ"
}
]);
const tokens = [
{
"surface_form": "名詞",
"pos": "名詞",
"pos_detail_1": "一般",
"pos_detail_2": "*",
"pos_detail_3": "*",
"conjugated_type": "*",
"conjugated_form": "*",
"basic_form": "名詞",
"reading": "メイシ",
"pronunciation": "メイシ"
},
// Hit!
{
"surface_form": "かも",
"pos": "助詞",
"pos_detail_1": "副助詞",
"pos_detail_2": "*",
"pos_detail_3": "*",
"conjugated_type": "*",
"conjugated_form": "*",
"basic_form": "かも",
"reading": "カモ",
"pronunciation": "カモ"
},
{
"surface_form": "しれ",
"pos": "動詞",
"pos_detail_1": "自立",
"pos_detail_2": "*",
"pos_detail_3": "*",
"conjugated_type": "一段",
"conjugated_form": "未然形",
"basic_form": "しれる",
"reading": "シレ",
"pronunciation": "シレ"
},
{
"surface_form": "ない",
"pos": "助動詞",
"pos_detail_1": "*",
"pos_detail_2": "*",
"pos_detail_3": "*",
"conjugated_type": "特殊・ナイ",
"conjugated_form": "基本形",
"basic_form": "ない",
"reading": "ナイ",
"pronunciation": "ナイ"
}
];
const result = tokens.some(token => {
const {match} = matchToken(token);
return match;
});
console.log(result);// true
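The example above passes tokens to the matcher one at a time. For a multi-token pattern, the matcher therefore has to remember how much of the pattern it has already seen. The following is a minimal, self-contained sketch of a matcher with the same { match, tokens } result shape. It only illustrates the idea and is not morpheme-match's implementation. The name createSketchMatcher is hypothetical. The sketch compares only the keys that a pattern token lists, ignores keys starting with "_", and leaves out skipped and _skippable handling.

```javascript
// Hypothetical sketch, NOT morpheme-match's source: a matcher that is fed
// one token at a time and reports when the whole pattern has appeared in order.
function createSketchMatcher(pattern) {
  let position = 0; // index of the next pattern token to match
  let matched = []; // tokens collected for the current attempt

  // A token fits a pattern entry when every non-"_" key has an equal value.
  const fits = (expected, actual) =>
    Object.keys(expected).every(
      (key) => key.startsWith("_") || expected[key] === actual[key]
    );

  return function matchToken(token) {
    if (!fits(pattern[position], token)) {
      // Mismatch: restart, but let the current token begin a new attempt.
      position = 0;
      matched = [];
      if (!fits(pattern[0], token)) {
        return { match: false };
      }
    }
    matched.push(token);
    position += 1;
    if (position === pattern.length) {
      // Whole pattern seen: report it and reset for the next occurrence.
      const tokens = matched;
      position = 0;
      matched = [];
      return { match: true, tokens };
    }
    return { match: false };
  };
}

// Usage: match "かも" followed by "しれ" in 名詞/かも/しれ/ない.
const sketchMatch = createSketchMatcher([
  { surface_form: "かも" },
  { surface_form: "しれ" },
]);
const words = ["名詞", "かも", "しれ", "ない"].map((s) => ({ surface_form: s }));
console.log(words.map((t) => sketchMatch(t).match)); // [ false, false, true, false ]
```

The restart rule is deliberately naive. After a mismatch, it only retries the current token as a new start, so overlapping occurrences can be missed. For example, the pattern A A B is not found inside A A A B.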
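To get the matched tokens, write the following:
let resultTokens = [];
// Use a new name: `result` is already declared by the previous example.
const found = tokens.some(token => {
// The matcher also returns `skipped`; `tokens` is renamed to avoid shadowing the outer array.
const {match, tokens: matchedTokens} = matchToken(token);
if (match) {
resultTokens = matchedTokens;
}
return match;
});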
console.log(resultTokens);
/*
[ { surface_form: 'かも',
pos: '助詞',
pos_detail_1: '副助詞',
pos_detail_2: '*',
pos_detail_3: '*',
conjugated_type: '*',
conjugated_form: '*',
basic_form: 'かも',
reading: 'カモ',
pronunciation: 'カモ' } ]
*/
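Array.prototype.some stops at the first token for which the callback returns true. The example above therefore reports only the first occurrence. A plain loop can collect every occurrence instead.

This sketch makes two assumptions: the matcher follows the documented { match, tokens } contract, and it resets after reporting a match. The names collectMatches and standInMatcher are hypothetical. The stand-in replaces createTokenMatcher only so that the snippet runs without the package.

```javascript
// Hypothetical helper: feed every token to a matcher with the documented
// { match, tokens } contract and keep each matched token list.
function collectMatches(tokens, matchToken) {
  const results = [];
  for (const token of tokens) {
    const { match, tokens: matched } = matchToken(token);
    if (match) {
      results.push(matched);
    }
  }
  return results;
}

// Stand-in matcher for a single-token pattern (surface_form "かも"),
// used here instead of createTokenMatcher so the snippet runs on its own.
const standInMatcher = (token) =>
  token.surface_form === "かも"
    ? { match: true, tokens: [token] }
    : { match: false };

const sentence = ["雨", "かも", "、", "雪", "かも"].map((s) => ({ surface_form: s }));
console.log(collectMatches(sentence, standInMatcher).length); // 2
```

With the package installed, the second argument would instead be the function returned by createTokenMatcher(...).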
morpheme-match ignores keys that start with "_", so you can attach meta information to a pattern token with "_"-prefixed keys.
const matchToken = createTokenMatcher([
{
"surface_form": "かも",
"pos": "助詞",
"pos_detail_1": "副助詞",
"pos_detail_2": "*",
"pos_detail_3": "*",
"conjugated_type": "*",
"conjugated_form": "*",
"basic_form": "かも",
"reading": "カモ",
"pronunciation": "カモ",
"_capture": "$1"
}
]);
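The "_" rule can be pictured as a per-token comparison that skips meta keys. In this hypothetical sketch, fieldsMatch treats a pattern token as matching when every non-"_" key it lists has an equal value on the analyzed token. It assumes that fields a pattern leaves out are not checked, as the surface_form-only patterns in the _skippable example of this README suggest. It illustrates the rule and is not the package's code.

```javascript
// Hypothetical sketch: compare a pattern token with an analyzed token,
// skipping meta keys such as "_capture" or "_skippable".
function fieldsMatch(expected, actual) {
  return Object.entries(expected).every(
    ([key, value]) => key.startsWith("_") || actual[key] === value
  );
}

const metaPattern = { surface_form: "かも", pos: "助詞", _capture: "$1" };
const analyzedToken = { surface_form: "かも", pos: "助詞", pos_detail_1: "副助詞" };

console.log(fieldsMatch(metaPattern, analyzedToken)); // true: "_capture" is ignored
console.log(fieldsMatch({ surface_form: "しれ" }, analyzedToken)); // false
```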
When a pattern token has "_skippable": true, it is ignored if it does not match.
const matchToken = createTokenMatcher([
{
"surface_form": "かも",
},
{
"surface_form": "、",
"_skippable": true,
},
{
"surface_form": "しれ",
},
]);
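One reading of the _skippable rule is an optional step in the pattern. When the current token does not fit a skippable entry, matching moves on to the next entry and tries it against the same token. The sketch below applies that reading to a whole token array, starting from a given index. The name matchesAt is hypothetical. The library is fed tokens one at a time, so its behavior may differ in details.

```javascript
// Hypothetical sketch: does `pattern` match `tokens` starting at index `start`?
// Pattern entries with `_skippable: true` may be absent from the tokens.
function matchesAt(pattern, tokens, start) {
  const fits = (expected, actual) =>
    actual !== undefined &&
    Object.keys(expected).every(
      (key) => key.startsWith("_") || expected[key] === actual[key]
    );

  let t = start;
  for (const expected of pattern) {
    if (fits(expected, tokens[t])) {
      t += 1; // consume the token
    } else if (!expected._skippable) {
      return false; // a required entry did not match
    }
    // otherwise: skip this optional entry and keep the same token
  }
  return true;
}

const commaPattern = [
  { surface_form: "かも" },
  { surface_form: "、", _skippable: true },
  { surface_form: "しれ" },
];
const toTokens = (forms) => forms.map((s) => ({ surface_form: s }));

console.log(matchesAt(commaPattern, toTokens(["かも", "、", "しれ"]), 0)); // true
console.log(matchesAt(commaPattern, toTokens(["かも", "しれ"]), 0)); // true
console.log(matchesAt(commaPattern, toTokens(["かも", "は", "しれ"]), 0)); // false
```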
See the Releases page for the changelog.
Install devDependencies and run npm test:
npm i -d && npm test
Pull requests and stars are always welcome. For bugs and feature requests, please create an issue.
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature
MIT © azu