
TN-Thai analyzer is a Thai word segmentation module for Node.js. Its segmentation algorithm is dictionary-based. Internally, the analyzer provides two segmentation algorithms, named Safe and Unsafe segmentation (documented in Thai; an English version is coming soon). The library stores Thai words in a trie using a double-array implementation. The Thai-word dictionary comes from Lexitron (NECTEC) and the Swath program.
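To make the data-structure remark concrete, here is a minimal JavaScript sketch of a static double-array trie, assuming a small in-memory word list. It is not tnthai's implementation: `buildDoubleArray`, `transition`, and `commonPrefixes` are hypothetical names, and the compact character alphabet and naive base search are simplifications chosen for readability.

```javascript
// Sketch only: a static double-array trie, NOT tnthai's actual code.
// buildDoubleArray, transition and commonPrefixes are hypothetical names.
function buildDoubleArray(words) {
  // Stage 1: build an ordinary pointer-based trie and a compact alphabet
  // (each distinct character gets a small positive integer code).
  const root = { children: new Map(), word: false };
  const alphabet = new Map();
  for (const w of words) {
    let node = root;
    for (let i = 0; i < w.length; i++) {
      const ch = w[i];
      if (!alphabet.has(ch)) alphabet.set(ch, alphabet.size + 1);
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), word: false });
      }
      node = node.children.get(ch);
    }
    node.word = true;
  }

  // Stage 2: flatten into BASE/CHECK arrays. State s moves on character code c
  // to t = base[s] + c, and the move is valid only if check[t] === s.
  // Free slots are those where check[t] is undefined; slot 0 is the root.
  const base = [];
  const check = [-1];
  const isWord = [];
  const queue = [[root, 0]];
  while (queue.length > 0) {
    const [node, s] = queue.shift();
    isWord[s] = node.word;
    const codes = [...node.children.keys()].map((ch) => alphabet.get(ch));
    if (codes.length === 0) continue; // leaf: no base needed
    // Naive search for the smallest base where every child slot is free.
    let b = 1;
    while (codes.some((c) => check[b + c] !== undefined)) b++;
    base[s] = b;
    for (const [ch, child] of node.children) {
      const t = b + alphabet.get(ch);
      check[t] = s; // claim the slot for this parent
      queue.push([child, t]);
    }
  }
  return { base, check, isWord, alphabet };
}

// Follow one transition; returns -1 if there is no edge for ch from state s.
function transition(da, s, ch) {
  const c = da.alphabet.get(ch);
  if (c === undefined || da.base[s] === undefined) return -1;
  const t = da.base[s] + c;
  return da.check[t] === s ? t : -1;
}

// All dictionary words that start at text[start].
function commonPrefixes(da, text, start) {
  const found = [];
  let s = 0;
  for (let i = start; i < text.length; i++) {
    s = transition(da, s, text[i]);
    if (s < 0) break;
    if (da.isWord[s]) found.push(text.slice(start, i + 1));
  }
  return found;
}

const da = buildDoubleArray(["คน", "คนแก่", "แก่", "ขน", "ของ", "ขนของ"]);
console.log(commonPrefixes(da, "คนแก่ขนของ", 0)); // [ 'คน', 'คนแก่' ]
console.log(commonPrefixes(da, "คนแก่ขนของ", 5)); // [ 'ขน', 'ขนของ' ]
```

The `commonPrefixes` query (every dictionary word starting at a given position) is the basic lookup a dictionary-based segmenter needs at each position of the input. Production builders use smarter base placement than this linear search to keep the arrays dense.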
npm install tnthai
or
npm install tnthai --save
const tnthai = require('tnthai')
const analyzer = new tnthai()
analyzer.segmenting("สวัสดีชาวโลก")
// { solution : ['สวัสดี', 'ชาวโลก'] }
analyzer.segmenting("สองสาวสุดแสนสวยใส่เสื้อสีแสดสวมสร้อยสี่แสนสามสิบเส้นส้นสูง")
// { solution: [ 'สอง', 'สาว', 'สุด', 'แสน', 'สวย', 'ใส่', 'เสื้อ', 'สี',
//   'แสด', 'สวม', 'สร้อย', 'สี่', 'แสน', 'สาม', 'สิบ', 'เส้น', 'ส้นสูง' ] }
Filter stopwords out of the segmented result:
analyzer.segmenting("เราคนหนึ่งคนนั้น ในวันหนึ่งวันนั้นเรายังผูกพันกันมากมาย"
, {filterStopword : true})
// {solution: [ 'คน', 'คน', ' ', 'วันหนึ่ง', 'ผูกพัน', 'มากมาย' ]}
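Stopword filtering of this kind can be sketched as a post-processing pass over the token list. The stopword set below is a small hypothetical sample (tnthai's actual list is not shown here), and `filterStopwords` is an illustrative name. As in the output above, whitespace tokens survive unless they are themselves in the list.

```javascript
// Sketch only: STOPWORDS is a tiny hypothetical list, not tnthai's.
const STOPWORDS = new Set(["เรา", "ยัง", "กัน", "ใน"]);

// Drop every token found in the stopword set; whitespace tokens are not
// in the set, so they pass through unchanged.
function filterStopwords(tokens, stopwords = STOPWORDS) {
  return tokens.filter((token) => !stopwords.has(token));
}

console.log(filterStopwords(["เรา", "ยัง", "ผูกพัน", "กัน", " ", "มากมาย"]));
// [ 'ผูกพัน', ' ', 'มากมาย' ]
```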
Analyze mixed Thai and English text (English handling is basic):
analyzer.segmenting("สวัสดีชาวโลก Hello World!!")
// {solution: [ 'สวัสดี', 'ชาวโลก', ' ', 'Hello', ' ', 'World', '!!' ]}
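The output above suggests that non-Thai text is split into words, spaces, and punctuation. One simple way to do that pre-split is by Unicode script ranges, sketched below under the assumption that only characters in the Thai block (U+0E00 to U+0E7F) need dictionary segmentation. `splitByScript` and `isThaiRun` are hypothetical helpers, not part of tnthai.

```javascript
// Sketch only: splitByScript and isThaiRun are hypothetical helpers.
// Thai runs (U+0E00-U+0E7F) stay together; everything else is split into
// alphanumeric runs, whitespace runs, and runs of other symbols.
const SCRIPT_RUNS = /[\u0E00-\u0E7F]+|[A-Za-z0-9]+|\s+|[^\u0E00-\u0E7FA-Za-z0-9\s]+/g;

function splitByScript(text) {
  return text.match(SCRIPT_RUNS) || []; // match() returns null for ""
}

// Only Thai runs would be handed to the dictionary-based segmenter.
function isThaiRun(run) {
  return /^[\u0E00-\u0E7F]+$/.test(run);
}

console.log(splitByScript("สวัสดีชาวโลก Hello World!!"));
// → [ 'สวัสดีชาวโลก', ' ', 'Hello', ' ', 'World', '!!' ]
```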
Return multiple segmentation solutions:
analyzer.segmenting("คนแก่ขนของ", {multiSolution : true})
// { solution:
// [ [ 'คนแก่', 'ขนของ' ],
// [ 'คนแก่', 'ขน', 'ของ' ],
// [ 'คน', 'แก่', 'ขนของ' ],
// [ 'คน', 'แก่', 'ขน', 'ของ' ] ] }
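One way to enumerate every dictionary-consistent segmentation, as the `multiSolution` output above lists, is a depth-first search over word boundaries. This is a sketch under the assumption that a plain `Set` stands in for the double-array dictionary; `allSegmentations` is a hypothetical name, and tnthai may enumerate solutions differently.

```javascript
// Sketch only: allSegmentations is a hypothetical helper, not tnthai's API.
// A plain Set stands in for the double-array dictionary.
function allSegmentations(text, dict) {
  const maxLen = Math.max(0, ...[...dict].map((w) => w.length));
  const results = [];
  const path = [];
  function dfs(pos) {
    if (pos === text.length) {
      results.push([...path]); // one complete segmentation
      return;
    }
    // Try longer words first so coarser segmentations are listed first.
    for (let len = Math.min(maxLen, text.length - pos); len >= 1; len--) {
      const word = text.slice(pos, pos + len);
      if (dict.has(word)) {
        path.push(word);
        dfs(pos + len);
        path.pop();
      }
    }
  }
  dfs(0);
  return results; // empty if the text cannot be fully covered by dictionary words
}

const dict = new Set(["คน", "แก่", "คนแก่", "ขน", "ของ", "ขนของ"]);
console.log(allSegmentations("คนแก่ขนของ", dict));
```

With this toy dictionary the sketch returns the same four solutions, in the same order, as the README output for "คนแก่ขนของ". For the misspelled input "คนแก่สขนของ" it returns an empty array, because no dictionary word covers "ส". The number of solutions can grow exponentially with input length, so a real implementation would likely bound or prune the search.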
Unsafe segmentation, used when the input contains a misspelling:
// misspelled input
analyzer.segmenting("คนแก่สขนของ", {multiSolution : true})
// { solution: [ [ 'คนแก่', 'ส', 'ขนของ' ] ] }
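A fallback that still produces output for misspelled input can be sketched with greedy longest matching that emits unknown characters as their own token. This is a swapped-in technique chosen for illustration; tnthai's Unsafe algorithm is described in its Thai documentation and may work differently. `unsafeSegment` is a hypothetical name, and grouping consecutive unknown characters into one token is an assumption.

```javascript
// Sketch only: greedy longest matching with an unknown-character fallback.
// This is NOT necessarily tnthai's Unsafe algorithm.
function unsafeSegment(text, dict) {
  const maxLen = Math.max(0, ...[...dict].map((w) => w.length));
  const tokens = [];
  let unknown = ""; // run of characters not covered by any dictionary word
  let pos = 0;
  while (pos < text.length) {
    let match = "";
    for (let len = Math.min(maxLen, text.length - pos); len >= 1; len--) {
      const word = text.slice(pos, pos + len);
      if (dict.has(word)) {
        match = word; // longest word starting here
        break;
      }
    }
    if (match) {
      if (unknown) tokens.push(unknown); // flush the unknown run first
      unknown = "";
      tokens.push(match);
      pos += match.length;
    } else {
      // Skip one UTF-16 code unit; note this could detach a Thai vowel
      // or tone mark from its consonant in real text.
      unknown += text[pos];
      pos += 1;
    }
  }
  if (unknown) tokens.push(unknown);
  return tokens;
}

const lexicon = new Set(["คน", "แก่", "คนแก่", "ขน", "ของ", "ขนของ"]);
console.log(unsafeSegment("คนแก่สขนของ", lexicon)); // [ 'คนแก่', 'ส', 'ขนของ' ]
```

Greedy matching is fast but can commit to a long word that blocks a valid split later in the sentence, which is one reason a safe, exhaustive pass is preferable when the input is spelled correctly.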
Applications of Thai word segmentation:
GitLab URL: https://gitlab.thinknet.co.th/prapeepat/TNThaiAnalyzer
Upcoming features: version 1.1.0 will add part-of-speech (POS) tagging using a probabilistic n-gram model with the Orchid corpus. The planned usage looks like this:
analyzer.segmenting("คนแก่ขนของ", {multiSolution : true, POSTagging : true})
// { solution:
// [ [ {'คนแก่', 'NPRP'}, {'ขนของ', 'VACT'} ],
// [ {'คนแก่', 'NPRP'}, {'ขน', 'VACT'}, {'ของ', 'NCMN'} ],
// [ {'คน', 'NCMN'}, {'แก่', 'VATT'}, {'ขนของ', 'VACT'} ],
// [ {'คน', 'NCMN'}, {'แก่', 'VATT'}, {'ขน', 'VACT'}, {'ของ', 'NCMN'} ] ] }
// NPRP ~ Proper noun , VACT ~ Active verb, NCMN ~ Common noun, VATT ~ Attributive verb
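Since the POS tagger is still upcoming, the following is only a sketch of what a probabilistic n-gram tagger could do for one candidate segmentation: a bigram model decoded with the Viterbi algorithm. The tag names come from the README, but every probability below is a made-up toy number, not an estimate from the Orchid corpus, and `viterbiTag`, `EMIT`, and `TRANS` are hypothetical names.

```javascript
// Sketch only: bigram Viterbi tagging with TOY numbers, not Orchid estimates.
const TAGS = ["NCMN", "NPRP", "VACT", "VATT"];
const EMIT = { // toy P(word | tag)
  NCMN: { "คน": 0.5, "ขน": 0.1, "ของ": 0.3 },
  VATT: { "แก่": 0.4 },
  VACT: { "ขน": 0.3 },
};
const TRANS = { // toy P(tag | previous tag); "<s>" marks the sentence start
  "<s>": { NCMN: 0.5 },
  NCMN: { VATT: 0.4, VACT: 0.3 },
  VATT: { VACT: 0.3 },
  VACT: { NCMN: 0.5 },
};
const UNSEEN_EMIT = 1e-6; // smoothing for unseen (tag, word) pairs
const UNSEEN_TRANS = 0.1; // flat, unnormalized score for unlisted transitions

function viterbiTag(tokens) {
  if (tokens.length === 0) return [];
  const emit = (tag, w) => Math.log((EMIT[tag] && EMIT[tag][w]) || UNSEEN_EMIT);
  const trans = (prev, tag) =>
    Math.log((TRANS[prev] && TRANS[prev][tag]) || UNSEEN_TRANS);

  // scores[j]: best log score of any tag sequence ending in TAGS[j] so far
  let scores = TAGS.map((tag) => trans("<s>", tag) + emit(tag, tokens[0]));
  const backPointers = [];
  for (let i = 1; i < tokens.length; i++) {
    const prevScores = scores;
    const ptr = [];
    scores = TAGS.map((tag) => {
      let best = -Infinity;
      let arg = 0;
      TAGS.forEach((prev, k) => {
        const s = prevScores[k] + trans(prev, tag);
        if (s > best) { best = s; arg = k; }
      });
      ptr.push(arg); // remember the best previous tag for this tag
      return best + emit(tag, tokens[i]);
    });
    backPointers.push(ptr);
  }

  // Trace back from the best final tag.
  let k = scores.indexOf(Math.max(...scores));
  const path = [TAGS[k]];
  for (let i = backPointers.length - 1; i >= 0; i--) {
    k = backPointers[i][k];
    path.unshift(TAGS[k]);
  }
  return path;
}

console.log(viterbiTag(["คน", "แก่", "ขน", "ของ"]));
// [ 'NCMN', 'VATT', 'VACT', 'NCMN' ] with these toy numbers
```

Working in log space avoids numeric underflow on long sentences. To tag several candidate segmentations, as in the planned output, one would presumably run the decoder once per candidate.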
Details of the POS tagging feature can be found here.
Open to feedback from you guys!
FAQs
A port of the TN-Thai analyzer from Java to JavaScript.
We found that tnthai demonstrated an unhealthy version release cadence and project activity because the last version was released a year ago. It has 2 open source maintainers collaborating on the project.