
mygithub.libinneed.workers.dev/sugarme/tokenizer
tokenizer is a pure Go package that makes it easier to apply Natural Language Processing (NLP) models for training, testing, and inference in Go.
It is heavily inspired by and based on the popular HuggingFace Tokenizers.
tokenizer is part of an ambitious goal (together with transformer and gotch) to bring more AI/deep-learning tools to Gophers so that they can stick to the language they love and build faster software in production.
tokenizer is built in modules located in sub-packages.
It implements various tokenizer models:
It can be used both for training new models from scratch and for fine-tuning existing models. See the examples for details.
This tokenizer package can load pretrained models from Huggingface. Some of them can be loaded using the pretrained subpackage.
import (
	"fmt"
	"log"

	"github.com/sugarme/tokenizer"
	"github.com/sugarme/tokenizer/pretrained"
)

func main() {
	// Download and cache a pretrained tokenizer, in this case `bert-base-uncased` from Huggingface.
	// This can be any model with `tokenizer.json` available, e.g. `tiiuae/falcon-7b`.
	configFile, err := tokenizer.CachedPath("bert-base-uncased", "tokenizer.json")
	if err != nil {
		panic(err)
	}

	// Build the tokenizer from the cached config file.
	tk, err := pretrained.FromFile(configFile)
	if err != nil {
		panic(err)
	}

	sentence := `The Gophers craft code using [MASK] language.`
	en, err := tk.EncodeSingle(sentence)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("tokens: %q\n", en.Tokens)
	fmt.Printf("offsets: %v\n", en.Offsets)

	// Output:
	// tokens: ["the" "go" "##pher" "##s" "craft" "code" "using" "[MASK]" "language" "."]
	// offsets: [[0 3] [4 6] [6 10] [10 11] [12 17] [18 22] [23 28] [29 35] [36 44] [44 45]]
}
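The offsets printed above can be read as half-open [start, end) positions into the original sentence. The following is a minimal sketch of mapping them back to the source text, using only the standard library and the offsets copied from the output above. The helper name tokenSpans is hypothetical and not part of the package, and the sketch assumes the offsets can be used as byte indices, which holds for this ASCII sentence.

```go
package main

import "fmt"

// tokenSpans is a hypothetical helper (not part of the tokenizer package).
// It returns the substring of sentence covered by each [start, end) offset pair.
// Malformed or out-of-range pairs yield an empty string instead of panicking.
func tokenSpans(sentence string, offsets [][]int) []string {
	spans := make([]string, 0, len(offsets))
	for _, off := range offsets {
		if len(off) != 2 || off[0] < 0 || off[1] > len(sentence) || off[0] > off[1] {
			spans = append(spans, "")
			continue
		}
		spans = append(spans, sentence[off[0]:off[1]])
	}
	return spans
}

func main() {
	sentence := `The Gophers craft code using [MASK] language.`
	// Offsets copied from the example output above.
	offsets := [][]int{
		{0, 3}, {4, 6}, {6, 10}, {10, 11}, {12, 17},
		{18, 22}, {23, 28}, {29, 35}, {36, 44}, {44, 45},
	}
	fmt.Printf("%q\n", tokenSpans(sentence, offsets))
	// Prints: ["The" "Go" "pher" "s" "craft" "code" "using" "[MASK]" "language" "."]
}
```

Note that the recovered spans keep the original casing ("The", "Go") and carry no "##" prefix, whereas the tokens themselves are lowercased and marked as word continuations. This makes offsets useful for highlighting or aligning tokens against the raw input.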
All models can also be loaded from files manually. See pkg.go.dev for detailed API documentation.
tokenizer is Apache 2.0 licensed.