
github.com/askeladdk/langdet
Package langdet detects natural languages in text using a straightforward implementation of trigram-based text categorization. The most commonly used languages worldwide are supported out of the box, but the code is flexible enough to accept any set of languages.
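To make the idea of trigram-based categorization concrete, here is a minimal sketch using only the standard library. It assumes the classic approach of ranking trigrams by frequency and comparing rank positions (the "out-of-place" measure); the package's actual scoring may differ. The names trigramProfile and outOfPlace are hypothetical and not part of langdet.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigramProfile returns the trigrams of text ordered from most to least
// frequent, keeping at most limit entries. Ties are broken alphabetically so
// the result is deterministic. (Hypothetical helper, not langdet's API.)
func trigramProfile(text string, limit int) []string {
	runes := []rune(strings.ToLower(text))
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	profile := make([]string, 0, len(counts))
	for tri := range counts {
		profile = append(profile, tri)
	}
	sort.Slice(profile, func(a, b int) bool {
		if counts[profile[a]] != counts[profile[b]] {
			return counts[profile[a]] > counts[profile[b]]
		}
		return profile[a] < profile[b]
	})
	if len(profile) > limit {
		profile = profile[:limit]
	}
	return profile
}

// outOfPlace sums, for every trigram in the input profile, how far its rank
// is from its rank in the language profile. A trigram missing from the
// language profile costs len(lang). Lower totals mean a closer match.
func outOfPlace(input, lang []string) int {
	rank := make(map[string]int, len(lang))
	for i, tri := range lang {
		rank[tri] = i
	}
	dist := 0
	for i, tri := range input {
		j, ok := rank[tri]
		if !ok {
			dist += len(lang)
			continue
		}
		if i > j {
			dist += i - j
		} else {
			dist += j - i
		}
	}
	return dist
}

func main() {
	// Tiny toy "training sets"; real profiles need far more text.
	english := trigramProfile("the quick brown fox jumps over the lazy dog and the cat", 300)
	dutch := trigramProfile("de snelle bruine vos springt over de luie hond en de kat", 300)
	input := trigramProfile("the dog and the fox", 300)
	fmt.Println("english distance:", outOfPlace(input, english))
	fmt.Println("dutch distance:  ", outOfPlace(input, dutch))
}
```

The language whose profile yields the smallest distance is the best candidate. Turning distances into a confidence value is a separate design choice that this sketch leaves out.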
Langdet first detects the writing script in order to narrow down the number of languages to test against. Some writing scripts are used by only a single language (Korean, Greek, etc.). In that case the language is returned directly, with no trigram analysis needed. Otherwise, langdet matches each language profile under the detected writing script against the input text and returns a result set listing the languages ordered by confidence.
go get -u github.com/askeladdk/langdet
Use DetectLanguage to detect the language of a string. It returns the BCP 47 language tag of the language with the highest probability. If no language was detected, the function returns language.Und.
detectedLanguage := langdet.DetectLanguage(s)
Use DetectLanguageWithOptions if you need more control. DetectLanguage is a shorthand for this function using DefaultOptions. Unlike DetectLanguage, DetectLanguageWithOptions returns a slice of Results listing the probabilities of all languages that use the detected writing script, ordered by probability.
results := langdet.DetectLanguageWithOptions(s, langdet.DefaultOptions)
Use Options to configure the detector. Any number of writing scripts and languages can be detected by setting the Scripts and Languages fields. Use the Train function to build language profiles. Use MinConfidence and MinRelConfidence to filter languages by confidence.
myLang := langdet.Language{
	Tag:      language.Make("zz"),
	Trigrams: langdet.Train(trainingSet),
}

options := langdet.Options{
	Scripts: []*unicode.RangeTable{
		unicode.Latin,
	},
	Languages: map[*unicode.RangeTable]langdet.Languages{
		unicode.Latin: {
			Languages: []langdet.Language{
				langdet.Dutch,
				langdet.French,
				myLang,
			},
		},
	},
}

results := langdet.DetectLanguageWithOptions(s, options)
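As a rough illustration of filtering by confidence, the sketch below applies an absolute and a relative threshold to a list of scored results. It rests on an assumption: that a relative threshold means "at least this fraction of the top result's confidence". The exact semantics of MinConfidence and MinRelConfidence are defined by the package documentation, and the scored type and filterByConfidence function are hypothetical stand-ins, not langdet's API.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is a hypothetical stand-in for a detection result.
type scored struct {
	Tag        string
	Confidence float64
}

// filterByConfidence sorts a copy of results by descending confidence and
// keeps entries that pass both thresholds:
//   - Confidence >= minConfidence (absolute floor)
//   - Confidence >= minRelConfidence * top confidence (assumed meaning)
func filterByConfidence(results []scored, minConfidence, minRelConfidence float64) []scored {
	sorted := append([]scored(nil), results...)
	sort.SliceStable(sorted, func(a, b int) bool {
		return sorted[a].Confidence > sorted[b].Confidence
	})
	var kept []scored
	for _, r := range sorted {
		if r.Confidence < minConfidence {
			continue
		}
		if r.Confidence < minRelConfidence*sorted[0].Confidence {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	results := []scored{{"fr", 0.40}, {"nl", 0.90}, {"de", 0.75}, {"en", 0.10}}
	for _, r := range filterByConfidence(results, 0.2, 0.8) {
		fmt.Printf("%s %.2f\n", r.Tag, r.Confidence)
	}
}
```

With these sample values only nl and de survive: fr clears the absolute floor of 0.2 but falls below 80% of the top score, and en fails both checks.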
Read the rest of the documentation on pkg.go.dev. It's easy-peasy!
Package langdet is released under the terms of the ISC license.