github.com/askeladdk/langdet

  • v0.0.3

langdet - Language Detection for Go

Overview

Package langdet detects natural languages in text using a straightforward implementation of trigram-based text categorization. The most commonly used languages worldwide are supported out of the box, but the code is flexible enough to accept any set of languages.
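
To make the idea of trigram-based text categorization concrete, here is a minimal standalone sketch that uses only the standard library. It builds a frequency-ranked trigram profile and compares two profiles with an "out-of-place" rank distance, which is one common way to do this kind of categorization. The function names (trigramCounts, rankedProfile, outOfPlace) are hypothetical and are not part of langdet; the package's own profile format and scoring may well differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigramCounts counts every run of three consecutive runes in the
// lowercased text.
func trigramCounts(text string) map[string]int {
	runes := []rune(strings.ToLower(text))
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	return counts
}

// rankedProfile returns the trigrams of text ordered from most to least
// frequent. Ties are broken alphabetically so the result is deterministic.
func rankedProfile(text string) []string {
	counts := trigramCounts(text)
	profile := make([]string, 0, len(counts))
	for t := range counts {
		profile = append(profile, t)
	}
	sort.Slice(profile, func(i, j int) bool {
		if counts[profile[i]] != counts[profile[j]] {
			return counts[profile[i]] > counts[profile[j]]
		}
		return profile[i] < profile[j]
	})
	return profile
}

// outOfPlace sums, for each trigram of the input profile, the distance
// between its rank in the input and its rank in the language profile.
// A trigram missing from the language profile costs len(lang).
// A lower total means a closer match.
func outOfPlace(input, lang []string) int {
	rank := make(map[string]int, len(lang))
	for i, t := range lang {
		rank[t] = i
	}
	dist := 0
	for i, t := range input {
		r, ok := rank[t]
		switch {
		case !ok:
			dist += len(lang)
		case r > i:
			dist += r - i
		default:
			dist += i - r
		}
	}
	return dist
}

func main() {
	// Tiny illustrative "training" texts; real profiles need far more data.
	english := rankedProfile("the quick brown fox jumps over the lazy dog and the cat")
	dutch := rankedProfile("de snelle bruine vos springt over de luie hond en de kat")
	input := rankedProfile("the dog and the fox")

	fmt.Println("distance to english:", outOfPlace(input, english))
	fmt.Println("distance to dutch:  ", outOfPlace(input, dutch))
}
```

The language whose profile gives the smallest distance is the best guess. Turning that distance into a confidence value is left out of this sketch.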

Langdet first detects the writing script to narrow down the set of languages to test against. Some writing scripts are used by only a single language (Korean, Greek, etc.). In that case the language is returned directly, with no trigram analysis needed. Otherwise, langdet matches each language profile under the detected writing script against the input text and returns a result set listing the languages ordered by confidence.
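
The script-first step can be illustrated with the standard library's unicode range tables. The sketch below is an assumption about how such a step might look, not langdet's actual code: namedScript, defaultScripts, singleLanguage and dominantScript are hypothetical names, and choosing the script that covers the most runes is just one simple heuristic.

```go
package main

import (
	"fmt"
	"unicode"
)

// namedScript pairs a Unicode range table with a printable label.
type namedScript struct {
	name  string
	table *unicode.RangeTable
}

// defaultScripts is the candidate list used by this sketch.
var defaultScripts = []namedScript{
	{"Latin", unicode.Latin},
	{"Greek", unicode.Greek},
	{"Hangul", unicode.Hangul},
}

// singleLanguage maps scripts that this sketch treats as belonging to a
// single language to that language's BCP 47 tag.
var singleLanguage = map[string]string{
	"Greek":  "el",
	"Hangul": "ko",
}

// dominantScript returns the name of the candidate script that covers the
// most runes in text, or "" if no rune belongs to any candidate.
func dominantScript(text string, candidates []namedScript) string {
	counts := make([]int, len(candidates))
	for _, r := range text {
		for i, c := range candidates {
			if unicode.Is(c.table, r) {
				counts[i]++
				break
			}
		}
	}
	best, bestCount := "", 0
	for i, c := range candidates {
		if counts[i] > bestCount {
			best, bestCount = c.name, counts[i]
		}
	}
	return best
}

func main() {
	for _, s := range []string{"Hello, world", "Καλημέρα κόσμε", "안녕하세요"} {
		script := dominantScript(s, defaultScripts)
		if tag, ok := singleLanguage[script]; ok {
			// Only one language uses this script: no trigram analysis needed.
			fmt.Printf("%q: script %s, language %s\n", s, script, tag)
		} else {
			// Several languages share this script: trigram analysis decides.
			fmt.Printf("%q: script %s, needs trigram analysis\n", s, script)
		}
	}
}
```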

Install

go get -u github.com/askeladdk/langdet

Quickstart

Use DetectLanguage to detect the language of a string. It returns the BCP 47 language tag of the language with the highest probability. If no language was detected, the function returns language.Und.

detectedLanguage := langdet.DetectLanguage(s)

Use DetectLanguageWithOptions if you need more control. DetectLanguage is shorthand for this function called with DefaultOptions. Unlike DetectLanguage, DetectLanguageWithOptions returns a slice of Results that lists every language using the detected writing script, ordered by probability.

results := langdet.DetectLanguageWithOptions(s, langdet.DefaultOptions)

Use Options to configure the detector. Any number of writing scripts and languages can be detected by setting the Scripts and Languages fields. Use the Train function to build language profiles. Use MinConfidence and MinRelConfidence to filter languages by confidence.

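// A custom language profile built from training data.
// trainingSet is assumed to be defined elsewhere.
myLang := langdet.Language{
    Tag:      language.Make("zz"),
    Trigrams: langdet.Train(trainingSet),
}

// Detect only the Latin script, and only these three languages under it.
options := langdet.Options{
    Scripts: []*unicode.RangeTable{
        unicode.Latin,
    },
    Languages: map[*unicode.RangeTable]langdet.Languages{
        unicode.Latin: {
            Languages: []langdet.Language{
                langdet.Dutch,
                langdet.French,
                myLang,
            },
        },
    },
}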

results := langdet.DetectLanguageWithOptions(s, options)
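
The exact semantics of MinConfidence and MinRelConfidence are described on pkg.go.dev and are not restated here. As a rough illustration of what filtering by absolute and relative confidence could mean, here is a standalone sketch under stated assumptions: the result type and the filter function are hypothetical, the relative threshold is assumed to be measured against the top-ranked result, and none of this is taken from langdet's source.

```go
package main

import "fmt"

// result is a hypothetical stand-in for one entry of a detection result set.
type result struct {
	Tag        string
	Confidence float64
}

// filter keeps the results whose confidence is at least minConf and at
// least minRel times the confidence of the first (best) result.
// results must already be sorted by descending confidence.
func filter(results []result, minConf, minRel float64) []result {
	if len(results) == 0 {
		return nil
	}
	top := results[0].Confidence
	var kept []result
	for _, r := range results {
		if r.Confidence >= minConf && r.Confidence >= minRel*top {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// Made-up confidence values, purely for illustration.
	results := []result{
		{"nl", 0.9},
		{"fr", 0.5},
		{"zz", 0.1},
	}
	for _, r := range filter(results, 0.2, 0.5) {
		fmt.Println(r.Tag, r.Confidence)
	}
}
```

With these made-up numbers, the absolute threshold drops "zz", while raising the relative threshold would also drop "fr".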

Read the rest of the documentation on pkg.go.dev. It's easy-peasy!

License

Package langdet is released under the terms of the ISC license.

Package last updated on 01 Feb 2023
