code.sajari.com/fuzzy
Fuzzy is a very fast spell checker and query suggester written in Golang.
Motivation:
Notes:
Config:
"threshold"
is the point at which a word becomes popular enough to have lookup keys built for it. Setting this to "1" means a single occurrence of a word makes it a legitimate spelling. This typically corrects the most errors, but it can also cause false positives if misspellings exist in the training data, and it produces a much larger index. The default is 4.
"depth"
is the Levenshtein distance up to which the model builds lookup keys. For spelling correction, a setting of "2" typically works very well. At a distance of "3" the potential number of words is much larger, but accuracy barely improves. For query prediction a larger depth can be useful, but it is again much more expensive. A depth of "1" and a threshold of "1" on the first Norvig test set give ~70% correction accuracy at ~5 µs per check (i.e. roughly 200,000 checks per second), which is good enough for many applications. At depths greater than 2, false positives begin to hurt accuracy.
Future improvements:
Usage:
support@sajari.com
package main

import (
	"fmt"

	"github.com/sajari/fuzzy"
)

func main() {
	model := fuzzy.NewModel()

	// For testing only; a threshold of 1 is not advisable in production
	model.SetThreshold(1)

	// This expands the distance searched, but costs more resources (memory and time).
	// For spell checking, "2" is typically enough; for query suggestions this can be higher.
	model.SetDepth(5)

	// Train multiple words at once by passing a slice of strings to Train
	words := []string{"bob", "your", "uncle", "dynamite", "delicate", "biggest", "big", "bigger", "aunty", "you're"}
	model.Train(words)

	// Train word by word (typically triggered in your application once a given word is popular enough)
	model.TrainWord("single")

	// Check spelling
	fmt.Println("\nSPELL CHECKS")
	fmt.Println(" Deletion test (yor) : ", model.SpellCheck("yor"))
	fmt.Println(" Swap test (uncel) : ", model.SpellCheck("uncel"))
	fmt.Println(" Replace test (dynemite) : ", model.SpellCheck("dynemite"))
	fmt.Println(" Insert test (dellicate) : ", model.SpellCheck("dellicate"))
	fmt.Println(" Two char test (dellicade) : ", model.SpellCheck("dellicade"))

	// Suggest completions
	fmt.Println("\nQUERY SUGGESTIONS")
	fmt.Println(" \"bigge\". Did you mean?: ", model.Suggestions("bigge", false))
	fmt.Println(" \"bo\". Did you mean?: ", model.Suggestions("bo", false))
	fmt.Println(" \"dyn\". Did you mean?: ", model.Suggestions("dyn", false))

	// Autocomplete suggestions; report the returned error instead of discarding it
	suggested, err := model.Autocomplete("bi")
	if err != nil {
		fmt.Println(" \"bi\". Autocomplete error:", err)
		return
	}
	fmt.Printf(" \"bi\". Suggestions: %v\n", suggested)
}
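The "threshold" and "depth" options described under Config can be made concrete with a small standalone sketch. It assumes a symmetric-delete scheme: every string reachable by removing up to `depth` characters from a trained word becomes a lookup key, and a query is matched through the same deletions of itself. That technique is swapped in here for illustration and is not necessarily how fuzzy builds its keys internally. The names `deletes`, `toyModel`, `newToyModel`, `train`, and `check` are hypothetical and are not part of the fuzzy API.

```go
package main

import "fmt"

// deletes returns w plus every string obtainable from w by removing up to
// depth characters. Working on runes keeps multi-byte characters intact.
func deletes(w string, depth int) map[string]struct{} {
	out := map[string]struct{}{w: {}}
	frontier := []string{w}
	for d := 0; d < depth; d++ {
		var next []string
		for _, s := range frontier {
			r := []rune(s)
			for i := range r {
				k := string(r[:i]) + string(r[i+1:])
				if _, seen := out[k]; !seen {
					out[k] = struct{}{}
					next = append(next, k)
				}
			}
		}
		frontier = next
	}
	return out
}

// toyModel is a hypothetical stand-in for a spelling model; it is not fuzzy.Model.
type toyModel struct {
	threshold, depth int
	counts           map[string]int
	index            map[string][]string // delete key -> words that produce it
}

func newToyModel(threshold, depth int) *toyModel {
	return &toyModel{
		threshold: threshold,
		depth:     depth,
		counts:    map[string]int{},
		index:     map[string][]string{},
	}
}

// train counts a sighting of w and builds its lookup keys exactly once,
// at the moment its count reaches the threshold.
func (m *toyModel) train(w string) {
	m.counts[w]++
	if m.counts[w] != m.threshold {
		return
	}
	for k := range deletes(w, m.depth) {
		m.index[k] = append(m.index[k], w)
	}
}

// check returns an indexed word sharing a delete key with w, preferring the
// highest training count and then alphabetical order; "" if none.
// No edit-distance verification is done on the candidates.
func (m *toyModel) check(w string) string {
	best := ""
	for k := range deletes(w, m.depth) {
		for _, c := range m.index[k] {
			if best == "" || m.counts[c] > m.counts[best] ||
				(m.counts[c] == m.counts[best] && c < best) {
				best = c
			}
		}
	}
	return best
}

func main() {
	// Number of lookup keys for one 8-letter word with distinct letters, by depth.
	for depth := 1; depth <= 3; depth++ {
		fmt.Printf("depth %d: %d keys for \"dynamite\"\n", depth, len(deletes("dynamite", depth)))
	}

	m := newToyModel(2, 2)
	m.train("bob")
	fmt.Printf("after 1 sighting: %q\n", m.check("bo")) // "" (below threshold)
	m.train("bob")
	fmt.Printf("after 2 sightings: %q\n", m.check("bo")) // "bob"
}
```

Running it prints 9, 37 and 93 keys for depths 1, 2 and 3, which shows how quickly the index grows with depth, and the `bo` lookup only succeeds once `bob` has been seen `threshold` times. Because candidates are matched on shared deletions alone, a production checker would normally also rank or verify them by edit distance.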