Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

github.com/maki-daisuke/go-triegun

Package Overview
Dependencies
Alerts
File Explorer
Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

github.com/maki-daisuke/go-triegun

  • v0.0.0-20151201102013-552f65e0c5e0
  • Source
  • Go
  • Socket score

Version published
Created
Source

go-triegun

import triegun "github.com/Maki-Daisuke/go-triegun"

Package go-triegun generates Golang code for matching string based on trie (prefix tree), which is far faster than regexp standard package.

Description

Testing whether a string contains another string is trivial and daily task. For example, detecting bot from User-Agent is a kind of this task. You can do it with using regexp package like this:

import "regexp"

var re := regexp.MustCompile("Baiduspider|bingbot|Googlebot|Twitterbot")
if re.MatchString(userAgent) {
  // Matched!
}

It looks quite easy!

But, as the the number of bot signature increases, the implementation of regexp becomes very slow. Actually, regexp is overkill to solve the problem. It can be done more simply and faster.

Here, we can use trie (prefix tree), which is good at testing if a string contains any of a set of strings as its prefix. This package generates Go code from a set of strings based upon trie. Precompiled matcher is quite faster than one of regexp.

Actually, this package is more than trie. It can match not only prefix, but also middle of string.

Benchmark

Run by my laptop (Macbook 2015, 1.3 GHz Intel Core-M):

$ go test -bench .
PASS
BenchmarkContainsRegexp-4    	   10000	    258149 ns/op
BenchmarkContainsGeneraetd-4 	  200000	      8276 ns/op
BenchmarkHasPrefixRegexp-4   	   10000	    247447 ns/op
BenchmarkHasPrefixGeneraetd-4	 1000000	      3966 ns/op
BenchmarkIsInRegexp-4        	  200000	      9054 ns/op
BenchmarkIsInGeneraetd-4     	  500000	      5119 ns/op
ok  	github.com/Maki-Daisuke/go-triegun/test	16.089s

30x faster than regexp! It can be much faster in real world program.

You can run the same benchmark test as follows:

$ go get github.com/Maki-Daisuke/go-triegun
$ cd $GOPATH/src/github.com/Maki-Daisuke/go-triegun/test
$ go generate
$ go test -bench .

Usage

There are two way to use this package:

1. Using Command

This package includes a command called triegun. You can install it just by typing this in your command line:

$ go get github.com/Maki-Daisuke/go-triegun/cmd/triegun
$ triegun -h
Usage:
  triegun [OPTIONS] [FILES...]

Application Options:
  -p, --package=           package name (default: main)
  -t, --tag=               tag name included in the generated functions
  -C, --disable-contains   Suppress generating code for Contains* functions (default: false)
  -I, --disable-isin       Suppress generating code for IsIn* functions (default: false)
  -P, --disable-hasprefix  Suppress generating code for HasPrefix* functions (default: false)

Help Options:
  -h, --help               Show this help message

triegun reads text from files specified as command arguments or STDIN if no argument is passed. Then, it generate Go code matching any of the input lines and output the code to STDOUT. For example:

$ cat signatures.txt
Baiduspider
bingbot
Googlebot
Twitterbot
$ triegun -t Bot signatures.txt > matcher.go

It generates the following six functions:

func ContainsBot(b []byte) bool
func ContainsBotString(s string) bool
func HasPrefixBot(b []byte) bool
func HasPrefixBotString(s string) bool
func IsInBot(b []byte) bool
func IsInBotString(s string) bool

You can call them as you expect:

package main

import (
  "bufio"
  "os"
)

func main(){
  r := bufio.NewReader(os.Stdin)
  line, err := r.ReadSlice('\n')
  if ContainsBot(line) {
    // do something
  }
  // or
  if ContainsBotString(string(line)) {
    // do another thing
  }
}

This way (use triegun) just works well, but does not look so cool. And sometimes, it's not useful, if you want to match against newline character ("\n") or other special characters. In those case, you can use "go generate". See the next section.

2. Using "go generate"

From Go 1.4, we can use go generate to generate Go code with using Gotools.

Given that we want to do the same example as above, at first prepare the code to generate the matchers like this:

// makenmatchers.go

// Declare `go build` ignores this file.
// +build ignore

package main

import triegun "github.com/Maki-Daisuke/go-triegun"

var signatures = []string{
  "Baiduspider",
  "bingbot",
  "Googlebot",
  "Twitterbot",
}

func main() {
	t := triegun.New()
	t.PkgName = "main"
	t.TagName = "Bot"
	t.AddString(signatures...)
	// Generate matcher code into "matchers_generated.go" with "Bot" tag.
	err := t.GenFile("matchers_generated.go")
	if err != nil {
		panic(err)
	}
}

Then, add a special comment in your main code:

package main

//go:generate go run makenmatchers.go

import (
	"bufio"
	"os"
)

func main(){
	r := bufio.NewReader(os.Stdin)
	line, err := r.ReadSlice('\n')
	if ContainsBot(line) {
		// do something
	}
	// or
	if ContainsBotString(string(line)) {
		// do another thing
	}
}

Now, run go generate:

$ go generate

This will produce file "matchers_generated.go" with the matchers code. You can now build and run your program:

$ go build

This way is recommended, because build process is clearly documented in your source code, and all you need is only Gotools to build it, you don't need make or other toolchains.

Anyway, you can choose your favorite!

License

The Simplified BSD License (2-clause). See LICENSE file also.

Author

Daisuke (yet another) Maki

FAQs

Package last updated on 01 Dec 2015

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc