Testing Framework

github.com/eblume/proto

proto gives Go operations like Map, Reduce, Filter, De/Multiplex, etc. without sacrificing idiomatic harmony or speed. The `Proto` type is a stand-in approximation for dynamic typing. Due to Go's powerful casting and type inference idioms, we can approximate the flexibility of dynamic typing even though Go is a statically typed language. Doing so sacrifices some of the benefits of static typing AND some of the benefits of dynamic typing, but this sacrifice is fundamentally required by Go until such time as a true 'Generic' type is implemented. In order to use a Proto-typed variable (from here on out, simply a 'Proto'), you will generally have to cast it to a type that you will know to use based on the semantics of your program. This package (specifically, the other files in this package) provide operations on Proto variables as well as some that make Proto variables out of 'traditionally typed' variables. Many of the operations will require the use of higher-order functions which you will need to provide, and those functions commonly will need you to manually "unbox" (cast-from-Proto) the variable to perform useful operations. Examples of the use of this package can be found in the "*_test.go" files, which contain testing code. A good example of a higher-order function which will commonly need manual-unboxing is the `Filter` function, found in "filter.go". `Filter` takes as its first argument a filter-function which will almost certainly require you to un-box the Proto channel values that it receives to perform the filtering action. Finally, a word on the entire point of this package: while it is named after the Proto type that pervades it and guides its syntax, the true nature of the `proto` package lies in cascading channels, rather than in dynamic typing. In fact this package might be more appropriately named after channels. Maybe `canal` would have been a better name. I wanted to bring the syntax and familiar patterns of functional programming idioms to the power and scalability of Go's goroutines and channels, and found that the syntax made this task very simple. You may find, as I did, that the majority of the code in this package is very 'obvious'. At first I was concerned by this - much of the code is very trivial - but now I feel pleased by the re-usability and natural 'correctness' of `proto`. Look at this package not as some monumental time-saving framework, but rather as a light scaffold for a useful and idiomatic style of programming within the existing constructs of Go. Ultimately, though, you're going to be typing the word Proto an awful lot, and thus the type became the eponym.

v0.0.0-20121219175714-96b560c0b3d4 • 11 years ago

github.com/rleighton/gosnowflake

Package gosnowflake is a pure Go Snowflake driver for the database/sql package. Clients can use the database/sql package directly. For example: Use Open to create a database handle with connection parameters: The Go Snowflake Driver supports the following connection syntaxes (or data source name formats): where all parameters must be escaped or use `Config` and `DSN` to construct a DSN string. The following example opens a database handle with the Snowflake account myaccount where the username is jsmith, password is mypassword, database is mydb, schema is testschema, and warehouse is mywh: The following connection parameters are supported: account <string>: Specifies the name of your Snowflake account, where string is the name assigned to your account by Snowflake. In the URL you received from Snowflake, your account name is the first segment in the domain (e.g. abc123 in https://abc123.snowflakecomputing.com). This parameter is optional if your account is specified after the @ character. If you are not on us-west-2 region or AWS deployment, then append the region after the account name, e.g. “<account>.<region>”. If you are not on AWS deployment, then append not only the region, but also the platform, e.g., “<account>.<region>.<platform>”. Account, region, and platform should be separated by a period (“.”), as shown above. If you are using a global url, then append connection group and "global", e.g., "account-<connection_group>.global". Account and connection group are separated by a dash ("-"), as shown above. region <string>: DEPRECATED. You may specify a region, such as “eu-central-1”, with this parameter. However, since this parameter is deprecated, it is best to specify the region as part of the account parameter. For details, see the description of the account parameter. database: Specifies the database to use by default in the client session (can be changed after login). schema: Specifies the database schema to use by default in the client session (can be changed after login). warehouse: Specifies the virtual warehouse to use by default for queries, loading, etc. in the client session (can be changed after login). role: Specifies the role to use by default for accessing Snowflake objects in the client session (can be changed after login). passcode: Specifies the passcode provided by Duo when using MFA for login. passcodeInPassword: false by default. Set to true if the MFA passcode is embedded in the login password. Appends the MFA passcode to the end of the password. loginTimeout: Specifies the timeout, in seconds, for login. The default is 60 seconds. The login request gives up after the timeout length if the HTTP response is success. authenticator: Specifies the authenticator to use for authenticating user credentials: To use the internal Snowflake authenticator, specify snowflake (Default). To authenticate through Okta, specify https://<okta_account_name>.okta.com (URL prefix for Okta). To authenticate using your IDP via a browser, specify externalbrowser. To authenticate via OAuth, specify oauth and provide an OAuth Access Token (see the token parameter below). application: Identifies your application to Snowflake Support. insecureMode: false by default. Set to true to bypass the Online Certificate Status Protocol (OCSP) certificate revocation check. IMPORTANT: Change the default value for testing or emergency situations only. token: a token that can be used to authenticate. Should be used in conjunction with the "oauth" authenticator. client_session_keep_alive: Set to true have a heartbeat in the background every hour to keep the connection alive such that the connection session will never expire. Care should be taken in using this option as it opens up the access forever as long as the process is alive. ocspFailOpen: true by default. Set to false to make OCSP check fail closed mode. validateDefaultParameters: true by default. Set to false to disable checks on existence and privileges check for Database, Schema, Warehouse and Role when setting up the connection All other parameters are taken as session parameters. For example, TIMESTAMP_OUTPUT_FORMAT session parameter can be set by adding: The Go Snowflake Driver honors the environment variables HTTP_PROXY, HTTPS_PROXY and NO_PROXY for the forward proxy setting. NO_PROXY specifies which hostname endings should be allowed to bypass the proxy server, e.g. :code:`no_proxy=.amazonaws.com` means that AWS S3 access does not need to go through the proxy. NO_PROXY does not support wildcards. Each value specified should be one of the following: The end of a hostname (or a complete hostname), for example: ".amazonaws.com" or "xy12345.snowflakecomputing.com". An IP address, for example "192.196.1.15". If more than one value is specified, values should be separated by commas, for example: By default, the driver's builtin logger is NOP; no output is generated. This is intentional for those applications that use the same set of logger parameters not to conflict with glog, which is incorporated in the driver logging framework. In order to enable debug logging for the driver, add a build tag sfdebug to the go tool command lines, for example: For tests, run the test command with the tag along with glog parameters. For example, the following command will generate all acitivty logs in the standard error. Likewise, if you build your application with the tag, you may specify the same set of glog parameters. To get the logs for a specific module, use the -vmodule option. For example, to retrieve the driver.go and connection.go module logs: Note: If your request retrieves no logs, call db.Close() or glog.flush() to flush the glog buffer. Note: The logger may be changed in the future for better logging. Currently if the applications use the same parameters as glog, you cannot collect both application and driver logs at the same time. From 0.5.0, a signal handling responsibility has moved to the applications. If you want to cancel a query/command by Ctrl+C, add a os.Interrupt trap in context to execute methods that can take the context parameter, e.g., QueryContext, ExecContext. See cmd/selectmany.go for the full example. Queries return SQL column type information in the ColumnType type. The DatabaseTypeName method returns the following strings representing Snowflake data types: Go's database/sql package limits Go's data types to the following for binding and fetching: Fetching data isn't an issue since the database data type is provided along with the data so the Go Snowflake Driver can translate Snowflake data types to Go native data types. When the client binds data to send to the server, however, the driver cannot determine the date/timestamp data types to associate with binding parameters. For example: To resolve this issue, a binding parameter flag is introduced that associates any subsequent time.Time type to the DATE, TIME, TIMESTAMP_LTZ, TIMESTAMP_NTZ or BINARY data type. The above example could be rewritten as follows: The driver fetches TIMESTAMP_TZ (timestamp with time zone) data using the offset-based Location types, which represent a collection of time offsets in use in a geographical area, such as CET (Central European Time) or UTC (Coordinated Universal Time). The offset-based Location data is generated and cached when a Go Snowflake Driver application starts, and if the given offset is not in the cache, it is generated dynamically. Currently, Snowflake doesn't support the name-based Location types, e.g., America/Los_Angeles. For more information about Location types, see the Go documentation for https://golang.org/pkg/time/#Location. Internally, this feature leverages the []byte data type. As a result, BINARY data cannot be bound without the binding parameter flag. In the following example, sf is an alias for the gosnowflake package: The driver directly downloads a result set from the cloud storage if the size is large. It is required to shift workloads from the Snowflake database to the clients for scale. The download takes place by goroutine named "Chunk Downloader" asynchronously so that the driver can fetch the next result set while the application can consume the current result set. The application may change the number of result set chunk downloader if required. Note this doesn't help reduce memory footprint by itself. Consider Custom JSON Decoder. Experimental: Custom JSON Decoder for parsing Result Set The application may have the driver use a custom JSON decoder that incrementally parses the result set as follows. This option will reduce the memory footprint to half or even quarter, but it can significantly degrade the performance depending on the environment. The test cases running on Travis Ubuntu box show five times less memory footprint while four times slower. Be cautious when using the option. (Private Preview) JWT authentication ** Not recommended for production use until GA Now JWT token is supported when compiling with a golang version of 1.10 or higher. Binary compiled with lower version of golang would return an error at runtime when users try to use JWT authentication feature. To enable this feature, one can construct DSN with fields "authenticator=SNOWFLAKE_JWT&privateKey=<your_private_key>", or using Config structure specifying: The <your_private_key> should be a base64 URL encoded PKCS8 rsa private key string. One way to encode a byte slice to URL base 64 URL format is through base64.URLEncoding.EncodeToString() function. On the server side, one can alter the public key with the SQL command: The <your_public_key> should be a base64 Standard encoded PKI public key string. One way to encode a byte slice to base 64 Standard format is through base64.StdEncoding.EncodeToString() function. To generate the valid key pair, one can do the following command on the shell script: GET and PUT operations are unsupported.

v1.5.1 • 3 years ago

github.com/draft6/mow.cli

Package cli provides a framework to build command line applications in Go with most of the burden of arguments parsing and validation placed on the framework instead of the user. To create a new application, initialize an app with cli.App. Specify a name and a brief description for the application: To attach code to execute when the app is launched, assign a function to the Action field: To assign a version to the application, use Version method and specify the flags that will be used to invoke the version command: Finally, in the main func, call Run passing in the arguments for parsing: To add one or more command line options (also known as flags), use one of the short-form StringOpt, StringsOpt, IntOpt, IntsOpt, or BoolOpt methods on App (or Cmd if adding flags to a command or subcommand). For example, to add a boolean flag to the cp command that specifies recursive mode, use the following: The first argument is a space separated list of names for the option without the dashes. Generally, both short and long forms are specified, but this is not mandatory. Additional names (aliases) can be provide if desired, but these are not shown in the auto-generated help. The second argument is the default value for the option if it is not supplied by the user. And, the third argument is the description to be shown in help messages. There is also a second set of methods on App called String, Strings, Int, Ints, and Bool, which accept a long-form struct of the type: cli.StringOpt, cli.StringsOpt, cli.IntOpt, cli.IntsOpt, cli.BoolOpt. The struct describes the option and allows the use of additional features not available in the short-form methods described above: Two features, EnvVar and SetByUser, can be defined in the long-form struct method. EnvVar is a space separated list of environment variables used to initialize the option if a value is not provided by the user. When help messages are shown, the value of any environment variables will be displayed. SetByUser is a pointer to a boolean variable that is set to true if the user specified the value on the command line. This can be useful to determine if the value of the option was explicitly set by the user or set via the default value. The result of both short- and long-forms is a pointer to a value, which will be populated after the command line arguments are parsed. You can only access the values stored in the pointers in the Action func, which is invoked after argument parsing has been completed. This precludes using the value of one option as the default value of another. On the command line, the following syntaxes are supported when specifying options. Boolean options: String and int options: Slice options (StringsOpt, IntsOpt) where option is repeated to accumulate values in a slice: To add one or more command line arguments (not prefixed by dashes), use one of the short-form StringArg, StringsArg, IntArg, IntsArg, or BoolArg methods on App (or Cmd if adding arguments to a command or subcommand). For example, to add two string arguments to our cp command, use the following calls: The first argument is the name of the argument as displayed in help messages. Argument names must be specified as all uppercase. The second argument is the default value for the argument if it is not supplied. And the third, argument is the description to be shown in help messages. There is also a second set of methods on App called String, Strings, Int, Ints, and Bool, which accept a long-form struct of the type: cli.StringArg, cli.StringsArg, cli.IntArg, cli.IntsArg, cli.BoolArg. The struct describes the arguments and allows the use of additional features not available in the short-form methods described above: Two features, EnvVar and SetByUser, can be defined in the long-form struct method. EnvVar is a space separated list of environment variables used to initialize the argument if a value is not provided by the user. When help messages are shown, the value of any environment variables will be displayed. SetByUser is a pointer to a boolean variable that is set to true if the user specified the value on the command line. This can be useful to determine if the value of the argument was explicitly set by the user or set via the default value. The result of both short- and long-forms is a pointer to a value which will be populated after the command line arguments are parsed. You can only access the values stored in the pointers in the Action func, which is invoked after argument parsing has been completed. This precludes using the value of one argument as the default value of another. The -- operator marks the end of command line options. Everything that follows will be treated as an argument, even if starts with a dash. For example, the standard POSIX touch command, which takes a filename as an argument (and possibly other options that we'll ignore here), could be defined as: If we try to create a file named "-f" via our touch command: It will fail because the -f will be parsed as an option, not as an argument. The fix is to insert -- after all flags have been specified, so the remaining arguments are parsed as arguments instead of options as follows: This ensures the -f is parsed as an argument instead of a flag named f. This package supports nesting of commands and subcommands. Declare a top-level command by calling the Command func on the top-level App struct. For example, the following creates an application called docker that will have one command called run: The first argument is the name of the command the user will specify on the command line to invoke this command. The second argument is the description of the command shown in help messages. And, the last argument is a CmdInitializer, which is a function that receives a pointer to a Cmd struct representing the command. Within this function, define the options and arguments for the command by calling the same methods as you would with top-level App struct (BoolOpt, StringArg, ...). To execute code when the command is invoked, assign a function to the Action field of the Cmd struct. Within that function, you can safely refer to the options and arguments as command line parsing will be completed at the time the function is invoked: Optionally, to provide a more extensive description of the command, assign a string to LongDesc, which is displayed when a user invokes --help. A LongDesc can be provided for Cmds as well as the top-level App: Subcommands can be added by calling Command on the Cmd struct. They can by defined to any depth if needed: Command and subcommand aliases are also supported. To define one or more aliases, specify a space-separated list of strings to the first argument of Command: With the command structure defined above, users can invoke the app in a variety of ways: As a convenience, to assign an Action to a func with no arguments, use ActionCommand when defining the Command. For example, the following two statements are equivalent: Please note that options, arguments, specs, and long descriptions cannot be provided when using ActionCommand. This is intended for very simple command invocations that take no arguments. Finally, as a side-note, it may seem a bit weird that this package uses a function to initialize a command instead of simply returning a command struct. The motivation behind this API decision is scoping: as with the standard flag package, adding an option or an argument returns a pointer to a value which will be populated when the app is run. Since you'll want to store these pointers in variables, and to avoid having dozens of them in the same scope (the main func for example or as global variables), this API was specifically tailored to take a func parameter (called CmdInitializer), which accepts the command struct. With this design, the command's specific variables are limited in scope to this function. Interceptors, or hooks, can be defined to be executed before and after a command or when any of its subcommands are executed. For example, the following app defines multiple commands as well as a global flag which toggles verbosity: Instead of duplicating the check for the verbose flag and setting the debug level in every command (and its sub-commands), a Before interceptor can be set on the top-level App instead: Whenever a valid command is called by the user, all the Before interceptors defined on the app and the intermediate commands will be called, in order from the root to the leaf. Similarly, to execute a hook after a command has been called, e.g. to cleanup resources allocated in Before interceptors, simply set the After field of the App struct or any other Command. After interceptors will be called, in order, from the leaf up to the root (the opposite order of the Before interceptors). The following diagram shows when and in which order multiple Before and After interceptors are executed: To exit the application, use cli.Exit function, which accepts an exit code and exits the app with the provided code. It is important to use cli.Exit instead of os.Exit as the former ensures that all of the After interceptors are executed before exiting. An App or Command's invocation syntax can be customized using spec strings. This can be useful to indicate that an argument is optional or that two options are mutually exclusive. The spec string is one of the key differentiators between this package and other CLI packages as it allows the developer to express usage in a simple, familiar, yet concise grammar. To define option and argument usage for the top-level App, assign a spec string to the App's Spec field: Likewise, to define option and argument usage for a command or subcommand, assign a spec string to the Command's Spec field: The spec syntax is mostly based on the conventions used in POSIX command line applications (help messages and man pages). This syntax is described in full below. If a user invokes the app or command with the incorrect syntax, the app terminates with a help message showing the proper invocation. The remainder of this section describes the many features and capabilities of the spec string grammar. Options can use both short and long option names in spec strings. In the example below, the option is mandatory and must be provided. Any options referenced in a spec string MUST be explicitly declared, otherwise this package will panic. I.e. for each item in the spec string, a corresponding *Opt or *Arg is required: Arguments are specified with all-uppercased words. In the example below, both SRC and DST must be provided by the user (two arguments). Like options, any argument referenced in a spec string MUST be explicitly declared, otherwise this package will panic: With the exception of options, the order of the elements in a spec string is respected and enforced when command line arguments are parsed. In the example below, consecutive options (-f and -g) are parsed regardless of the order they are specified (both "-f=5 -g=6" and "-g=6 -f=5" are valid). Order between options and arguments is significant (-f and -g must appear before the SRC argument). The same holds true for arguments, where SRC must appear before DST: Optionality of options and arguments is specified in a spec string by enclosing the item in square brackets []. If the user does not provide an optional value, the app will use the default value specified when the argument was defined. In the example below, if -x is not provided, heapSize will default to 1024: Choice between two or more items is specified in a spec string by separating each choice with the | operator. Choices are mutually exclusive. In the examples below, only a single choice can be provided by the user otherwise the app will terminate displaying a help message on proper usage: Repetition of options and arguments is specified in a spec string with the ... postfix operator to mark an item as repeatable. Both options and arguments support repitition. In the example below, users may invoke the command with multiple -e options and multiple SRC arguments: Grouping of options and arguments is specified in a spec string with parenthesis. When combined with the choice | and repetition ... operators, complex syntaxes can be created. The parenthesis in the example below indicate a repeatable sequence of a -e option followed by an argument, and that is mutually exclusive to a choice between -x and -y options. Option groups, or option folding, are a shorthand method to declaring a choice between multiple options. I.e. any combination of the listed options in any order with at least one option selected. The following two statements are equivalent: Option groups are typically used in conjunction with optionality [] operators. I.e. any combination of the listed options in any order or none at all. The following two statements are equivalent: All of the options can be specified using a special syntax: [OPTIONS]. This is a special token in the spec string (not optionality and not an argument called OPTIONS). It is equivalent to an optional repeatable choice between all the available options. For example, if an app or a command declares 4 options a, b, c and d, then the following two statements are equivalent: Inline option values are specified in the spec string with the =<some-text> notation immediately following an option (long or short form) to provide users with an inline description or value. The actual inline values are ignored by the spec parser as they exist only to provide a contextual hint to the user. In the example below, "absolute-path" and "in seconds" are ignored by the parser: The -- operator can be used to automatically treat everything following it as arguments. In other words, placing a -- in the spec string automatically inserts a -- in the same position in the program call arguments. This lets you write programs such as the POSIX time utility for example: Below is the full EBNF grammar for the Specs language: By combining a few of these building blocks together (while respecting the grammar above), powerful and sophisticated validation constraints can be created in a simple and concise manner without having to define in code. This is one of the key differentiators between this package and other CLI packages. Validation of usage is handled entirely by the package through the spec string. Behind the scenes, this package parses the spec string and constructs a finite state machine used to parse the command line arguments. It also handles backtracking, which allows it to handle tricky cases, or what I like to call "the cp test": Without backtracking, this deceptively simple spec string cannot be parsed correctly. For instance, docopt can't handle this case, whereas this package does. By default an auto-generated spec string is created for the app and every command unless a spec string has been set by the user. This can simplify use of the package even further for simple syntaxes. The following logic is used to create an auto-generated spec string: 1) start with an empty spec string, 2) if at least one option was declared, append "[OPTIONS]" to the spec string, and 3) for each declared argument, append it, in the order of declaration, to the spec string. For example, given this command declaration: The auto-generated spec string, which should suffice for simple cases, would be: If additional constraints are required, the spec string must be set explicitly using the grammar documented above. By default, the following types are supported for options and arguments: bool, string, int, strings (slice of strings), and ints (slice of ints). You can, however, extend this package to handle other types, e.g. time.Duration, float64, or even your own struct types. To define your own custom type, you must implement the flag.Value interface for your custom type, and then declare the option or argument using VarOpt or VarArg respectively if using the short-form methods. If using the long-form struct, then use Var instead. The following example defines a custom type for a duration. It defines a duration argument that users will be able to invoke with strings in the form of "1h31m42s": To make a custom type to behave as a boolean option, i.e. doesn't take a value, it must implement the IsBoolFlag method that returns true: To make a custom type behave as a multi-valued option or argument, i.e. takes multiple values, it must implement the Clear method, which is called whenever the values list needs to be cleared, e.g. when the value was initially populated from an environment variable, and then explicitly set from the CLI: To hide the default value of a custom type, it must implement the IsDefault method that returns a boolean. The help message generator will use the return value to decide whether or not to display the default value to users:

v1.1.0 • 5 years ago

github.com/danilodvaz/lingua-go/v2

Package lingua accurately detects the natural language of written text, be it long or short. Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine learning frameworks or natural language processing applications. In cases where you don't need the full-fledged functionality of those systems or don't want to learn the ropes of those, a small flexible library comes in handy. So far, the only other comprehensive open source library in the Go ecosystem for this task is Whatlanggo (https://github.com/abadojack/whatlanggo). Unfortunately, it has two major drawbacks: 1. Detection only works with quite lengthy text fragments. For very short text snippets such as Twitter messages, it does not provide adequate results. 2. The more languages take part in the decision process, the less accurate are the detection results. Lingua aims at eliminating these problems. It nearly does not need any configuration and yields pretty accurate results on both long and short text, even on single words and phrases. It draws on both rule-based and statistical methods but does not use any dictionaries of words. It does not need a connection to any external API or service either. Once the library has been downloaded, it can be used completely offline. Compared to other language detection libraries, Lingua's focus is on quality over quantity, that is, getting detection right for a small set of languages first before adding new ones. Currently, 75 languages are supported. They are listed as variants of type Language. Lingua is able to report accuracy statistics for some bundled test data available for each supported language. The test data for each language is split into three parts: 1. a list of single words with a minimum length of 5 characters 2. a list of word pairs with a minimum length of 10 characters 3. a list of complete grammatical sentences of various lengths Both the language models and the test data have been created from separate documents of the Wortschatz corpora (https://wortschatz.uni-leipzig.de) offered by Leipzig University, Germany. Data crawled from various news websites have been used for training, each corpus comprising one million sentences. For testing, corpora made of arbitrarily chosen websites have been used, each comprising ten thousand sentences. From each test corpus, a random unsorted subset of 1000 single words, 1000 word pairs and 1000 sentences has been extracted, respectively. Given the generated test data, I have compared the detection results of Lingua, and Whatlanggo running over the data of Lingua's supported 75 languages. Additionally, I have added Google's CLD3 (https://github.com/google/cld3/) to the comparison with the help of the gocld3 bindings (https://github.com/jmhodges/gocld3). Languages that are not supported by CLD3 or Whatlanggo are simply ignored during the detection process. The bar and box plots (https://github.com/pemistahl/lingua-go/blob/main/ACCURACY_PLOTS.md) show the measured accuracy values for all three performed tasks: Single word detection, word pair detection and sentence detection. Lingua clearly outperforms its contenders. Detailed statistics including mean, median and standard deviation values for each language and classifier are available in tabular form (https://github.com/pemistahl/lingua-go/blob/main/ACCURACY_TABLE.md) as well. Every language detector uses a probabilistic n-gram (https://en.wikipedia.org/wiki/N-gram) model trained on the character distribution in some training corpus. Most libraries only use n-grams of size 3 (trigrams) which is satisfactory for detecting the language of longer text fragments consisting of multiple sentences. For short phrases or single words, however, trigrams are not enough. The shorter the input text is, the less n-grams are available. The probabilities estimated from such few n-grams are not reliable. This is why Lingua makes use of n-grams of sizes 1 up to 5 which results in much more accurate prediction of the correct language. A second important difference is that Lingua does not only use such a statistical model, but also a rule-based engine. This engine first determines the alphabet of the input text and searches for characters which are unique in one or more languages. If exactly one language can be reliably chosen this way, the statistical model is not necessary anymore. In any case, the rule-based engine filters out languages that do not satisfy the conditions of the input text. Only then, in a second step, the probabilistic n-gram model is taken into consideration. This makes sense because loading less language models means less memory consumption and better runtime performance. In general, it is always a good idea to restrict the set of languages to be considered in the classification process using the respective api methods. If you know beforehand that certain languages are never to occur in an input text, do not let those take part in the classifcation process. The filtering mechanism of the rule-based engine is quite good, however, filtering based on your own knowledge of the input text is always preferable. There might be classification tasks where you know beforehand that your language data is definitely not written in Latin, for instance. The detection accuracy can become better in such cases if you exclude certain languages from the decision process or just explicitly include relevant languages. Knowing about the most likely language is nice but how reliable is the computed likelihood? And how less likely are the other examined languages in comparison to the most likely one? In the example below, a slice of ConfidenceValue is returned containing all possible languages sorted by their confidence value in descending order. The values that this method computes are part of a relative confidence metric, not of an absolute one. Each value is a number between 0.0 and 1.0. The most likely language is always returned with value 1.0. All other languages get values assigned which are lower than 1.0, denoting how less likely those languages are in comparison to the most likely language. The slice returned by this method does not necessarily contain all languages which the calling instance of LanguageDetector was built from. If the rule-based engine decides that a specific language is truly impossible, then it will not be part of the returned slice. Likewise, if no ngram probabilities can be found within the detector's languages for the given input text, the returned slice will be empty. The confidence value for each language not being part of the returned slice is assumed to be 0.0. By default, Lingua uses lazy-loading to load only those language models on demand which are considered relevant by the rule-based filter engine. For web services, for instance, it is rather beneficial to preload all language models into memory to avoid unexpected latency while waiting for the service response. If you want to enable the eager-loading mode, you can do it as seen below. Multiple instances of LanguageDetector share the same language models in memory which are accessed asynchronously by the instances. By default, Lingua returns the most likely language for a given input text. However, there are certain words that are spelled the same in more than one language. The word `prologue`, for instance, is both a valid English and French word. Lingua would output either English or French which might be wrong in the given context. For cases like that, it is possible to specify a minimum relative distance that the logarithmized and summed up probabilities for each possible language have to satisfy. It can be stated as seen below. Be aware that the distance between the language probabilities is dependent on the length of the input text. The longer the input text, the larger the distance between the languages. So if you want to classify very short text phrases, do not set the minimum relative distance too high. Otherwise Unknown will be returned most of the time as in the example below. This is the return value for cases where language detection is not reliably possible.

v2.0.5 • 2 years ago

github.com/kubitre/csrf

Package csrf (gorilla/csrf) provides Cross Site Request Forgery (CSRF) prevention middleware for Go web applications & services. It includes: * The `csrf.Protect` middleware/handler provides CSRF protection on routes attached to a router or a sub-router. * A `csrf.Token` function that provides the token to pass into your response, whether that be a HTML form or a JSON response body. * ... and a `csrf.TemplateField` helper that you can pass into your `html/template` templates to replace a `{{ .csrfField }}` template tag with a hidden input field. gorilla/csrf is easy to use: add the middleware to individual handlers with the below: ... and then collect the token with `csrf.Token(r)` before passing it to the template, JSON body or HTTP header (you pick!). gorilla/csrf inspects the form body (first) and HTTP headers (second) on subsequent POST/PUT/PATCH/DELETE/etc. requests for the token. Note that the authentication key passed to `csrf.Protect([]byte(key))` should be 32-bytes long and persist across application restarts. Generating a random key won't allow you to authenticate existing cookies and will break your CSRF validation. Here's the common use-case: HTML forms you want to provide CSRF protection for, in order to protect malicious POST requests being made: Note that the CSRF middleware will (by necessity) consume the request body if the token is passed via POST form values. If you need to consume this in your handler, insert your own middleware earlier in the chain to capture the request body. You can also send the CSRF token in the response header. This approach is useful if you're using a front-end JavaScript framework like Ember or Angular, or are providing a JSON API: If you're writing a client that's supposed to mimic browser behavior, make sure to send back the CSRF cookie (the default name is _gorilla_csrf, but this can be changed with the CookieName Option) along with either the X-CSRF-Token header or the gorilla.csrf.Token form field. In addition: getting CSRF protection right is important, so here's some background: * This library generates unique-per-request (masked) tokens as a mitigation against the BREACH attack (http://breachattack.com/). * The 'base' (unmasked) token is stored in the session, which means that multiple browser tabs won't cause a user problems as their per-request token is compared with the base token. * Operates on a "whitelist only" approach where safe (non-mutating) HTTP methods (GET, HEAD, OPTIONS, TRACE) are the *only* methods where token validation is not enforced. * The design is based on the battle-tested Django (https://docs.djangoproject.com/en/1.8/ref/csrf/) and Ruby on Rails (http://api.rubyonrails.org/classes/ActionController/RequestForgeryProtection.html) approaches. * Cookies are authenticated and based on the securecookie (https://github.com/gorilla/securecookie) library. They're also Secure (issued over HTTPS only) and are HttpOnly by default, because sane defaults are important. * Go's `crypto/rand` library is used to generate the 32 byte (256 bit) tokens and the one-time-pad used for masking them. This library does not seek to be adventurous.

v1.9.1 • 4 years ago

github.com/eduncan911/go-mspec

Package mspec is a BDD context/specification testing package for Go(Lang) with a strong emphases on spec'ing your feature(s) and scenarios first, before any code is written using as little syntax noise as possible. This leaves you free to think of your project and features as a whole without the distraction of writing any code with the added benefit of having tests ready for your project. [![GoDoc](https://godoc.org/github.com/eduncan911/mspec?status.svg)](https://godoc.org/github.com/eduncan911/mspec) holds the source documentation (where else?) * Uses natural language (Given/When/Then) * Stubbing * Human-readable outputs * HTML output (coming soon) * Use custom Assertions * Configuration options * Uses Testify's rich assertions * Uses Go's built-in testing.T package Install it with one line of code: There are no external dependencies and it is built against Go's internal packages. The only dependency is that you have [GOPATH setup normaly](https://golang.org/doc/code.html). Create a new file to hold your specs. Using Dan North's original BDD definitions, you spec code using the Given/When/Then storyline similar to: But this is just a static example. Let's take a real example from one of my projects: You represent these thoughts in code like this: Note that `Given`, `when` and `it` all have optional variadic parameters. This allows you to spec things out with as little or as far as you want. That's it. Now run it: Print it out and stick it on your office door for everyone to see what you are working on. This is actually colored output in Terminal: It is not uncommon to go back and tweak your stories over time as you talk with your domain experts, modifying exactly the scenarios and specifications that should happen. `GoMSpec` is a testing package for the Go framework that extends Go's built-in testing package. It is modeled after the BDD Feature Specification story workflow such as: Currently it has an included `Expectation` struct that mimics basic assertion behaviors. Future plans may allow for custom assertion packages (like testify). Getting it Importing it Writing Specs Testing it Which outputs the following: Nice eh? There is nothing like using a testing package to test itself. There is some nice rich information available. ## Examples Be sure to check out more examples in the examples/ folder. Or just open the files and take a look. That's the most important part anyways. When evaluating several BDD frameworks, [Pranavraja's Zen](https://github.com/pranavraja/zen) package for Go came close - really close; but, it was lacking the more "story" overview I've been accustomed to over the years with [Machine.Specifications](https://github.com/machine/machine.specifications) in C# (.NET land). Do note that there is something to be said for simple testing in Go (and simple coding); therefore, if you are the type to keep it short and sweet and just code, then you may want to use Pranavraja's framework as it is just the context (Desc) and specs writing. I forked his code and submitted a few bug tweaks at first. But along the way, I started to have grand visions of my soul mate [Machine.Specifications](https://github.com/machine/machine.specifications) (which is called MSpec for short) for BDD testing. The ease of defining complete stories right down to the scenarios without having to implement them intrigued me in C#. It freed me from worrying about implementation details and just focus on the feature I was writing: What did it need to do? What context was I given to start with? What should it do? So while using Pranavraja's Zen framework, I kept asking myself: Could I bring those MSpec practices to Go, using a bare-bones framework? Ok, done. And since it was so heavily inspired by Aaron's MSpec project, I kept the name going here: `GoMSpec`. While keeping backwards compatibility with his existing Zen framework, I defined several goals for this package: * Had to stay simple with Give/When/Then definitions. No complex coding. * Keep the low syntax noise from the existing Zen package. * I had to be able to write features, scenarios and specs with no implementation details needed. That last goal above is key and I think is what speaks truly about what BDD is: focus on the story, feature and/or context you are designing - focus on the Behavior! I tended to design my C# code using Machine.Specifications in this BDD-style by writing entire stories and grand specs up front - designing the system I was building, or the feature I was extending. In C# land, it's not unheard of me hitting 50 to 100 specs across a single feature and a few different contexts in an hour or two, before writing any code. Which at that point, I had everything planned out pretty much the way it should behave. So with this framework, I came up with a simple method name, `NA()`, to keep the syntax noise down. Therefore, you are free to code specs with just a little syntax noise:

v0.0.0-20170515144820-082ea5c5111c • 7 years ago

github.com/nicman42/caddy-cgi

Package cgi implements the common gateway interface (CGI) for Caddy 2, a modern, full-featured, easy-to-use web server. It has been forked from the fantastic work of Kurt Jung who wrote that plugin for Caddy 1. This plugin lets you generate dynamic content on your website by means of command line scripts. To collect information about the inbound HTTP request, your script examines certain environment variables such as PATH_INFO and QUERY_STRING. Then, to return a dynamically generated web page to the client, your script simply writes content to standard output. In the case of POST requests, your script reads additional inbound content from standard input. The advantage of CGI is that you do not need to fuss with server startup and persistence, long term memory management, sockets, and crash recovery. Your script is called when a request matches one of the patterns that you specify in your Caddyfile. As soon as your script completes its response, it terminates. This simplicity makes CGI a perfect complement to the straightforward operation and configuration of Caddy. The benefits of Caddy, including HTTPS by default, basic access authentication, and lots of middleware options extend easily to your CGI scripts. CGI has some disadvantages. For one, Caddy needs to start a new process for each request. This can adversely impact performance and, if resources are shared between CGI applications, may require the use of some interprocess synchronization mechanism such as a file lock. Your server’s responsiveness could in some circumstances be affected, such as when your web server is hit with very high demand, when your script’s dependencies require a long startup, or when concurrently running scripts take a long time to respond. However, in many cases, such as using a pre-compiled CGI application like fossil or a Lua script, the impact will generally be insignificant. Another restriction of CGI is that scripts will be run with the same permissions as Caddy itself. This can sometimes be less than ideal, for example when your script needs to read or write files associated with a different owner. Serving dynamic content exposes your server to more potential threats than serving static pages. There are a number of considerations of which you should be aware when using CGI applications. CGI scripts should be located outside of Caddy’s document root. Otherwise, an inadvertent misconfiguration could result in Caddy delivering the script as an ordinary static resource. At best, this could merely confuse the site visitor. At worst, it could expose sensitive internal information that should not leave the server. Mistrust the contents of PATH_INFO, QUERY_STRING and standard input. Most of the environment variables available to your CGI program are inherently safe because they originate with Caddy and cannot be modified by external users. This is not the case with PATH_INFO, QUERY_STRING and, in the case of POST actions, the contents of standard input. Be sure to validate and sanitize all inbound content. If you use a CGI library or framework to process your scripts, make sure you understand its limitations. An error in a CGI application is generally handled within the application itself and reported in the headers it returns. Your CGI application can be executed directly or indirectly. In the direct case, the application can be a compiled native executable or it can be a shell script that contains as its first line a shebang that identifies the interpreter to which the file’s name should be passed. Caddy must have permission to execute the application. On Posix systems this will mean making sure the application’s ownership and permission bits are set appropriately; on Windows, this may involve properly setting up the filename extension association. In the indirect case, the name of the CGI script is passed to an interpreter such as lua, perl or python. - This module needs to be installed (obviously). - The directive needs to be registered in the Caddyfile: The basic cgi directive lets you add a handler in the current caddy router location with a given script and optional arguments. The matcher is a default caddy matcher that is used to restrict the scope of this directive. The directive can be repeated any reasonable number of times. Here is the basic syntax: For example: When a request such as https://example.com/report or https://example.com/report/weekly arrives, the cgi middleware will detect the match and invoke the script named /usr/local/cgi-bin/report. The current working directory will be the same as Caddy itself. Here, it is assumed that the script is self-contained, for example a pre-compiled CGI application or a shell script. Here is an example of a standalone script, similar to one used in the cgi plugin’s test suite: The environment variables PATH_INFO and QUERY_STRING are populated and passed to the script automatically. There are a number of other standard CGI variables included that are described below. If you need to pass any special environment variables or allow any environment variables that are part of Caddy’s process to pass to your script, you will need to use the advanced directive syntax described below. Beware that in Caddy v2 it is (currently) not possible to separate the path left of the matcher from the full URL. Therefore if you require your CGI program to know the SCRIPT_NAME, make sure to pass that explicitly: In order to specify custom environment variables, pass along one or more environment variables known to Caddy, or specify more than one match pattern for a given rule, you will need to use the advanced directive syntax. That looks like this: For example, The script_name subdirective helps the cgi module to separate the path to the script from the (virtual) path afterwards (which shall be passed to the script). env can be used to define a list of key=value environment variable pairs that shall be passed to the script. pass_env can be used to define a list of environment variables of the Caddy process that shall be passed to the script. If your CGI application runs properly at the command line but fails to run from Caddy it is possible that certain environment variables may be missing. For example, the ruby gem loader evidently requires the HOME environment variable to be set; you can do this with the subdirective pass_env HOME. Another class of problematic applications require the COMPUTERNAME variable. The pass_all_env subdirective instructs Caddy to pass each environment variable it knows about to the CGI excutable. This addresses a common frustration that is caused when an executable requires an environment variable and fails without a descriptive error message when the variable cannot be found. These applications often run fine from the command prompt but fail when invoked with CGI. The risk with this subdirective is that a lot of server information is shared with the CGI executable. Use this subdirective only with CGI applications that you trust not to leak this information. If you run into unexpected results with the CGI plugin, you are able to examine the environment in which your CGI application runs. To enter inspection mode, add the subdirective inspect to your CGI configuration block. This is a development option that should not be used in production. When in inspection mode, the plugin will respond to matching requests with a page that displays variables of interest. In particular, it will show the replacement value of {match} and the environment variables to which your CGI application has access. For example, consider this example CGI block: When you request a matching URL, for example, the Caddy server will deliver a text page similar to the following. The CGI application (in this case, wapptclsh) will not be called. This information can be used to diagnose problems with how a CGI application is called. To return to operation mode, remove or comment out the inspect subdirective. In this example, the Caddyfile looks like this: Note that a request for /show gets mapped to a script named /usr/local/cgi-bin/report/gen. There is no need for any element of the script name to match any element of the match pattern. The contents of /usr/local/cgi-bin/report/gen are: The purpose of this script is to show how request information gets communicated to a CGI script. Note that POST data must be read from standard input. In this particular case, posted data gets stored in the variable POST_DATA. Your script may use a different method to read POST content. Secondly, the SCRIPT_EXEC variable is not a CGI standard. It is provided by this middleware and contains the entire command line, including all arguments, with which the CGI script was executed. When a browser requests the response looks like When a client makes a POST request, such as with the following command the response looks the same except for the following lines: This small example demonstrates how to write a CGI program in Go. The use of a bytes.Buffer makes it easy to report the content length in the CGI header. When this program is compiled and installed as /usr/local/bin/servertime, the following directive in your Caddy file will make it available:

v1.12.0 • 3 years ago

github.com/greyeye/gosnowflake

Package gosnowflake is a pure Go Snowflake driver for the database/sql package. Clients can use the database/sql package directly. For example: Use Open to create a database handle with connection parameters: The Go Snowflake Driver supports the following connection syntaxes (or data source name formats): where all parameters must be escaped or use `Config` and `DSN` to construct a DSN string. The following example opens a database handle with the Snowflake account myaccount where the username is jsmith, password is mypassword, database is mydb, schema is testschema, and warehouse is mywh: The following connection parameters are supported: account <string>: Specifies the name of your Snowflake account, where string is the name assigned to your account by Snowflake. In the URL you received from Snowflake, your account name is the first segment in the domain (e.g. abc123 in https://abc123.snowflakecomputing.com). This parameter is optional if your account is specified after the @ character. If you are not on us-west-2 region or AWS deployment, then append the region after the account name, e.g. “<account>.<region>”. If you are not on AWS deployment, then append not only the region, but also the platform, e.g., “<account>.<region>.<platform>”. Account, region, and platform should be separated by a period (“.”), as shown above. If you are using a global url, then append connection group and "global", e.g., "account-<connection_group>.global". Account and connection group are separated by a dash ("-"), as shown above. region <string>: DEPRECATED. You may specify a region, such as “eu-central-1”, with this parameter. However, since this parameter is deprecated, it is best to specify the region as part of the account parameter. For details, see the description of the account parameter. database: Specifies the database to use by default in the client session (can be changed after login). schema: Specifies the database schema to use by default in the client session (can be changed after login). warehouse: Specifies the virtual warehouse to use by default for queries, loading, etc. in the client session (can be changed after login). role: Specifies the role to use by default for accessing Snowflake objects in the client session (can be changed after login). passcode: Specifies the passcode provided by Duo when using MFA for login. passcodeInPassword: false by default. Set to true if the MFA passcode is embedded in the login password. Appends the MFA passcode to the end of the password. loginTimeout: Specifies the timeout, in seconds, for login. The default is 60 seconds. The login request gives up after the timeout length if the HTTP response is success. authenticator: Specifies the authenticator to use for authenticating user credentials: To use the internal Snowflake authenticator, specify snowflake (Default). To authenticate through Okta, specify https://<okta_account_name>.okta.com (URL prefix for Okta). To authenticate using your IDP via a browser, specify externalbrowser. To authenticate via OAuth, specify oauth and provide an OAuth Access Token (see the token parameter below). application: Identifies your application to Snowflake Support. insecureMode: false by default. Set to true to bypass the Online Certificate Status Protocol (OCSP) certificate revocation check. IMPORTANT: Change the default value for testing or emergency situations only. token: a token that can be used to authenticate. Should be used in conjunction with the "oauth" authenticator. client_session_keep_alive: Set to true have a heartbeat in the background every hour to keep the connection alive such that the connection session will never expire. Care should be taken in using this option as it opens up the access forever as long as the process is alive. ocspFailOpen: true by default. Set to false to make OCSP check fail closed mode. validateDefaultParameters: true by default. Set to false to disable checks on existence and privileges check for Database, Schema, Warehouse and Role when setting up the connection All other parameters are taken as session parameters. For example, TIMESTAMP_OUTPUT_FORMAT session parameter can be set by adding: The Go Snowflake Driver honors the environment variables HTTP_PROXY, HTTPS_PROXY and NO_PROXY for the forward proxy setting. NO_PROXY specifies which hostname endings should be allowed to bypass the proxy server, e.g. :code:`no_proxy=.amazonaws.com` means that AWS S3 access does not need to go through the proxy. NO_PROXY does not support wildcards. Each value specified should be one of the following: The end of a hostname (or a complete hostname), for example: ".amazonaws.com" or "xy12345.snowflakecomputing.com". An IP address, for example "192.196.1.15". If more than one value is specified, values should be separated by commas, for example: By default, the driver's builtin logger is NOP; no output is generated. This is intentional for those applications that use the same set of logger parameters not to conflict with glog, which is incorporated in the driver logging framework. In order to enable debug logging for the driver, add a build tag sfdebug to the go tool command lines, for example: For tests, run the test command with the tag along with glog parameters. For example, the following command will generate all acitivty logs in the standard error. Likewise, if you build your application with the tag, you may specify the same set of glog parameters. To get the logs for a specific module, use the -vmodule option. For example, to retrieve the driver.go and connection.go module logs: Note: If your request retrieves no logs, call db.Close() or glog.flush() to flush the glog buffer. Note: The logger may be changed in the future for better logging. Currently if the applications use the same parameters as glog, you cannot collect both application and driver logs at the same time. From 0.5.0, a signal handling responsibility has moved to the applications. If you want to cancel a query/command by Ctrl+C, add a os.Interrupt trap in context to execute methods that can take the context parameter, e.g., QueryContext, ExecContext. See cmd/selectmany.go for the full example. Queries return SQL column type information in the ColumnType type. The DatabaseTypeName method returns the following strings representing Snowflake data types: Go's database/sql package limits Go's data types to the following for binding and fetching: Fetching data isn't an issue since the database data type is provided along with the data so the Go Snowflake Driver can translate Snowflake data types to Go native data types. When the client binds data to send to the server, however, the driver cannot determine the date/timestamp data types to associate with binding parameters. For example: To resolve this issue, a binding parameter flag is introduced that associates any subsequent time.Time type to the DATE, TIME, TIMESTAMP_LTZ, TIMESTAMP_NTZ or BINARY data type. The above example could be rewritten as follows: The driver fetches TIMESTAMP_TZ (timestamp with time zone) data using the offset-based Location types, which represent a collection of time offsets in use in a geographical area, such as CET (Central European Time) or UTC (Coordinated Universal Time). The offset-based Location data is generated and cached when a Go Snowflake Driver application starts, and if the given offset is not in the cache, it is generated dynamically. Currently, Snowflake doesn't support the name-based Location types, e.g., America/Los_Angeles. For more information about Location types, see the Go documentation for https://golang.org/pkg/time/#Location. Internally, this feature leverages the []byte data type. As a result, BINARY data cannot be bound without the binding parameter flag. In the following example, sf is an alias for the gosnowflake package: The driver directly downloads a result set from the cloud storage if the size is large. It is required to shift workloads from the Snowflake database to the clients for scale. The download takes place by goroutine named "Chunk Downloader" asynchronously so that the driver can fetch the next result set while the application can consume the current result set. The application may change the number of result set chunk downloader if required. Note this doesn't help reduce memory footprint by itself. Consider Custom JSON Decoder. Experimental: Custom JSON Decoder for parsing Result Set The application may have the driver use a custom JSON decoder that incrementally parses the result set as follows. This option will reduce the memory footprint to half or even quarter, but it can significantly degrade the performance depending on the environment. The test cases running on Travis Ubuntu box show five times less memory footprint while four times slower. Be cautious when using the option. (Private Preview) JWT authentication ** Not recommended for production use until GA Now JWT token is supported when compiling with a golang version of 1.10 or higher. Binary compiled with lower version of golang would return an error at runtime when users try to use JWT authentication feature. To enable this feature, one can construct DSN with fields "authenticator=SNOWFLAKE_JWT&privateKey=<your_private_key>", or using Config structure specifying: The <your_private_key> should be a base64 URL encoded PKCS8 rsa private key string. One way to encode a byte slice to URL base 64 URL format is through base64.URLEncoding.EncodeToString() function. On the server side, one can alter the public key with the SQL command: The <your_public_key> should be a base64 Standard encoded PKI public key string. One way to encode a byte slice to base 64 Standard format is through base64.StdEncoding.EncodeToString() function. To generate the valid key pair, one can do the following command on the shell script: GET and PUT operations are unsupported.

v1.6.17 • 2 years ago

gitlab.com/hall/go-p9p

Package p9p implements a compliant 9P2000 client and server library for use in modern, production Go services. This package differentiates itself in that is has departed from the plan 9 implementation primitives and better follows idiomatic Go style. The package revolves around the session type, which is an enumeration of raw 9p message calls. A few calls, such as flush and version, have been elided, defering their usage to the server implementation. Sessions can be trivially proxied through clients and servers. The best place to get started is with Serve. Serve can be provided a connection and a handler. A typical implementation will call Serve as part of a listen/accept loop. As each network connection is created, Serve can be called with a handler for the specific connection. The handler can be implemented with a Session via the Dispatch function or can generate sessions for dispatch in response to client messages. (See cmd/9ps for an example) On the client side, NewSession provides a 9p session from a connection. After a version negotiation, methods can be called on the session, in parallel, and calls will be sent over the connection. Call timeouts can be controlled via the context provided to each method call. This package has the beginning of a nice client-server framework for working with 9p. Some of the abstractions aren't entirely fleshed out, but most of this can center around the Handler. Missing from this are a number of tools for implementing 9p servers. The most glaring are directory read and walk helpers. Other, more complex additions might be a system to manage in memory filesystem trees that expose multi-user sessions. The largest difference between this package and other 9p packages is simplification of the types needed to implement a server. To avoid confusing bugs and odd behavior, the components are separated by each level of the protocol. One example is that requests and responses are separated and they no longer hold mutable state. This means that framing, transport management, encoding, and dispatching are componentized. Little work will be required to swap out encodings, transports or connection implementations. This package has been wired from top to bottom to support context-based resource management. Everything from startup to shutdown can have timeouts using contexts. Not all close methods are fully in place, but we are very close to having controlled, predictable cleanup for both servers and clients. Timeouts can be very granular or very course, depending on the context of the timeout. For example, it is very easy to set a short timeout for a stat call but a long timeout for reading data. Currently, there is not multiversion support. The hooks and functionality are in place to add multi-version support. Generally, the correct space to do this is in the codec. Types, such as Dir, simply need to be extended to support the possibility of extra fields. The real question to ask here is what is the role of the version number in the 9p protocol. It really comes down to the level of support required. Do we just need it at the protocol level, or do handlers and sessions need to be have differently based on negotiated versions? This package has a number of TODOs to make it easier to use. Most of the existing code provides a solid base to work from. Don't be discouraged by the sawdust. In addition, the testing is embarassingly lacking. With time, we can get full testing going and ensure we have confidence in the implementation.

v0.0.0-20190115050131-196dc12268f3 • 5 years ago

github.com/chirayu/lingua-go

Package lingua accurately detects the natural language of written text, be it long or short. Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine learning frameworks or natural language processing applications. In cases where you don't need the full-fledged functionality of those systems or don't want to learn the ropes of those, a small flexible library comes in handy. So far, the only other comprehensive open source library in the Go ecosystem for this task is Whatlanggo (https://github.com/abadojack/whatlanggo). Unfortunately, it has two major drawbacks: 1. Detection only works with quite lengthy text fragments. For very short text snippets such as Twitter messages, it does not provide adequate results. 2. The more languages take part in the decision process, the less accurate are the detection results. Lingua aims at eliminating these problems. It nearly does not need any configuration and yields pretty accurate results on both long and short text, even on single words and phrases. It draws on both rule-based and statistical methods but does not use any dictionaries of words. It does not need a connection to any external API or service either. Once the library has been downloaded, it can be used completely offline. Compared to other language detection libraries, Lingua's focus is on quality over quantity, that is, getting detection right for a small set of languages first before adding new ones. Currently, 75 languages are supported. They are listed as variants of type Language. Lingua is able to report accuracy statistics for some bundled test data available for each supported language. The test data for each language is split into three parts: 1. a list of single words with a minimum length of 5 characters 2. a list of word pairs with a minimum length of 10 characters 3. a list of complete grammatical sentences of various lengths Both the language models and the test data have been created from separate documents of the Wortschatz corpora (https://wortschatz.uni-leipzig.de) offered by Leipzig University, Germany. Data crawled from various news websites have been used for training, each corpus comprising one million sentences. For testing, corpora made of arbitrarily chosen websites have been used, each comprising ten thousand sentences. From each test corpus, a random unsorted subset of 1000 single words, 1000 word pairs and 1000 sentences has been extracted, respectively. Given the generated test data, I have compared the detection results of Lingua, and Whatlanggo running over the data of Lingua's supported 75 languages. Additionally, I have added Google's CLD3 (https://github.com/google/cld3/) to the comparison with the help of the gocld3 bindings (https://github.com/jmhodges/gocld3). Languages that are not supported by CLD3 or Whatlanggo are simply ignored during the detection process. The bar and box plots (https://github.com/pemistahl/lingua-go/blob/main/ACCURACY_PLOTS.md) show the measured accuracy values for all three performed tasks: Single word detection, word pair detection and sentence detection. Lingua clearly outperforms its contenders. Detailed statistics including mean, median and standard deviation values for each language and classifier are available in tabular form (https://github.com/pemistahl/lingua-go/blob/main/ACCURACY_TABLE.md) as well. Every language detector uses a probabilistic n-gram (https://en.wikipedia.org/wiki/N-gram) model trained on the character distribution in some training corpus. Most libraries only use n-grams of size 3 (trigrams) which is satisfactory for detecting the language of longer text fragments consisting of multiple sentences. For short phrases or single words, however, trigrams are not enough. The shorter the input text is, the less n-grams are available. The probabilities estimated from such few n-grams are not reliable. This is why Lingua makes use of n-grams of sizes 1 up to 5 which results in much more accurate prediction of the correct language. A second important difference is that Lingua does not only use such a statistical model, but also a rule-based engine. This engine first determines the alphabet of the input text and searches for characters which are unique in one or more languages. If exactly one language can be reliably chosen this way, the statistical model is not necessary anymore. In any case, the rule-based engine filters out languages that do not satisfy the conditions of the input text. Only then, in a second step, the probabilistic n-gram model is taken into consideration. This makes sense because loading less language models means less memory consumption and better runtime performance. In general, it is always a good idea to restrict the set of languages to be considered in the classification process using the respective api methods. If you know beforehand that certain languages are never to occur in an input text, do not let those take part in the classifcation process. The filtering mechanism of the rule-based engine is quite good, however, filtering based on your own knowledge of the input text is always preferable. There might be classification tasks where you know beforehand that your language data is definitely not written in Latin, for instance. The detection accuracy can become better in such cases if you exclude certain languages from the decision process or just explicitly include relevant languages. Knowing about the most likely language is nice but how reliable is the computed likelihood? And how less likely are the other examined languages in comparison to the most likely one? In the example below, a slice of ConfidenceValue is returned containing all possible languages sorted by their confidence value in descending order. The values that this method computes are part of a relative confidence metric, not of an absolute one. Each value is a number between 0.0 and 1.0. The most likely language is always returned with value 1.0. All other languages get values assigned which are lower than 1.0, denoting how less likely those languages are in comparison to the most likely language. The slice returned by this method does not necessarily contain all languages which the calling instance of LanguageDetector was built from. If the rule-based engine decides that a specific language is truly impossible, then it will not be part of the returned slice. Likewise, if no ngram probabilities can be found within the detector's languages for the given input text, the returned slice will be empty. The confidence value for each language not being part of the returned slice is assumed to be 0.0. By default, Lingua uses lazy-loading to load only those language models on demand which are considered relevant by the rule-based filter engine. For web services, for instance, it is rather beneficial to preload all language models into memory to avoid unexpected latency while waiting for the service response. If you want to enable the eager-loading mode, you can do it as seen below. Multiple instances of LanguageDetector share the same language models in memory which are accessed asynchronously by the instances. By default, Lingua returns the most likely language for a given input text. However, there are certain words that are spelled the same in more than one language. The word `prologue`, for instance, is both a valid English and French word. Lingua would output either English or French which might be wrong in the given context. For cases like that, it is possible to specify a minimum relative distance that the logarithmized and summed up probabilities for each possible language have to satisfy. It can be stated as seen below. Be aware that the distance between the language probabilities is dependent on the length of the input text. The longer the input text, the larger the distance between the languages. So if you want to classify very short text phrases, do not set the minimum relative distance too high. Otherwise Unknown will be returned most of the time as in the example below. This is the return value for cases where language detection is not reliably possible.

v1.0.0 • 3 years ago

github.com/allenluce/ginkgo

github.com/SamWhited/toml

github.com/eblume/proto

github.com/manosnoam/ginkgo

github.com/rleighton/gosnowflake

github.com/bilus/scenarigo

github.com/draft6/mow.cli

github.com/danilodvaz/lingua-go/v2

github.com/vayan/gomega

github.com/cxzgb123/toml

github.com/tmsong/toml

github.com/gmatty/toml

github.com/vitaliy-muminov/godog

github.com/ds-test-framework/model-checker

github.com/S3cuRiTy-Er1C/httpexpect

github.com/devnev/testeach

github.com/kubitre/csrf

github.com/cat2neat/gtcp

github.com/eduncan911/go-mspec

github.com/pmcgrath/godog

github.com/gtourkas/httpexpect

github.com/sneal/ginkgo

github.com/ClareChu/hiboot

github.com/nicman42/caddy-cgi

github.com/remogatto/gorgasm

github.com/motomux/toml-1

github.com/zendesk/httpexpect

github.com/lexandro/godog

github.com/khoad/godog

github.com/frankdmartinez/st

github.com/fpbear/toml

github.com/markgunnels/health

github.com/eahydra/gomega

github.com/bmartynov/fx

github.com/ethan-202285/hiboot

github.com/sha1sum/ginkgo

github.com/greyeye/gosnowflake

github.com/shaneutt/kubernetes-testing-framework

github.com/maps90/godog

github.com/magicmoose/ginkgo

bitbucket.org/airtm/terraform-testing-framework

gitlab.com/hall/go-p9p

github.com/ansd/gomega

github.com/chirayu/lingua-go

github.com/libkermit/requirement

github.com/thatgerber/testcli

github.com/johanhugg/gnomock

github.com/lexandro/go-assert

github.com/vancelongwill/snapshot

github.com/xeger/ginkgo