Statistics

github.com/spacemonkeygo/monkit/v3

Package monkit is a flexible code instrumenting and data collection library. I'm going to try and sell you as fast as I can on this library. Example usage We've got tools that capture distribution information (including quantiles) about int64, float64, and bool types. We have tools that capture data about events (we've got meters for deltas, rates, etc). We have rich tools for capturing information about tasks and functions, and literally anything that can generate a name and a number. Almost just as importantly, the amount of boilerplate and code you have to write to get these features is very minimal. Data that's hard to measure probably won't get measured. This data can be collected and sent to Graphite (http://graphite.wikidot.com/) or any other time-series database. Here's a selection of live stats from one of our storage nodes: This library generates call graphs of your live process for you. These call graphs aren't created through sampling. They're full pictures of all of the interesting functions you've annotated, along with quantile information about their successes, failures, how often they panic, return an error (if so instrumented), how many are currently running, etc. The data can be returned in dot format, in json, in text, and can be about just the functions that are currently executing, or all the functions the monitoring system has ever seen. Here's another example of one of our production nodes: https://raw.githubusercontent.com/spacemonkeygo/monkit/master/images/callgraph2.png This library generates trace graphs of your live process for you directly, without requiring standing up some tracing system such as Zipkin (though you can do that too). Inspired by Google's Dapper (http://research.google.com/pubs/pub36356.html) and Twitter's Zipkin (http://zipkin.io), we have process-internal trace graphs, triggerable by a number of different methods. You get this trace information for free whenever you use Go contexts (https://blog.golang.org/context) and function monitoring. The output formats are svg and json. Additionally, the library supports trace observation plugins, and we've written a plugin that sends this data to Zipkin (http://github.com/spacemonkeygo/monkit-zipkin). https://raw.githubusercontent.com/spacemonkeygo/monkit/master/images/trace.png Before our crazy Go rewrite of everything (https://www.spacemonkey.com/blog/posts/go-space-monkey) (and before we had even seen Google's Dapper paper), we were a Python shop, and all of our "interesting" functions were decorated with a helper that collected timing information and sent it to Graphite. When we transliterated to Go, we wanted to preserve that functionality, so the first version of our monitoring package was born. Over time it started to get janky, especially as we found Zipkin and started adding tracing functionality to it. We rewrote all of our Go code to use Google contexts, and then realized we could get call graph information. We decided a refactor and then an all-out rethinking of our monitoring package was best, and so now we have this library. Sometimes you really want callstack contextual information without having to pass arguments through everything on the call stack. In other languages, many people implement this with thread-local storage. Example: let's say you have written a big system that responds to user requests. All of your libraries log using your log library. During initial development everything is easy to debug, since there's low user load, but now you've scaled and there's OVER TEN USERS and it's kind of hard to tell what log lines were caused by what. Wouldn't it be nice to add request ids to all of the log lines kicked off by that request? Then you could grep for all log lines caused by a specific request id. Geez, it would suck to have to pass all contextual debugging information through all of your callsites. Google solved this problem by always passing a context.Context interface through from call to call. A Context is basically just a mapping of arbitrary keys to arbitrary values that users can add new values for. This way if you decide to add a request context, you can add it to your Context and then all callsites that decend from that place will have the new data in their contexts. It is admittedly very verbose to add contexts to every function call. Painfully so. I hope to write more about it in the future, but Google also wrote up their thoughts about it (https://blog.golang.org/context), which you can go read. For now, just swallow your disgust and let's keep moving. Let's make a super simple Varnish (https://www.varnish-cache.org/) clone. Open up gedit! (Okay just kidding, open whatever text editor you want.) For this motivating program, we won't even add the caching, though there's comments for where to add it if you'd like. For now, let's just make a barebones system that will proxy HTTP requests. We'll call it VLite, but maybe we should call it VReallyLite. Run and build this and open localhost:8080 in your browser. If you use the default proxy target, it should inform you that the world hasn't been destroyed yet. The first thing you'll want to do is add the small amount of boilerplate to make the instrumentation we're going to add to your process observable later. Import the basic monkit packages: and then register environmental statistics and kick off a goroutine in your main method to serve debug requests: Rebuild, and then check out localhost:9000/stats (or localhost:9000/stats/json, if you prefer) in your browser! Remember what I said about Google's contexts (https://blog.golang.org/context)? It might seem a bit overkill for such a small project, but it's time to add them. To help out here, I've created a library that constructs contexts for you for incoming HTTP requests. Nothing that's about to happen requires my webhelp library (https://godoc.org/github.com/jtolds/webhelp), but here is the code now refactored to receive and pass contexts through our two per-request calls. You can create a new context for a request however you want. One reason to use something like webhelp is that the cancelation feature of Contexts is hooked up to the HTTP request getting canceled. Let's start to get statistics about how many requests we receive! First, this package (main) will need to get a monitoring Scope. Add this global definition right after all your imports, much like you'd create a logger with many logging libraries: Now, make the error return value of HandleHTTP named (so, (err error)), and add this defer line as the very first instruction of HandleHTTP: Let's also add the same line (albeit modified for the lack of error) to Proxy, replacing &err with nil: You should now have something like: We'll unpack what's going on here, but for now: For this new funcs dataset, if you want a graph, you can download a dot graph at localhost:9000/funcs/dot and json information from localhost:9000/funcs/json. You should see something like: with a similar report for the Proxy method, or a graph like: https://raw.githubusercontent.com/spacemonkeygo/monkit/master/images/handlehttp.png This data reports the overall callgraph of execution for known traces, along with how many of each function are currently running, the most running concurrently (the highwater), how many were successful along with quantile timing information, how many errors there were (with quantile timing information if applicable), and how many panics there were. Since the Proxy method isn't capturing a returned err value, and since HandleHTTP always returns nil, this example won't ever have failures. If you're wondering about the success count being higher than you expected, keep in mind your browser probably requested a favicon.ico. Cool, eh? How it works is an interesting line of code - there's three function calls. If you look at the Go spec, all of the function calls will run at the time the function starts except for the very last one. The first function call, mon.Task(), creates or looks up a wrapper around a Func. You could get this yourself by requesting mon.Func() inside of the appropriate function or mon.FuncNamed(). Both mon.Task() and mon.Func() are inspecting runtime.Caller to determine the name of the function. Because this is a heavy operation, you can actually store the result of mon.Task() and reuse it somehow else if you prefer, so instead of you could instead use which is more performant every time after the first time. runtime.Caller only gets called once. Careful! Don't use the same myFuncMon in different functions unless you want to screw up your statistics! The second function call starts all the various stop watches and bookkeeping to keep track of the function. It also mutates the context pointer it's given to extend the context with information about what current span (in Zipkin parlance) is active. Notably, you *can* pass nil for the context if you really don't want a context. You just lose callgraph information. The last function call stops all the stop watches ad makes a note of any observed errors or panics (it repanics after observing them). Turns out, we don't even need to change our program anymore to get rich tracing information! Open your browser and go to localhost:9000/trace/svg?regex=HandleHTTP. It won't load, and in fact, it's waiting for you to open another tab and refresh localhost:8080 again. Once you retrigger the actual application behavior, the trace regex will capture a trace starting on the first function that matches the supplied regex, and return an svg. Go back to your first tab, and you should see a relatively uninteresting but super promising svg. Let's make the trace more interesting. Add a to your HandleHTTP method, rebuild, and restart. Load localhost:8080, then start a new request to your trace URL, then reload localhost:8080 again. Flip back to your trace, and you should see that the Proxy method only takes a portion of the time of HandleHTTP! https://cdn.rawgit.com/spacemonkeygo/monkit/master/images/trace.svg There's multiple ways to select a trace. You can select by regex using the preselect method (default), which first evaluates the regex on all known functions for sanity checking. Sometimes, however, the function you want to trace may not yet be known to monkit, in which case you'll want to turn preselection off. You may have a bad regex, or you may be in this case if you get the error "Bad Request: regex preselect matches 0 functions." Another way to select a trace is by providing a trace id, which we'll get to next! Make sure to check out what the addition of the time.Sleep call did to the other reports. It's easy to write plugins for monkit! Check out our first one that exports data to Zipkin (http://zipkin.io/)'s Scribe API: https://github.com/spacemonkeygo/monkit-zipkin We plan to have more (for HTrace, OpenTracing, etc, etc), soon!

v3.0.24 • 9 months ago

github.com/hazelcast/hazelcast-go-client

Package hazelcast provides the Hazelcast Go client. Hazelcast is an open-source distributed in-memory data store and computation platform. It provides a wide variety of distributed data structures and concurrency primitives. Hazelcast Go client is a way to communicate to Hazelcast IMDG clusters and access the cluster data. If you are using Hazelcast and Go Client on the same computer, generally the default configuration should be fine. This is great for trying out the client. However, if you run the client on a different computer than any of the cluster members, you may need to do some simple configurations such as specifying the member addresses. The Hazelcast members and clients have their own configuration options. You may need to reflect some of the member side configurations on the client side to properly connect to the cluster. In order to configure the client, you only need to create a new `hazelcast.Config{}`, which you can pass to `hazelcast.StartNewClientWithConnfig` function: Calling hazelcast.StartNewClientWithConfig with the default configuration is equivalent to hazelcast.StartNewClient. The default configuration assumes Hazelcast is running at localhost:5701 with the cluster name set to dev. If you run Hazelcast members in a different server than the client, you need to make certain changes to client settings. Assuming Hazelcast members are running at hz1.server.com:5701, hz2.server.com:5701 and hz3.server.com:5701 with cluster name production, you would use the configuration below. Note that addresses must include port numbers: You can also load configuration from JSON: If you are changing several options in a configuration section, you may have to repeatedly specify the configuration section: You can simplify the code above by getting a reference to config.Cluster and update it: Note that you should get a reference to the configuration section you are updating, otherwise you would update a copy of it, which doesn't modify the configuration. There are a few options that require a duration, such as config.Cluster.HeartbeatInterval, config.Cluster.Network.ConnectionTimeout and others. You must use types.Duration instead of time.Duration with those options, since types.Duration values support human readable durations when deserialized from text: That corresponds to the following JSON configuration. Refer to https://golang.org/pkg/time/#ParseDuration for the available duration strings: Here are all configuration items with their default values: Checkout the nearcache package for the documentation about the Near Cache. You can listen to creation and destroy events for distributed objects by attaching a listener to the client. A distributed object is created when first referenced unless it already exists. Here is an example: If you don't want to receive any distributed object events, use client.RemoveDistributedObjectListener: Running SQL queries require Hazelcast 5.0 and up. Check out the Hazelcast SQL documentation here: https://docs.hazelcast.com/hazelcast/latest/sql/sql-overview The SQL support should be enabled in Hazelcast server configuration: The client supports two kinds of queries: The ones returning rows (select statements and a few others) and the rest (insert, update, etc.). The former kinds of queries are executed with QuerySQL method and the latter ones are executed with ExecSQL method. Use the question mark (?) for placeholders. To connect to a data source and query it as if it is a table, a mapping should be created. Currently, mappings for Map, Kafka and file data sources are supported. You can read the details about mappings here: https://docs.hazelcast.com/hazelcast/latest/sql/sql-overview#mappings The following data types are supported when inserting/updating. The names in parantheses correspond to SQL types: Using Date/Time In order to force using a specific date/time type, create a time.Time value and cast it to the target type: Hazelcast Management Center can monitor your clients if client-side statistics are enabled. You can enable statistics by setting config.Stats.Enabled to true. Optionally, the period of statistics collection can be set using config.Stats.Period setting. The labels set in configuration appear in the Management Center console:

v1.4.2 • last year

github.com/mna/redisc

Package redisc implements a redis cluster client on top of the redigo client package. It supports all commands that can be executed on a redis cluster, including pub-sub, scripts and read-only connections to read data from replicas. See http://redis.io/topics/cluster-spec for details. The package defines two main types: Cluster and Conn. Both are described in more details below, but the Cluster manages the mapping of keys (or more exactly, hash slots computed from keys) to a group of nodes that form a redis cluster, and a Conn manages a connection to this cluster. The package is designed such that for simple uses, or when keys have been carefully named to play well with a redis cluster, a Cluster value can be used as a drop-in replacement for a redis.Pool from the redigo package. Similarly, the Conn type implements redigo's redis.Conn interface (and the augmented redis.ConnWithTimeout one too), so the API to execute commands is the same - in fact the redisc package uses the redigo package as its only third-party dependency. When more control is needed, the package offers some extra behaviour specific to working with a redis cluster: Slot and SplitBySlot functions to compute the slot for a given key and to split a list of keys into groups of keys from the same slot, so that each group can safely be handled using the same connection. *Conn.Bind (or the BindConn package-level helper function) to explicitly specify the keys that will be used with the connection so that the right node is selected, instead of relying on the automatic detection based on the first parameter of the command. *Conn.ReadOnly (or the ReadOnlyConn package-level helper function) to mark a connection as read-only, allowing commands to be served by a replica instead of the master. RetryConn to wrap a connection into one that automatically follows redirections when the cluster moves slots around. Helper functions to deal with cluster-specific errors. The Cluster type manages a redis cluster and offers an interface compatible with redigo's redis.Pool: Along with some additional methods specific to a cluster: If the CreatePool function field is set, then a redis.Pool is created to manage connections to each of the cluster's nodes. A call to Get returns a connection from that pool. The Dial method, on the other hand, guarantees that the returned connection will not be managed by a pool, even if CreatePool is set. It calls redigo's redis.Dial function to create the unpooled connection, passing along any DialOptions set on the cluster. If the cluster's CreatePool field is nil, Get behaves the same as Dial. The Refresh method refreshes the cluster's internal mapping of hash slots to nodes. It should typically be called only once, after the cluster is created and before it is used, so that the first connections already benefit from smart routing. It is automatically kept up-to-date based on the redis MOVED responses afterwards. The EachNode method visits each node in the cluster and calls the provided function with a connection to that node, which may be useful to run diagnostics commands on each node or to collect keys across the whole cluster. The Stats method returns the pool statistics for each node, with the node's address as key of the map. A cluster must be closed once it is no longer used to release its resources. The connection returned from Get or Dial is a redigo redis.Conn interface (that also implements redis.ConnWithTimeout), with a concrete type of *Conn. In addition to the interface's required methods, *Conn adds the following methods: The returned connection is not yet connected to any node; it is "bound" to a specific node only when a call to Do, Send, Receive or Bind is made. For Do, Send and Receive, the node selection is implicit, it uses the first parameter of the command, and computes the hash slot assuming that first parameter is a key. It then binds the connection to the node corresponding to that slot. If there are no parameters for the command, or if there is no command (e.g. in a call to Receive), a random node is selected. Bind is explicit, it gives control to the caller over which node to select by specifying a list of keys that the caller wishes to handle with the connection. All keys must belong to the same slot, and the connection must not already be bound to a node, otherwise an error is returned. On success, the connection is bound to the node holding the slot of the specified key(s). Because the connection is returned as a redis.Conn interface, a type assertion must be used to access the underlying *Conn and to be able to call Bind: The BindConn package-level function is provided as a helper for this common use-case. The ReadOnly method marks the connection as read-only, meaning that it will attempt to connect to a replica instead of the master node for its slot. Once bound to a node, the READONLY redis command is sent automatically, so it doesn't have to be sent explicitly before use. ReadOnly must be called before the connection is bound to a node, otherwise an error is returned. For the same reason as for Bind, a type assertion must be used to call ReadOnly on a *Conn, so a package-level helper function is also provided, ReadOnlyConn. There is no ReadWrite method, because it can be sent as a normal redis command and will essentially end that connection (all commands will now return MOVED errors). If the connection was wrapped in a RetryConn call, then it will automatically follow the redirection to the master node (see the Redirections section). The connection must be closed after use, to release the underlying resources. The redis cluster may return MOVED and ASK errors when the node that received the command doesn't currently hold the slot corresponding to the key. The package cannot reliably handle those redirections automatically because the redirection error may be returned for a pipeline of commands, some of which may have succeeded. However, a connection can be wrapped by a call to RetryConn, which returns a redis.Conn interface where only calls to Do, Close and Err can succeed. That means pipelining is not supported, and only a single command can be executed at a time, but it will automatically handle MOVED and ASK replies, as well as TRYAGAIN errors. Note that even if RetryConn is not used, the cluster always updates its mapping of slots to nodes automatically by keeping track of MOVED replies. The concurrency model is similar to that of the redigo package: Cluster methods are safe to call concurrently (like redis.Pool). Connections do not support concurrent calls to write methods (Send, Flush) or concurrent calls to the read method (Receive). Connections do allow a concurrent reader and writer. Because the Do method combines the functionality of Send, Flush and Receive, it cannot be called concurrently with other methods. The Bind and ReadOnly methods are safe to call concurrently, but there is not much point in doing so for as both will fail if the connection is already bound. Create and use a cluster.

v1.4.0 • 2 years ago

github.com/decred/dcrd/peer

Package peer provides a common base for creating and managing Decred network peers. This package builds upon the wire package, which provides the fundamental primitives necessary to speak the Decred wire protocol, in order to simplify the process of creating fully functional peers. In essence, it provides a common base for creating concurrent safe fully validating nodes, Simplified Payment Verification (SPV) nodes, proxies, etc. A quick overview of the major features peer provides are as follows: All peer configuration is handled with the Config struct. This allows the caller to specify things such as the user agent name and version, the decred network to use, which services it supports, and callbacks to invoke when decred messages are received. See the documentation for each field of the Config struct for more details. A peer can either be inbound or outbound. The caller is responsible for establishing the connection to remote peers and listening for incoming peers. This provides high flexibility for things such as connecting via proxies, acting as a proxy, creating bridge peers, choosing whether to listen for inbound peers, etc. NewOutboundPeer and NewInboundPeer functions must be followed by calling Connect with a net.Conn instance to the peer. This will start all async I/O goroutines and initiate the protocol negotiation process. Once finished with the peer call Disconnect to disconnect from the peer and clean up all resources. WaitForDisconnect can be used to block until peer disconnection and resource cleanup has completed. In order to do anything useful with a peer, it is necessary to react to decred messages. This is accomplished by creating an instance of the MessageListeners struct with the callbacks to be invoke specified and setting the Listeners field of the Config struct specified when creating a peer to it. For convenience, a callback hook for all of the currently supported decred messages is exposed which receives the peer instance and the concrete message type. In addition, a hook for OnRead is provided so even custom messages types for which this package does not directly provide a hook, as long as they implement the wire.Message interface, can be used. Finally, the OnWrite hook is provided, which in conjunction with OnRead, can be used to track server-wide byte counts. It is often useful to use closures which encapsulate state when specifying the callback handlers. This provides a clean method for accessing that state when callbacks are invoked. The QueueMessage function provides the fundamental means to send messages to the remote peer. As the name implies, this employs a non-blocking queue. A done channel which will be notified when the message is actually sent can optionally be specified. There are certain message types which are better sent using other functions which provide additional functionality. Of special interest are inventory messages. Rather than manually sending MsgInv messages via Queuemessage, the inventory vectors should be queued using the QueueInventory function. It employs batching and trickling along with intelligent known remote peer inventory detection and avoidance through the use of a most-recently used algorithm. In addition to the bare QueueMessage function previously described, the PushAddrMsg, PushGetBlocksMsg, PushGetHeadersMsg, and PushRejectMsg functions are provided as a convenience. While it is of course possible to create and send these message manually via QueueMessage, these helper functions provided additional useful functionality that is typically desired. For example, the PushAddrMsg function automatically limits the addresses to the maximum number allowed by the message and randomizes the chosen addresses when there are too many. This allows the caller to simply provide a slice of known addresses, such as that returned by the addrmgr package, without having to worry about the details. Next, the PushGetBlocksMsg and PushGetHeadersMsg functions will construct proper messages using a block locator and ignore back to back duplicate requests. Finally, the PushRejectMsg function can be used to easily create and send an appropriate reject message based on the provided parameters as well as optionally provides a flag to cause it to block until the message is actually sent. A snapshot of the current peer statistics can be obtained with the StatsSnapshot function. This includes statistics such as the total number of bytes read and written, the remote address, user agent, and negotiated protocol version. This package provides extensive logging capabilities through the UseLogger function which allows a slog.Logger to be specified. For example, logging at the debug level provides summaries of every message sent and received, and logging at the trace level provides full dumps of parsed messages as well as the raw message bytes using a format similar to hexdump -C. This package supports all improvement proposals supported by the wire package. (https://godoc.org/github.com/decred/dcrd/wire#hdr-Bitcoin_Improvement_Proposals) This example demonstrates the basic process for initializing and creating an outbound peer. Peers negotiate by exchanging version and verack messages. For demonstration, a simple handler for version message is attached to the peer.

v1.2.1 • 2 years ago

github.com/decred/dcrd/peer/v3

Package peer provides a common base for creating and managing Decred network peers. This package builds upon the wire package, which provides the fundamental primitives necessary to speak the Decred wire protocol, in order to simplify the process of creating fully functional peers. In essence, it provides a common base for creating concurrent safe fully validating nodes, Simplified Payment Verification (SPV) nodes, proxies, etc. A quick overview of the major features peer provides are as follows: All peer configuration is handled with the Config struct. This allows the caller to specify things such as the user agent name and version, the decred network to use, which services it supports, and callbacks to invoke when decred messages are received. See the documentation for each field of the Config struct for more details. A peer can either be inbound or outbound. The caller is responsible for establishing the connection to remote peers and listening for incoming peers. This provides high flexibility for things such as connecting via proxies, acting as a proxy, creating bridge peers, choosing whether to listen for inbound peers, etc. NewOutboundPeer and NewInboundPeer functions must be followed by calling Connect with a net.Conn instance to the peer. This will start all async I/O goroutines and initiate the protocol negotiation process. Once finished with the peer call Disconnect to disconnect from the peer and clean up all resources. WaitForDisconnect can be used to block until peer disconnection and resource cleanup has completed. In order to do anything useful with a peer, it is necessary to react to decred messages. This is accomplished by creating an instance of the MessageListeners struct with the callbacks to be invoke specified and setting the Listeners field of the Config struct specified when creating a peer to it. For convenience, a callback hook for all of the currently supported decred messages is exposed which receives the peer instance and the concrete message type. In addition, a hook for OnRead is provided so even custom messages types for which this package does not directly provide a hook, as long as they implement the wire.Message interface, can be used. Finally, the OnWrite hook is provided, which in conjunction with OnRead, can be used to track server-wide byte counts. It is often useful to use closures which encapsulate state when specifying the callback handlers. This provides a clean method for accessing that state when callbacks are invoked. The QueueMessage function provides the fundamental means to send messages to the remote peer. As the name implies, this employs a non-blocking queue. A done channel which will be notified when the message is actually sent can optionally be specified. There are certain message types which are better sent using other functions which provide additional functionality. Of special interest are inventory messages. Rather than manually sending MsgInv messages via Queuemessage, the inventory vectors should be queued using the QueueInventory function. It employs batching and trickling along with intelligent known remote peer inventory detection and avoidance through the use of a most-recently used algorithm. In addition to the bare QueueMessage function previously described, the PushAddrMsg, PushGetBlocksMsg, and PushGetHeadersMsg functions are provided as a convenience. While it is of course possible to create and send these messages manually via QueueMessage, these helper functions provided additional useful functionality that is typically desired. For example, the PushAddrMsg function automatically limits the addresses to the maximum number allowed by the message and randomizes the chosen addresses when there are too many. This allows the caller to simply provide a slice of known addresses, such as that returned by the addrmgr package, without having to worry about the details. Finally, the PushGetBlocksMsg and PushGetHeadersMsg functions will construct proper messages using a block locator and ignore back to back duplicate requests. A snapshot of the current peer statistics can be obtained with the StatsSnapshot function. This includes statistics such as the total number of bytes read and written, the remote address, user agent, and negotiated protocol version. This package provides extensive logging capabilities through the UseLogger function which allows a slog.Logger to be specified. For example, logging at the debug level provides summaries of every message sent and received, and logging at the trace level provides full dumps of parsed messages as well as the raw message bytes using a format similar to hexdump -C. This package supports all improvement proposals supported by the wire package. This example demonstrates the basic process for initializing and creating an outbound peer. Peers negotiate by exchanging version and verack messages. For demonstration, a simple handler for version message is attached to the peer.

v3.1.3 • 10 months ago

github.com/pemistahl/lingua-go

Package lingua accurately detects the natural language of written text, be it long or short. Its task is simple: It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine learning frameworks or natural language processing applications. In cases where you don't need the full-fledged functionality of those systems or don't want to learn the ropes of those, a small flexible library comes in handy. So far, the only other comprehensive open source library in the Go ecosystem for this task is Whatlanggo (https://github.com/abadojack/whatlanggo). Unfortunately, it has two major drawbacks: 1. Detection only works with quite lengthy text fragments. For very short text snippets such as Twitter messages, it does not provide adequate results. 2. The more languages take part in the decision process, the less accurate are the detection results. Lingua aims at eliminating these problems. It nearly does not need any configuration and yields pretty accurate results on both long and short text, even on single words and phrases. It draws on both rule-based and statistical methods but does not use any dictionaries of words. It does not need a connection to any external API or service either. Once the library has been downloaded, it can be used completely offline. Compared to other language detection libraries, Lingua's focus is on quality over quantity, that is, getting detection right for a small set of languages first before adding new ones. Currently, 75 languages are supported. They are listed as variants of type Language. Lingua is able to report accuracy statistics for some bundled test data available for each supported language. The test data for each language is split into three parts: 1. a list of single words with a minimum length of 5 characters 2. a list of word pairs with a minimum length of 10 characters 3. a list of complete grammatical sentences of various lengths Both the language models and the test data have been created from separate documents of the Wortschatz corpora (https://wortschatz.uni-leipzig.de) offered by Leipzig University, Germany. Data crawled from various news websites have been used for training, each corpus comprising one million sentences. For testing, corpora made of arbitrarily chosen websites have been used, each comprising ten thousand sentences. From each test corpus, a random unsorted subset of 1000 single words, 1000 word pairs and 1000 sentences has been extracted, respectively. Given the generated test data, I have compared the detection results of Lingua, and Whatlanggo running over the data of Lingua's supported 75 languages. Additionally, I have added Google's CLD3 (https://github.com/google/cld3/) to the comparison with the help of the gocld3 bindings (https://github.com/jmhodges/gocld3). Languages that are not supported by CLD3 or Whatlanggo are simply ignored during the detection process. Lingua clearly outperforms its contenders. Every language detector uses a probabilistic n-gram (https://en.wikipedia.org/wiki/N-gram) model trained on the character distribution in some training corpus. Most libraries only use n-grams of size 3 (trigrams) which is satisfactory for detecting the language of longer text fragments consisting of multiple sentences. For short phrases or single words, however, trigrams are not enough. The shorter the input text is, the less n-grams are available. The probabilities estimated from such few n-grams are not reliable. This is why Lingua makes use of n-grams of sizes 1 up to 5 which results in much more accurate prediction of the correct language. A second important difference is that Lingua does not only use such a statistical model, but also a rule-based engine. This engine first determines the alphabet of the input text and searches for characters which are unique in one or more languages. If exactly one language can be reliably chosen this way, the statistical model is not necessary anymore. In any case, the rule-based engine filters out languages that do not satisfy the conditions of the input text. Only then, in a second step, the probabilistic n-gram model is taken into consideration. This makes sense because loading less language models means less memory consumption and better runtime performance. In general, it is always a good idea to restrict the set of languages to be considered in the classification process using the respective api methods. If you know beforehand that certain languages are never to occur in an input text, do not let those take part in the classifcation process. The filtering mechanism of the rule-based engine is quite good, however, filtering based on your own knowledge of the input text is always preferable. There might be classification tasks where you know beforehand that your language data is definitely not written in Latin, for instance. The detection accuracy can become better in such cases if you exclude certain languages from the decision process or just explicitly include relevant languages. Knowing about the most likely language is nice but how reliable is the computed likelihood? And how less likely are the other examined languages in comparison to the most likely one? In the example below, a slice of ConfidenceValue is returned containing those languages which the calling instance of LanguageDetector has been built from. The entries are sorted by their confidence value in descending order. Each value is a probability between 0.0 and 1.0. The probabilities of all languages will sum to 1.0. If the language is unambiguously identified by the rule engine, the value 1.0 will always be returned for this language. The other languages will receive a value of 0.0. By default, Lingua uses lazy-loading to load only those language models on demand which are considered relevant by the rule-based filter engine. For web services, for instance, it is rather beneficial to preload all language models into memory to avoid unexpected latency while waiting for the service response. If you want to enable the eager-loading mode, you can do it as seen below. Multiple instances of LanguageDetector share the same language models in memory which are accessed asynchronously by the instances. By default, Lingua returns the most likely language for a given input text. However, there are certain words that are spelled the same in more than one language. The word `prologue`, for instance, is both a valid English and French word. Lingua would output either English or French which might be wrong in the given context. For cases like that, it is possible to specify a minimum relative distance that the logarithmized and summed up probabilities for each possible language have to satisfy. It can be stated as seen below. Be aware that the distance between the language probabilities is dependent on the length of the input text. The longer the input text, the larger the distance between the languages. So if you want to classify very short text phrases, do not set the minimum relative distance too high. Otherwise Unknown will be returned most of the time as in the example below. This is the return value for cases where language detection is not reliably possible.

v1.4.0 • 2 years ago

github.com/decred/dcrd/peer/v2

Package peer provides a common base for creating and managing Decred network peers. This package builds upon the wire package, which provides the fundamental primitives necessary to speak the Decred wire protocol, in order to simplify the process of creating fully functional peers. In essence, it provides a common base for creating concurrent safe fully validating nodes, Simplified Payment Verification (SPV) nodes, proxies, etc. A quick overview of the major features peer provides are as follows: All peer configuration is handled with the Config struct. This allows the caller to specify things such as the user agent name and version, the decred network to use, which services it supports, and callbacks to invoke when decred messages are received. See the documentation for each field of the Config struct for more details. A peer can either be inbound or outbound. The caller is responsible for establishing the connection to remote peers and listening for incoming peers. This provides high flexibility for things such as connecting via proxies, acting as a proxy, creating bridge peers, choosing whether to listen for inbound peers, etc. NewOutboundPeer and NewInboundPeer functions must be followed by calling Connect with a net.Conn instance to the peer. This will start all async I/O goroutines and initiate the protocol negotiation process. Once finished with the peer call Disconnect to disconnect from the peer and clean up all resources. WaitForDisconnect can be used to block until peer disconnection and resource cleanup has completed. In order to do anything useful with a peer, it is necessary to react to decred messages. This is accomplished by creating an instance of the MessageListeners struct with the callbacks to be invoke specified and setting the Listeners field of the Config struct specified when creating a peer to it. For convenience, a callback hook for all of the currently supported decred messages is exposed which receives the peer instance and the concrete message type. In addition, a hook for OnRead is provided so even custom messages types for which this package does not directly provide a hook, as long as they implement the wire.Message interface, can be used. Finally, the OnWrite hook is provided, which in conjunction with OnRead, can be used to track server-wide byte counts. It is often useful to use closures which encapsulate state when specifying the callback handlers. This provides a clean method for accessing that state when callbacks are invoked. The QueueMessage function provides the fundamental means to send messages to the remote peer. As the name implies, this employs a non-blocking queue. A done channel which will be notified when the message is actually sent can optionally be specified. There are certain message types which are better sent using other functions which provide additional functionality. Of special interest are inventory messages. Rather than manually sending MsgInv messages via Queuemessage, the inventory vectors should be queued using the QueueInventory function. It employs batching and trickling along with intelligent known remote peer inventory detection and avoidance through the use of a most-recently used algorithm. In addition to the bare QueueMessage function previously described, the PushAddrMsg, PushGetBlocksMsg, PushGetHeadersMsg, and PushRejectMsg functions are provided as a convenience. While it is of course possible to create and send these message manually via QueueMessage, these helper functions provided additional useful functionality that is typically desired. For example, the PushAddrMsg function automatically limits the addresses to the maximum number allowed by the message and randomizes the chosen addresses when there are too many. This allows the caller to simply provide a slice of known addresses, such as that returned by the addrmgr package, without having to worry about the details. Next, the PushGetBlocksMsg and PushGetHeadersMsg functions will construct proper messages using a block locator and ignore back to back duplicate requests. Finally, the PushRejectMsg function can be used to easily create and send an appropriate reject message based on the provided parameters as well as optionally provides a flag to cause it to block until the message is actually sent. A snapshot of the current peer statistics can be obtained with the StatsSnapshot function. This includes statistics such as the total number of bytes read and written, the remote address, user agent, and negotiated protocol version. This package provides extensive logging capabilities through the UseLogger function which allows a slog.Logger to be specified. For example, logging at the debug level provides summaries of every message sent and received, and logging at the trace level provides full dumps of parsed messages as well as the raw message bytes using a format similar to hexdump -C. This package supports all improvement proposals supported by the wire package. This example demonstrates the basic process for initializing and creating an outbound peer. Peers negotiate by exchanging version and verack messages. For demonstration, a simple handler for version message is attached to the peer.

v2.2.1 • 2 years ago

gopkg.in/spacemonkeygo/monkit.v2

Package monkit is a flexible code instrumenting and data collection library. I'm going to try and sell you as fast as I can on this library. Example usage We've got tools that capture distribution information (including quantiles) about int64, float64, and bool types. We have tools that capture data about events (we've got meters for deltas, rates, etc). We have rich tools for capturing information about tasks and functions, and literally anything that can generate a name and a number. Almost just as importantly, the amount of boilerplate and code you have to write to get these features is very minimal. Data that's hard to measure probably won't get measured. This data can be collected and sent to Graphite (http://graphite.wikidot.com/) or any other time-series database. Here's a selection of live stats from one of our storage nodes: This library generates call graphs of your live process for you. These call graphs aren't created through sampling. They're full pictures of all of the interesting functions you've annotated, along with quantile information about their successes, failures, how often they panic, return an error (if so instrumented), how many are currently running, etc. The data can be returned in dot format, in json, in text, and can be about just the functions that are currently executing, or all the functions the monitoring system has ever seen. Here's another example of one of our production nodes: https://raw.githubusercontent.com/spacemonkeygo/monkit/master/images/callgraph2.png This library generates trace graphs of your live process for you directly, without requiring standing up some tracing system such as Zipkin (though you can do that too). Inspired by Google's Dapper (http://research.google.com/pubs/pub36356.html) and Twitter's Zipkin (http://zipkin.io), we have process-internal trace graphs, triggerable by a number of different methods. You get this trace information for free whenever you use Go contexts (https://blog.golang.org/context) and function monitoring. The output formats are svg and json. Additionally, the library supports trace observation plugins, and we've written a plugin that sends this data to Zipkin (http://github.com/spacemonkeygo/monkit-zipkin). https://raw.githubusercontent.com/spacemonkeygo/monkit/master/images/trace.png Before our crazy Go rewrite of everything (https://www.spacemonkey.com/blog/posts/go-space-monkey) (and before we had even seen Google's Dapper paper), we were a Python shop, and all of our "interesting" functions were decorated with a helper that collected timing information and sent it to Graphite. When we transliterated to Go, we wanted to preserve that functionality, so the first version of our monitoring package was born. Over time it started to get janky, especially as we found Zipkin and started adding tracing functionality to it. We rewrote all of our Go code to use Google contexts, and then realized we could get call graph information. We decided a refactor and then an all-out rethinking of our monitoring package was best, and so now we have this library. Sometimes you really want callstack contextual information without having to pass arguments through everything on the call stack. In other languages, many people implement this with thread-local storage. Example: let's say you have written a big system that responds to user requests. All of your libraries log using your log library. During initial development everything is easy to debug, since there's low user load, but now you've scaled and there's OVER TEN USERS and it's kind of hard to tell what log lines were caused by what. Wouldn't it be nice to add request ids to all of the log lines kicked off by that request? Then you could grep for all log lines caused by a specific request id. Geez, it would suck to have to pass all contextual debugging information through all of your callsites. Google solved this problem by always passing a context.Context interface through from call to call. A Context is basically just a mapping of arbitrary keys to arbitrary values that users can add new values for. This way if you decide to add a request context, you can add it to your Context and then all callsites that decend from that place will have the new data in their contexts. It is admittedly very verbose to add contexts to every function call. Painfully so. I hope to write more about it in the future, but Google also wrote up their thoughts about it (https://blog.golang.org/context), which you can go read. For now, just swallow your disgust and let's keep moving. Let's make a super simple Varnish (https://www.varnish-cache.org/) clone. Open up gedit! (Okay just kidding, open whatever text editor you want.) For this motivating program, we won't even add the caching, though there's comments for where to add it if you'd like. For now, let's just make a barebones system that will proxy HTTP requests. We'll call it VLite, but maybe we should call it VReallyLite. Run and build this and open localhost:8080 in your browser. If you use the default proxy target, it should inform you that the world hasn't been destroyed yet. The first thing you'll want to do is add the small amount of boilerplate to make the instrumentation we're going to add to your process observable later. Import the basic monkit packages: and then register environmental statistics and kick off a goroutine in your main method to serve debug requests: Rebuild, and then check out localhost:9000/stats (or localhost:9000/stats/json, if you prefer) in your browser! Remember what I said about Google's contexts (https://blog.golang.org/context)? It might seem a bit overkill for such a small project, but it's time to add them. To help out here, I've created a library that constructs contexts for you for incoming HTTP requests. Nothing that's about to happen requires my webhelp library (https://godoc.org/github.com/jtolds/webhelp), but here is the code now refactored to receive and pass contexts through our two per-request calls. You can create a new context for a request however you want. One reason to use something like webhelp is that the cancelation feature of Contexts is hooked up to the HTTP request getting canceled. Let's start to get statistics about how many requests we receive! First, this package (main) will need to get a monitoring Scope. Add this global definition right after all your imports, much like you'd create a logger with many logging libraries: Now, make the error return value of HandleHTTP named (so, (err error)), and add this defer line as the very first instruction of HandleHTTP: Let's also add the same line (albeit modified for the lack of error) to Proxy, replacing &err with nil: You should now have something like: We'll unpack what's going on here, but for now: For this new funcs dataset, if you want a graph, you can download a dot graph at localhost:9000/funcs/dot and json information from localhost:9000/funcs/json. You should see something like: with a similar report for the Proxy method, or a graph like: https://raw.githubusercontent.com/spacemonkeygo/monkit/master/images/handlehttp.png This data reports the overall callgraph of execution for known traces, along with how many of each function are currently running, the most running concurrently (the highwater), how many were successful along with quantile timing information, how many errors there were (with quantile timing information if applicable), and how many panics there were. Since the Proxy method isn't capturing a returned err value, and since HandleHTTP always returns nil, this example won't ever have failures. If you're wondering about the success count being higher than you expected, keep in mind your browser probably requested a favicon.ico. Cool, eh? How it works is an interesting line of code - there's three function calls. If you look at the Go spec, all of the function calls will run at the time the function starts except for the very last one. The first function call, mon.Task(), creates or looks up a wrapper around a Func. You could get this yourself by requesting mon.Func() inside of the appropriate function or mon.FuncNamed(). Both mon.Task() and mon.Func() are inspecting runtime.Caller to determine the name of the function. Because this is a heavy operation, you can actually store the result of mon.Task() and reuse it somehow else if you prefer, so instead of you could instead use which is more performant every time after the first time. runtime.Caller only gets called once. Careful! Don't use the same myFuncMon in different functions unless you want to screw up your statistics! The second function call starts all the various stop watches and bookkeeping to keep track of the function. It also mutates the context pointer it's given to extend the context with information about what current span (in Zipkin parlance) is active. Notably, you *can* pass nil for the context if you really don't want a context. You just lose callgraph information. The last function call stops all the stop watches ad makes a note of any observed errors or panics (it repanics after observing them). Turns out, we don't even need to change our program anymore to get rich tracing information! Open your browser and go to localhost:9000/trace/svg?regex=HandleHTTP. It won't load, and in fact, it's waiting for you to open another tab and refresh localhost:8080 again. Once you retrigger the actual application behavior, the trace regex will capture a trace starting on the first function that matches the supplied regex, and return an svg. Go back to your first tab, and you should see a relatively uninteresting but super promising svg. Let's make the trace more interesting. Add a to your HandleHTTP method, rebuild, and restart. Load localhost:8080, then start a new request to your trace URL, then reload localhost:8080 again. Flip back to your trace, and you should see that the Proxy method only takes a portion of the time of HandleHTTP! https://cdn.rawgit.com/spacemonkeygo/monkit/master/images/trace.svg There's multiple ways to select a trace. You can select by regex using the preselect method (default), which first evaluates the regex on all known functions for sanity checking. Sometimes, however, the function you want to trace may not yet be known to monkit, in which case you'll want to turn preselection off. You may have a bad regex, or you may be in this case if you get the error "Bad Request: regex preselect matches 0 functions." Another way to select a trace is by providing a trace id, which we'll get to next! Make sure to check out what the addition of the time.Sleep call did to the other reports. It's easy to write plugins for monkit! Check out our first one that exports data to Zipkin (http://zipkin.io/)'s Scribe API: https://github.com/spacemonkeygo/monkit-zipkin We plan to have more (for HTrace, OpenTracing, etc, etc), soon!

v2.0.0-20190623001553-09813957f0a8 • 6 years ago

gonum.org/v1/gonum

github.com/montanaflynn/stats

github.com/aws/aws-sdk-go-v2/service/elasticache

github.com/az4zil/statistics

github.com/henrygeorgist/go-statistics

github.com/bediverezero/statistics-duplicate-files

github.com/rumyantseva/velobike-statistics

github.com/bsaunders95/accounts-statistics-tool

github.com/jingminglang/redis_hotkey_statistics

github.com/andrewgbruce/statistics-for-data-scientists

github.com/juxuny/statistics

bitbucket.org/mike-dev/statisticservice

github.com/Mother-Earth/statistics-go

github.com/xing4git/code-line-statistics

github.com/Riqueemn/statistics

github.com/BSaunders95/accounts-statistics-tool

github.com/PacktPublishing/Statistics-for-Data-Science

github.com/vicdus/uscis-case-statistics

github.com/riqueemn/statistics

github.com/packtpublishing/statistics-for-data-science

github.com/coderguang/gzyouai/tools/workExtraStatistics

github.com/alexanders95/statistics

github.com/awcodify/statistics

github.com/mother-earth/statistics-go

github.com/mreis-ch/accounts-statistics-tool

github.com/hydrologicengineeringcenter/go-statistics

github.com/zondatw/serverless_short_url/statisticsserver

github.com/grd/stat

go-micro.dev/v4/auth/cmd/dashboard/handler/statistics

go-micro.dev/v4/client/cmd/dashboard/handler/statistics

go-micro.dev/v4/cmd/cmd/dashboard/handler/statistics

go-micro.dev/v4/broker/cmd/dashboard/handler/statistics

github.com/spacemonkeygo/monkit/v3

github.com/influxdata/influxql

github.com/keptn/keptn/statistics-service

github.com/hazelcast/hazelcast-go-client

github.com/safchain/ethtool

github.com/HenryGeorgist/go-statistics

github.com/mna/redisc

github.com/pion/rtcp

github.com/decred/dcrd/peer

github.com/vdobler/chart

github.com/decred/dcrd/peer/v3

github.com/pemistahl/lingua-go

github.com/decred/dcrd/peer/v2

github.com/james-bowman/nlp

github.com/grd/statistics

gopkg.in/spacemonkeygo/monkit.v2

gitlab.torproject.org/tpo/anti-censorship/geoip

github.com/statisticsnorway/dapla-cli