Package face implements face recognition for Go using dlib, a popular machine learning toolkit. This example shows the basic usage of the package: create an recognizer, recognize faces, classify them using few known ones.
Package hector is a golang based machine learning lib. It intend to implement all famous machine learning algoirhtms by golang. Currently, it only support algorithms which can solve binary classification problems. Supported algorithms include: 1. Decision Tree (CART, Random Forest, GBDT) 2. Logistic Regression 3. SVM 4. Neural Network Package hector is a golang based machine learning lib. It intend to implement all famous machine learning algoirhtms by golang. Currently, it only support algorithms which can solve binary classification problems. Supported algorithms include: 1. Decision Tree (CART, Random Forest, GBDT) 2. Logistic Regression 3. SVM 4. Neural Network
Package codegurusecurity provides the API client, operations, and parameter types for Amazon CodeGuru Security. Amazon CodeGuru Security is in preview release and is subject to change. This section provides documentation for the Amazon CodeGuru Security API operations. CodeGuru Security is a service that uses program analysis and machine learning to detect security policy violations and vulnerabilities, and recommends ways to address these security risks. By proactively detecting and providing recommendations for addressing security risks, CodeGuru Security improves the overall security of your application code. For more information about CodeGuru Security, see the Amazon CodeGuru Security User Guide.
Package neptunedata provides the API client, operations, and parameter types for Amazon NeptuneData. The Amazon Neptune data API provides SDK support for more than 40 of Neptune's data operations, including data loading, query execution, data inquiry, and machine learning. It supports the Gremlin and openCypher query languages, and is available in all SDK languages. It automatically signs API requests and greatly simplifies integrating Neptune into your applications.
** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Caches Q matrix rows. The cache implements a LRU (Last Recently Used) eviction policy. ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Implements the linear, radial-basis function, sigmoid, and polynomial kernels ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Model describes the properties of the Support Vector Machine after training. ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Input/output routines for the Support Vector Machine model ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Useful types/methods for running loops in parallel. ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Describes the parameters of the Supper Vector Machine solver ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Prediciton related APIs ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Probability estimation APIs ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Describes problem, i.e. label/vector set ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Q matrix for Support Vector Classification (svcQ), Support Vector Regression (svrQ), ** and One-Class Support Vector Machines (oneClassQ) ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Sequential Minimal Optimization (SMO) solver ** Ref: C.-C. Chang, C.-J. Lin. "LIBSVM: A library for support vector machines". ACM Transactions on Intelligent Systems and Technology 2 (2011) ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Functions for calling the solver for different problem scenerios, i.e. SVC, SVR, or One-Class ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Useful functions used in various parts of the library ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Working-set selection ** Ref: R.-E. Fan, P.-H. Chen, and C.-J. Lin. "Working set selection using second order information for training SVM". Journal of Machine Learning Research 6 (2005) ** @author: Ed Walker ** Copyright 2014 Edward Walker ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License. ** You may obtain a copy of the License at ** ** http ://www.apache.org/licenses/LICENSE-2.0 ** ** Unless required by applicable law or agreed to in writing, software ** distributed under the License is distributed on an "AS IS" BASIS, ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ** See the License for the specific language governing permissions and ** limitations under the License. ** ** Description: Cross validation API ** @author: Ed Walker
Package face implements face recognition for Go using dlib, a popular machine learning toolkit. This example shows the basic usage of the package: create an recognizer, recognize faces, classify them using few known ones.
Package wren provides bindings to the Wren scripting language. (To learn more about Wren, visit http://wren.io/index.html) This package facilitates running Wren code from Go. At its simplest, all you need to do is create a new virtual machine instance and interpret some Wren code: However, it's also possible to register foreign classes and methods in Go that can be called from Wren, and to execute Wren code directly from Go. Due to Go's inability to generate C-exported functions at runtime, the number of foreign methods able to be registered with the Wren VM through this package is limited to 128. This number is completely arbitrary, though, and can be changed by modifying the directive at the bottom of wren.go and running "go generate". If you feel like this number is a terrible default, pull requests will be happily accepted.
Package statemachine provides a convenient implementation of a Finite-state Machine (FSM) in Go. Learn more about FSM on Wikipedia: Let's begin with an object that can suitably adopt state machine behaviour: Here we're embedding `statemachine.Machine` into our Turnstile struct. Embedding, although not necessary, is a suitable way to apply the state machine behaviour to the struct itself. Turnstile's DisplayMsg variable stores the text that is displayed to the user, and may store either "Pay" or "Go", depending on the state. Now let's set our state machine's definitions utilizing `MachineBuilder`: Now that our turnstile is ready, let's take it for a spin:
Package face implements face recognition for Go using dlib, a popular machine learning toolkit. This example shows the basic usage of the package: create an recognizer, recognize faces, classify them using few known ones.
Gaby is an experimental new bot running in the Go issue tracker as @gabyhelp, to try to help automate various mundane things that a machine can do reasonably well, as well as to try to discover new things that a machine can do reasonably well. The name gaby is short for “Go AI Bot”, because one of the purposes of the experiment is to learn what LLMs can be used for effectively, including identifying what they should not be used for. Some of the gaby functionality will involve LLMs; other functionality will not. The guiding principle is to create something that helps maintainers and that maintainers like, which means to use LLMs when they make sense and help but not when they don't. In the long term, the intention is for this code base or a successor version to take over the current functionality of “gopherbot” and become @gopherbot, at which point the @gabyhelp account will be retired. At the moment we are not accepting new code contributions or PRs. We hope to move this code to somewhere more official soon, at which point we will accept contributions. The GitHub Discussion is a good place to leave feedback about @gabyhelp. The bot functionality is implemented in internal packages in subdirectories. This comment gives a brief tour of the structure. An explicit goal for the Gaby code base is that it run well in many different environments, ranging from a maintainer's home server or even Raspberry Pi all the way up to a hosted cloud. (At the moment, Gaby runs on a Linux server in my basement.) Due to this emphasis on portability, Gaby defines its own interfaces for all the functionality it needs from the surrounding environment and then also defines a variety of implementations of those interfaces. Another explicit goal for the Gaby code base is that it be very well tested. (See my [Go Testing talk] for more about why this is so important.) Abstracting the various external functionality into interfaces also helps make testing easier, and some packages also provide explicit testing support. The result of both these goals is that Gaby defines some basic functionality like time-ordered indexing for itself instead of relying on some specific other implementation. In the grand scheme of things, these are a small amount of code to maintain, and the benefits to both portability and testability are significant. Code interacting with services like GitHub and code running on cloud servers is typically difficult to test and therefore undertested. It is an explicit requirement this repo to test all the code, even (and especially) when testing is difficult. A useful command to have available when working in the code is rsc.io/uncover, which prints the package source lines not covered by a unit test. A useful invocation is: The first “go test” command checks that the test passes. The second repeats the test with coverage enabled. Running the test twice this way makes sure that any syntax or type errors reported by the compiler are reported without coverage, because coverage can mangle the error output. After both tests pass and second writes a coverage profile, running “uncover /tmp/c.out” prints the uncovered lines. In this output, there are three error paths that are untested. In general, error paths should be tested, so tests should be written to cover these lines of code. In limited cases, it may not be practical to test a certain section, such as when code is unreachable but left in case of future changes or mistaken assumptions. That part of the code can be labeled with a comment beginning “// Unreachable” or “// unreachable” (usually with explanatory text following), and then uncover will not report it. If a code section should be tested but the test is being deferred to later, that section can be labeled “// Untested” or “// untested” instead. The rsc.io/gaby/internal/testutil package provides a few other testing helpers. The overview of the code now proceeds from bottom up, starting with storage and working up to the actual bot. Gaby needs to manage a few secret keys used to access services. The rsc.io/gaby/internal/secret package defines the interface for obtaining those secrets. The only implementations at the moment are an in-memory map and a disk-based implementation that reads $HOME/.netrc. Future implementations may include other file formats as well as cloud-based secret storage services. Secret storage is intentionally separated from the main database storage, described below. The main database should hold public data, not secrets. Gaby defines the interface it expects from a large language model. The llm.Embedder interface abstracts an LLM that can take a collection of documents and return their vector embeddings, each of type llm.Vector. The only real implementation to date is rsc.io/gaby/internal/gemini. It would be good to add an offline implementation using Ollama as well. For tests that need an embedder but don't care about the quality of the embeddings, llm.QuoteEmbedder copies a prefix of the text into the vector (preserving vector unit length) in a deterministic way. This is good enough for testing functionality like vector search and simplifies tests by avoiding a dependence on a real LLM. At the moment, only the embedding interface is defined. In the future we expect to add more interfaces around text generation and tool use. As noted above, Gaby defines interfaces for all the functionality it needs from its external environment, to admit a wide variety of implementations for both execution and testing. The lowest level interface is storage, defined in rsc.io/gaby/internal/storage. Gaby requires a key-value store that supports ordered traversal of key ranges and atomic batch writes up to a modest size limit (at least a few megabytes). The basic interface is storage.DB. storage.MemDB returns an in-memory implementation useful for testing. Other implementations can be put through their paces using storage.TestDB. The only real storage.DB implementation is rsc.io/gaby/internal/pebble, which is a LevelDB-derived on-disk key-value store developed and used as part of CockroachDB. It is a production-quality local storage implementation and maintains the database as a directory of files. In the future we plan to add an implementation using Google Cloud Firestore, which provides a production-quality key-value lookup as a Cloud service without fixed baseline server costs. (Firestore is the successor to Google Cloud Datastore.) The storage.DB makes the simplifying assumption that storage never fails, or rather that if storage has failed then you'd rather crash your program than try to proceed through typically untested code paths. As such, methods like Get and Set do not return errors. They panic on failure, and clients of a DB can call the DB's Panic method to invoke the same kind of panic if they notice any corruption. It remains to be seen whether this decision is kept. In addition to the usual methods like Get, Set, and Delete, storage.DB defines Lock and Unlock methods that acquire and release named mutexes managed by the database layer. The purpose of these methods is to enable coordination when multiple instances of a Gaby program are running on a serverless cloud execution platform. So far Gaby has only run on an underground basement server (the opposite of cloud), so these have not been exercised much and the APIs may change. In addition to the regular database, package storage also defines storage.VectorDB, a vector database for use with LLM embeddings. The basic operations are Set, Get, and Search. storage.MemVectorDB returns an in-memory implementation that stores the actual vectors in a storage.DB for persistence but also keeps a copy in memory and searches by comparing against all the vectors. When backed by a storage.MemDB, this implementation is useful for testing, but when backed by a persistent database, the implementation suffices for small-scale production use (say, up to a million documents, which would require 3 GB of vectors). It is possible that the package ordering here is wrong and that VectorDB should be defined in the llm package, built on top of storage, and not the current “storage builds on llm”. Because Gaby makes minimal demands of its storage layer, any structure we want to impose must be implemented on top of it. Gaby uses the rsc.io/ordered encoding format to produce database keys that order in useful ways. For example, ordered.Encode("issue", 123) < ordered.Encode("issue", 1001), so that keys of this form can be used to scan through issues in numeric order. In contrast, using something like fmt.Sprintf("issue%d", n) would visit issue 1001 before issue 123 because "1001" < "123". Using this kind of encoding is common when using NoSQL key-value storage. See the rsc.io/ordered package for the details of the specific encoding. One of the implied jobs Gaby has is to collect all the relevant information about an open source project: its issues, its code changes, its documentation, and so on. Those sources are always changing, so derived operations like adding embeddings for documents need to be able to identify what is new and what has been processed already. To enable this, Gaby implements time-stamped—or just “timed”—storage, in which a collection of key-value pairs also has a “by time” index of ((timestamp, key), no-value) pairs to make it possible to scan only the key-value pairs modified after the previous scan. This kind of incremental scan only has to remember the last timestamp processed and then start an ordered key range scan just after that timestamp. This convention is implemented by rsc.io/gaby/internal/timed, along with a [timed.Watcher] that formalizes the incremental scan pattern. Various package take care of downloading state from issue trackers and the like, but then all that state needs to be unified into a common document format that can be indexed and searched. That document format is defined by rsc.io/gaby/internal/docs. A document consists of an ID (conventionally a URL), a document title, and document text. Documents are stored using timed storage, enabling incremental processing of newly added documents . The next stop for any new document is embedding it into a vector and storing that vector in a vector database. The rsc.io/gaby/internal/embeddocs package does this, and there is very little to it, given the abstractions of a document store with incremental scanning, an LLM embedder, and a vector database, all of which are provided by other packages. None of the packages mentioned so far involve network operations, but the next few do. It is important to test those but also equally important not to depend on external network services in the tests. Instead, the package rsc.io/gaby/internal/httprr provides an HTTP record/replay system specifically designed to help testing. It can be run once in a mode that does use external network servers and records the HTTP exchanges, but by default tests look up the expected responses in the previously recorded log, replaying those responses. The result is that code making HTTP request can be tested with real server traffic once and then re-tested with recordings of that traffic afterward. This avoids having to write entire fakes of services but also avoids needing the services to stay available in order for tests to pass. It also typically makes the tests much faster than using the real servers. Gaby uses GitHub in two main ways. First, it downloads an entire copy of the issue tracker state, with incremental updates, into timed storage. Second, it performs actions in the issue tracker, like editing issues or comments, applying labels, or posting new comments. These operations are provided by rsc.io/gaby/internal/github. Gaby downloads the issue tracker state using GitHub's REST API, which makes incremental updating very easy but does not provide access to a few newer features such as project boards and discussions, which are only available in the GraphQL API. Sync'ing using the GraphQL API is left for future work: there is enough data available from the REST API that for now we can focus on what to do with that data and not that a few newer GitHub features are missing. The github package provides two important aids for testing. For issue tracker state, it also allows loading issue data from a simple text-based issue description, avoiding any actual GitHub use at all and making it easier to modify the test data. For issue tracker actions, the github package defaults in tests to not actually making changes, instead diverting edits into an in-memory log. Tests can then check the log to see whether the right edits were requested. The rsc.io/gaby/internal/githubdocs package takes care of adding content from the downloaded GitHub state into the general document store. Currently the only GitHub-derived documents are one document per issue, consisting of the issue title and body. It may be worth experimenting with incorporating issue comments in some way, although they bring with them a significant amount of potential noise. Gaby will need to download and store Gerrit state into the database and then derive documents from it. That code has not yet been written, although rsc.io/gerrit/reviewdb provides a basic version that can be adapted. Gaby will also need to download and store project documentation into the database and derive documents from it corresponding to cutting the page at each heading. That code has been written but is not yet tested well enough to commit. It will be added later. The simplest job Gaby has is to go around fixing new comments, including issue descriptions (which look like comments but are a different kind of GitHub data). The rsc.io/gaby/internal/commentfix package implements this, watching GitHub state incrementally and applying a few kinds of rewrite rules to each new comment or issue body. The commentfix package allows automatically editing text, automatically editing URLs, and automatically hyperlinking text. The next job Gaby has is to respond to new issues with related issues and documents. The rsc.io/gaby/internal/related package implements this, watching GitHub state incrementally for new issues, filtering out ones that should be ignored, and then finding related issues and documents and posting a list. This package was originally intended to identify and automatically close duplicates, but the difference between a duplicate and a very similar or not-quite-fixed issue is too difficult a judgement to make for an LLM. Even so, the act of bringing forward related context that may have been forgotten or never known by the people reading the issue has turned out to be incredibly helpful. All of these pieces are put together in the main program, this package, rsc.io/gaby. The actual main package has no tests yet but is also incredibly straightforward. It does need tests, but we also need to identify ways that the hard-coded policies in the package can be lifted out into data that a natural language interface can manipulate. For example the current policy choices in package main amount to: These could be stored somewhere as data and manipulated and added to by the LLM in response to prompts from maintainers. And other features could be added and configured in a similar way. Exactly how to do this is an important thing to learn in future experimentation. As mentioned above, the two jobs Gaby does already are both fairly simple and straightforward. It seems like a general approach that should work well is well-written, well-tested deterministic traditional functionality such as the comment fixer and related-docs poster, configured by LLMs in response to specific directions or eventually higher-level goals specified by project maintainers. Other functionality that is worth exploring is rules for automatically labeling issues, rules for identifying issues or CLs that need to be pinged, rules for identifying CLs that need maintainer attention or that need submitting, and so on. Another stretch goal might be to identify when an issue needs more information and ask for that information. Of course, it would be very important not to ask for information that is already present or irrelevant, so getting that right would be a very high bar. There is no guarantee that today's LLMs work well enough to build a useful version of that. Another important area of future work will be running Gaby on top of cloud databases and then moving Gaby's own execution into the cloud. Getting it a server with a URL will enable GitHub callbacks instead of the current 2-minute polling loop, which will enable interactive conversations with Gaby. Overall, we believe that there are a few good ideas for ways that LLM-based bots can help make project maintainers' jobs easier and less monotonous, and they are waiting to be found. There are also many bad ideas, and they must be filtered out. Understanding the difference will take significant care, thought, and experimentation. We have work to do.
Package hector is a golang based machine learning lib. It intend to implement all famous machine learning algoirhtms by golang. Currently, it only support algorithms which can solve binary classification problems. Supported algorithms include: 1. Decision Tree (CART, Random Forest, GBDT) 2. Logistic Regression 3. SVM 4. Neural Network Package hector is a golang based machine learning lib. It intend to implement all famous machine learning algoirhtms by golang. Currently, it only support algorithms which can solve binary classification problems. Supported algorithms include: 1. Decision Tree (CART, Random Forest, GBDT) 2. Logistic Regression 3. SVM 4. Neural Network
Package gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily. Do differentiation with them just as easily. Autodiff showcases automatic differentiation Basic example of representing mathematical equations as graphs. In this example, we want to represent the following equation Gorgonia provides an API that is fairly idiomatic - most of the functions in in the API return (T, error). This is useful for many cases, such as an interactive shell for deep learning. However, it must also be acknowledged that this makes composing functions together a bit cumbersome. To that end, Gorgonia provides two alternative methods. First, the `Lift` based functions; Second the `Must` function Linear Regression Example The formula for a straight line is We want to find an `m` and a `c` that fits the equation well. We'll do it in both float32 and float64 to showcase the extensibility of Gorgonia This example showcases the reasons for the more confusing functions. This example showcases dealing with errors. This is part 2 of the raison d'être of the more complicated functions - dealing with errors SymbolicDiff showcases symbolic differentiation
Package golearn is a machine learning library for Go.
Package lingua accurately detects the natural language of written text, be it long or short. Its task is simple: It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine learning frameworks or natural language processing applications. In cases where you don't need the full-fledged functionality of those systems or don't want to learn the ropes of those, a small flexible library comes in handy. So far, the only other comprehensive open source library in the Go ecosystem for this task is Whatlanggo (https://github.com/abadojack/whatlanggo). Unfortunately, it has two major drawbacks: 1. Detection only works with quite lengthy text fragments. For very short text snippets such as Twitter messages, it does not provide adequate results. 2. The more languages take part in the decision process, the less accurate are the detection results. Lingua aims at eliminating these problems. It nearly does not need any configuration and yields pretty accurate results on both long and short text, even on single words and phrases. It draws on both rule-based and statistical methods but does not use any dictionaries of words. It does not need a connection to any external API or service either. Once the library has been downloaded, it can be used completely offline. Compared to other language detection libraries, Lingua's focus is on quality over quantity, that is, getting detection right for a small set of languages first before adding new ones. Currently, 75 languages are supported. They are listed as variants of type Language. Lingua is able to report accuracy statistics for some bundled test data available for each supported language. The test data for each language is split into three parts: 1. a list of single words with a minimum length of 5 characters 2. a list of word pairs with a minimum length of 10 characters 3. a list of complete grammatical sentences of various lengths Both the language models and the test data have been created from separate documents of the Wortschatz corpora (https://wortschatz.uni-leipzig.de) offered by Leipzig University, Germany. Data crawled from various news websites have been used for training, each corpus comprising one million sentences. For testing, corpora made of arbitrarily chosen websites have been used, each comprising ten thousand sentences. From each test corpus, a random unsorted subset of 1000 single words, 1000 word pairs and 1000 sentences has been extracted, respectively. Given the generated test data, I have compared the detection results of Lingua, and Whatlanggo running over the data of Lingua's supported 75 languages. Additionally, I have added Google's CLD3 (https://github.com/google/cld3/) to the comparison with the help of the gocld3 bindings (https://github.com/jmhodges/gocld3). Languages that are not supported by CLD3 or Whatlanggo are simply ignored during the detection process. Lingua clearly outperforms its contenders. Every language detector uses a probabilistic n-gram (https://en.wikipedia.org/wiki/N-gram) model trained on the character distribution in some training corpus. Most libraries only use n-grams of size 3 (trigrams) which is satisfactory for detecting the language of longer text fragments consisting of multiple sentences. For short phrases or single words, however, trigrams are not enough. The shorter the input text is, the less n-grams are available. The probabilities estimated from such few n-grams are not reliable. This is why Lingua makes use of n-grams of sizes 1 up to 5 which results in much more accurate prediction of the correct language. A second important difference is that Lingua does not only use such a statistical model, but also a rule-based engine. This engine first determines the alphabet of the input text and searches for characters which are unique in one or more languages. If exactly one language can be reliably chosen this way, the statistical model is not necessary anymore. In any case, the rule-based engine filters out languages that do not satisfy the conditions of the input text. Only then, in a second step, the probabilistic n-gram model is taken into consideration. This makes sense because loading less language models means less memory consumption and better runtime performance. In general, it is always a good idea to restrict the set of languages to be considered in the classification process using the respective api methods. If you know beforehand that certain languages are never to occur in an input text, do not let those take part in the classifcation process. The filtering mechanism of the rule-based engine is quite good, however, filtering based on your own knowledge of the input text is always preferable. There might be classification tasks where you know beforehand that your language data is definitely not written in Latin, for instance. The detection accuracy can become better in such cases if you exclude certain languages from the decision process or just explicitly include relevant languages. Knowing about the most likely language is nice but how reliable is the computed likelihood? And how less likely are the other examined languages in comparison to the most likely one? In the example below, a slice of ConfidenceValue is returned containing those languages which the calling instance of LanguageDetector has been built from. The entries are sorted by their confidence value in descending order. Each value is a probability between 0.0 and 1.0. The probabilities of all languages will sum to 1.0. If the language is unambiguously identified by the rule engine, the value 1.0 will always be returned for this language. The other languages will receive a value of 0.0. By default, Lingua uses lazy-loading to load only those language models on demand which are considered relevant by the rule-based filter engine. For web services, for instance, it is rather beneficial to preload all language models into memory to avoid unexpected latency while waiting for the service response. If you want to enable the eager-loading mode, you can do it as seen below. Multiple instances of LanguageDetector share the same language models in memory which are accessed asynchronously by the instances. By default, Lingua returns the most likely language for a given input text. However, there are certain words that are spelled the same in more than one language. The word `prologue`, for instance, is both a valid English and French word. Lingua would output either English or French which might be wrong in the given context. For cases like that, it is possible to specify a minimum relative distance that the logarithmized and summed up probabilities for each possible language have to satisfy. It can be stated as seen below. Be aware that the distance between the language probabilities is dependent on the length of the input text. The longer the input text, the larger the distance between the languages. So if you want to classify very short text phrases, do not set the minimum relative distance too high. Otherwise Unknown will be returned most of the time as in the example below. This is the return value for cases where language detection is not reliably possible.