Package redpanda is the SDK for Redpanda's inline Data Transforms, based on WebAssembly. This library provides a framework for transforming records written within Redpanda from an input to an output topic. This example shows the basic usage of the package: this is a "transform" that does nothing but copy the same data to a new topic. This example shows a filter that uses a regexp to filter records from one topic into another. The filter pattern can be chosen when the transform is deployed by using an environment variable to specify it. This example shows a transform that converts CSV into JSON.
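A minimal sketch of such a copy transform, assuming the SDK's import path and an OnRecordWritten callback of the form func(WriteEvent) ([]Record, error) (both assumptions about this version of the package), might look like:

    package main

    import (
        redpanda "github.com/redpanda-data/redpanda/src/go/transform-sdk" // assumed import path
    )

    func main() {
        // Register the callback invoked for every record written to the input topic.
        redpanda.OnRecordWritten(mirror)
    }

    // mirror copies each input record unchanged to the output topic.
    func mirror(e redpanda.WriteEvent) ([]redpanda.Record, error) {
        return []redpanda.Record{e.Record()}, nil
    }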
Package ccgo translates C to Go source code. This v3 package is obsolete. Please use current ccgo/v4. Invocation 2021-12-23: v3.13.0 add clang support. To compile the resulting Go programs the package modernc.org/libc has to be installed.
CCGO_CPP selects which command is used by the C front end to obtain target configuration. Defaults to `cpp`. Ignored when --load-config <path> is used.
TARGET_GOARCH selects the GOARCH of the resulting Go code. Defaults to $GOARCH or runtime.GOARCH if $GOARCH is not set. Ignored when --load-config <path> is used.
TARGET_GOOS selects the GOOS of the resulting Go code. Defaults to $GOOS or runtime.GOOS if $GOOS is not set. Ignored when --load-config <path> is used.
To compile for the host invoke something like: To cross compile, set TARGET_GOARCH and/or TARGET_GOOS, not GOARCH/GOOS. Cross compiling depends on the availability of C stdlib headers for the target platform as well as on the set of predefined macros for the target platform. For example, to cross compile on a Linux host, targeting windows/amd64, it's necessary to have mingw64 installed in $PATH. Then invoke something like:
Only files with extension .c, .h or .json are recognized as input files. A .json file is interpreted as a compile database. All other command line arguments following the .json file are interpreted as items that should be found in the database and included in the output file. Each item should be an object file (.o), a static archive (.a) or a command (no extension).
Command line options requiring an argument:
-Dfoo Equals `#define foo 1`.
-Dfoo=bar Equals `#define foo bar`.
-Ipath Add path to the list of include file search paths. The option is a capital letter I (India), not a lowercase letter l (Lima).
-limport-path The package at <import-path> must have been produced without using the -nocapi option, i.e. the package must have a proper capi_$GOOS_$GOARCH.go file. The option is a lowercase letter l (Lima), not a capital letter I (India).
-Ufoo Equals `#undef foo`.
-compiledb name When this option appears anywhere, most preceding options are ignored and all following command line arguments are interpreted as a command with arguments that will be executed to produce the compilation database. For example: This will execute `make -DFOO -w` and attempt to extract the compile and archive commands. Only POSIX operating systems are supported. The supported build system must output information about entering directories that is compatible with GNU make. The only compilers supported are `gcc` and `clang`. The only archiver supported is `ar`. Format specification: https://clang.llvm.org/docs/JSONCompilationDatabase.html Note: This option also produces information about libraries created with `ar cr` and includes it in the json file, which goes beyond the specification.
-crt-import-path path Unless disabled by the -nostdlib option, every produced Go file imports the C runtime library. Default is `modernc.org/libc`.
-export-defines "" Export C numeric/string defines as Go constants by capitalizing the first letter of the define's name.
-export-defines prefix Export C numeric/string defines as Go constants by prefixing the define's name with `prefix`. Name conflicts are resolved by adding a numeric suffix.
-export-enums "" Export C enum constants as Go constants by capitalizing the first letter of the enum constant name.
-export-enums prefix Export C enum constants as Go constants by prefixing the enum constant name with `prefix`. Name conflicts are resolved by adding a numeric suffix.
-export-externs "" Export C extern definitions as Go definitions by capitalizing the first letter of the definition name. -export-externs prefix Export C extern definitions as Go definitions by prefixing the definition name with `prefix`. Name conflicts are resolved by adding a numeric suffix. -export-fields "" Export C struct fields as Go fields by capitalizing the first letter of the field name. -export-fields prefix Export C struct fields as Go fields by prefixing the field name with `prefix`. Name conflicts are resolved by adding a numeric suffix. -export-structs "" Export tagged C struct/union types as Go types by capitalizing the first letter of the tag name. -export-structs prefix Export tagged C struct/union types as Go types by prefixing the tag name with `prefix`. Name conflicts are resolved by adding a numeric suffix. -export-typedefs "" Export C typedefs as Go types by capitalizing the first letter of the typedef name. -export-structs prefix Export C typedefs as as Go types by prefixing the typedef name with `prefix`. Name conflicts are resolved by adding a numeric suffix. -static-locals-prefix prefix Prefix C static local declarators names with 'prefix'. -host-config-cmd command This option has the same effect as setting `CCGO_CPP=command`. -host-config-opts comma-separated-list The separated items of the list are added to the invocation of the configuration command. -pkgname name Set the resulting Go package name to 'name'. Defaults to `main`. -script filename Ccgo does not yet have a concept of object files. All C files that are needed for producing the resulting Go file have to be compiled together and "linked" in memory. There are some problems with this approach, one of them is the situation when foo.c has to be compiled using, for example `-Dbar=42` and "linked" with baz.c that needs to be compiled with `-Dbar=314`. Or `bar` must not defined at all for baz.c, etc. A script in a named file is a CSV file. It is opened like this (error handling omitted): The first field of every record in the CSV file is the directory to use. The remaining fields are the arguments of the ccgo command. This way different C files can be translated using different options. The CSV file may look something like: -volatile comma-separated-list The separated items of the list are added to the list of file scope extern variables the will be accessed atomically, like if their C declarator used the 'volatile' type specifier. Currently only C scalar types of size 4 and 8 bytes are supported. Other types/sizes will ignore both the volatile specifier and the -volatile option. -save-config path This option copies every header included during compilation or compile database generation to a file under the path argument. Additionally the host configuration, ie. predefined macros, include search paths, os and architecture is stored in path/config.json. When this option is used, no Go code is generated, meaning no link phase occurs and thus the memory consumption should stay low. Passing an empty string as an argument of -save-config is the same as if the option is not present at all. Possibly useful when the option set is generated in code. This option is ignored when -compiledb <path> is used. --load-config path Note that this option must have the double dash prefix to distinguish it from -lfoo, the [traditional] short form of `-l foo`. This option configures the compiler using path/config.json. The include paths are adjusted to be relative to path. 
For example: Assume on machine A the default C preprocessor reports a system include search path "/usr/include". Running ccgo on A with -save-config /tmp/foo to compile foo.c that #includes <stdlib.h>, which is found in /usr/include/stdlib.h on the host, results in: Assume /tmp/foo from machine A will be recursively copied to machine B, which may run a different operating system and/or architecture. Let the copy be, for example, in /tmp/bar. Using --load-config /tmp/bar will instruct ccgo to configure its preprocessor with a system include path /tmp/bar/usr/include and thus use the original machine A stdlib.h found there. When --load-config is used, no host configuration from a machine B cross C preprocessor/compiler is needed to transpile the foo.c source on machine B as if the compiler were running on machine A. This mechanism is particularly useful for transpiling big projects for 32 bit architectures. There the lack of an object format in ccgo, and thus linking everything in RAM, can need too much memory for the system to handle. The way around this is possibly to run something like: on machine A, transfer path/* to machine B and run the link phase there with e.g.: Note that the C sources for the project must be in the same path on both machines because the compile database stores absolute paths. It might be convenient to put the sources in path/src, the config in path/config, for example, and transfer the [archive of] path/ to the same directory on the second machine. That also solves the issue when ./configure generates files and the result differs per operating system or architecture. Passing an empty string as an argument of --load-config is the same as if the option is not present at all. Possibly useful when the option set is generated in code.
These command line options don't take arguments:
-E When this option is present the compiler does not produce any Go files and instead prints the preprocessor output to stdout.
-all-errors Normally only the first 10 or so errors are shown. With this option the compiler will show all errors.
-header Using this option suppresses producing of any function definitions. This is possibly useful for producing Go files from C header files.
-func-sig Add this option to include function signatures when compiling headers (using -header).
-nostdinc This option disables the default C include search paths.
-nostdlib This option disables importing of the runtime library by the resulting Go code.
-trace-pinning This option will print the positions and names of local declarators that are being pinned.
-version Ignore all other options, print version and exit.
-verbose-compiledb Enable verbose output when -compiledb is present.
-ignore-undefined This option tells the linker not to insist on finding definitions for declarators that are not implicitly declared and used - but not defined. This might be useful when the intent is to define the missing functions manually in Go. Name conflict resolution for such declarator names may or may not be applied.
-ignore-unsupported-alignment This option tells the compiler not to complain about alignments that Go cannot support.
-trace-included-files This option outputs the path names of all included files. This option is ignored when -compiledb <path> is used.
There may exist other options not listed above. Those should be considered temporary and/or unsupported and may be removed without notice.
Alternatively, they may eventually get promoted to "documented" options.
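The -script CSV layout described above (directory in the first field, ccgo arguments in the remaining fields) can be read with the standard encoding/csv package. The following is a minimal sketch; the file name and what is done with each record are purely illustrative, not ccgo's actual implementation:

    package main

    import (
        "encoding/csv"
        "fmt"
        "os"
    )

    func main() {
        f, err := os.Open("build.csv") // hypothetical script file name
        if err != nil {
            panic(err)
        }
        defer f.Close()

        r := csv.NewReader(f)
        r.FieldsPerRecord = -1 // records may carry a varying number of arguments
        records, err := r.ReadAll()
        if err != nil {
            panic(err)
        }
        for _, rec := range records {
            dir, args := rec[0], rec[1:] // first field: working directory, rest: ccgo arguments
            fmt.Printf("cd %s && ccgo %v\n", dir, args)
        }
    }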
Package json2csv provides JSON to CSV functions.
datatools package is a collection of Go based command line tools for working with JSON content. datatools.go is a package for working with various types of data (e.g. CSV, XLSX, JSON) in support of the utilities included in the datatools package. @Author R. S. Doiel, <rsdoiel@caltech.edu> Copyright (c) 2021, Caltech All rights not granted herein are expressly reserved by Caltech. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Package transform is the SDK for Redpanda's inline Data Transforms, based on WebAssembly. This library provides a framework for transforming records written within Redpanda from an input to an output topic. This version of the SDK is compatible with Redpanda 24.1 or greater. This example shows the basic usage of the package: this is a "transform" that does nothing but copy the same data to a new topic. This example shows a filter that uses a regexp to filter records from one topic into another. The filter pattern can be chosen when the transform is deployed by using an environment variable to specify it. This example shows a transform that converts CSV into JSON. This example shows a transform that validates that the data is valid JSON, and outputs invalid JSON to a dead letter queue.
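A sketch of the regexp filter described above might look roughly like the following; the import path, the callback shape (WriteEvent plus RecordWriter), and the Record field names are assumptions about this version of the SDK rather than verified API:

    package main

    import (
        "os"
        "regexp"

        "github.com/redpanda-data/redpanda/src/transform-sdk/go/transform" // assumed import path
    )

    var pattern *regexp.Regexp

    func main() {
        // The pattern is supplied at deployment time through an environment variable.
        pattern = regexp.MustCompile(os.Getenv("PATTERN"))
        transform.OnRecordWritten(filter)
    }

    // filter forwards only the records whose value matches the configured pattern.
    func filter(e transform.WriteEvent, w transform.RecordWriter) error {
        r := e.Record()
        if pattern.Match(r.Value) {
            return w.Write(r)
        }
        return nil
    }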
Command txt is a templating language for shell programming. The input to the template comes from stdin. It is parsed in one of five ways. The default is to split stdin into records and fields, using the -R and -F flags respectively, similar to awk(1), and dot is set to a list of records (see below). If header is set, dot is a list of maps with the specified names as keys. If the -L flag is specified, stdin is broken into records as with the default, but the fields are defined by the capture groups of the regular expression -L. Records that do not match -L are skipped. If -L contains named capture groups, each record is a dictionary of only the named captures' values for that record. Otherwise, dot is a list of records (see below) of the capture groups' values. If header is set, dot is a list of maps with the specified names as keys, overriding any names from capture groups. If the -csv flag is specified, stdin is treated as a CSV file, as recognized by the encoding/csv package. If the -header flag is not specified, the first record is used as the header. Dot is set to a list of maps, with the header for each column as the key. If the -json flag is specified, stdin is treated as JSON. Dot is set as the decoded JSON. If the -no-stdin flag is specified, stdin is not read. Dot is not set. When using -F or -L without a header, or in the case of -L without named capture groups, dot is a list of records. Each record has two fields, Fields and Line. Line is the complete unaltered input of that record. Fields are the values of each field in that record. If dot is a record is the same as Records have a method F that takes an integer n and returns the nth field if it exists and the empty string otherwise. If n is negative it returns the (n-1)th field from the end. If n is positive and the nth field exists, then is equivalent to The templating language is documented at http://golang.org/pkg/text/template with the single difference that if the first line at the top of the file begins with #! that line is skipped. If the -html flag is used, escaping functions are automatically added to all outputs based on context. Any command line arguments after the flags are treated as filenames of templates. The templates are named after the basename of the respective filename. The first file listed is the main template, unless the -template flag specifies otherwise. If the -e flag is used to define an inline template, it is always the main template, and the -template flag is illegal. All regular expressions are RE2 regular expressions with the Perl syntax and semantics. The syntax is documented at http://golang.org/pkg/regexp/syntax/#hdr-Syntax Built-in functions are documented at http://golang.org/pkg/text/template#hdr-Functions The following additional functions are defined:
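As an illustration of the default record mode, an inline template supplied with -e could iterate over the records in dot and use the Line field and the F method. The sample file name and the field-index convention below are assumptions, not taken from the tool's own documentation:

    txt -F , -e '{{range .}}{{.F 1}}: {{.Line}}{{"\n"}}{{end}}' < items.csv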
This contains a highly customizable general-purpose flow exporter. For building, either "go build", "go install", or the program provided in go-flows-build can be used. The latter allows for customizing builtin modules and can help with building modules as plugins or combined binaries. This flow exporter can convert network packets into flows, extract features, and export those via various file formats. In general the flow exporter reads the feature definitions (i.e. a description of which features to extract, and the flow key), and builds an execution graph (this can be viewed with the option callgraph) from this definition. Packets are then read from a source and processed in the following pipeline: () parts in the pipeline are fixed, [] parts can be configured via the specification, {} can be configured via the specification and provided via modules, and everything else can be provided from a module and configured from the command line. source is a packet source, which must provide single packets as []byte sequences and metadata like capture time, and dropped/filtered packets. The []byte-buffer can be reused for the next packet. For examples look at modules/sources. filter is a packet filter, which must return true for a given packet if it should be filtered out. For examples look at modules/filters. parse is a fixed step that parses the packet with gopacket. label is an optional step that can provide an arbitrary label for every packet. For examples look at modules/labels. key is a fixed step that calculates the flow key. Key parameters can be configured via the specification. table, flow, record are fixed steps that are described in more detail in the flows package. merge merges the output from every table into one record stream. The feature step calculates the actual feature values. Features can be provided via modules, and the selection of which features to calculate must be provided via the specification. Features are described in more detail in the flows package. If a flow ends (e.g. because of timeout, or tcp-rst) it gets exported via the exporter, which must be provided as a module and configured via the command line. For examples look at modules/exporters. The whole pipeline is executed concurrently with the following four subpipelines running concurrently: The "table"-pipeline exists n times, where n can be configured on the command line. Packets are divided onto the different "table"-pipelines according to flow-key. WARNING: Due to this concurrent processing, flow output is neither ordered nor deterministic (without sorting)! To ensure deterministic output, flow output can be ordered by start time, stop time (default), or export time. Specification files are JSON files based on the NTARC format (https://nta-meta-analysis.readthedocs.io/en/latest/). Only version 2 files can be used. It is also possible to use a simpler format, if a paper specification is not needed. Simple-format specification: V2-formatted file: Unlike in the NTARC specification, active_timeout and idle_timeout MUST be specified (there are no defaults). If bidirectional is true, every flow contains packets from both directions. key features give a list of features, which are used to compute a flow key. features is a formatted list of features to export. This list can also contain combinations of features and operations (https://nta-meta-analysis.readthedocs.io/en/latest/features.html). Only single-pass operations can ever be supported due to design restrictions in the flow exporter.
In addition to the features specified in the nta-meta-analysis, two additional types of features are present: filter features, which can exclude packets from a whole flow, and control features, which can change flow behaviour like exporting the flow before the end, restarting the flow, or discarding the flow. _per_packet allows exporting one flow per packet. If _allow_zero is true, then packets are accepted where one of the parts of the flow key would be zero (e.g. non-IP packets for flow keys that contain IP-Addresses). If _expire_TCP is set to false, no TCP-based expiry is carried out (e.g. RST packets). TCP expiry is only carried out if at least the five-tuple is part of the flow key. A list of supported features can be queried with "./go-flows features". The examples directory contains several example flow specifications that can be used. The general syntax on the command line is "go-flows run <commands>" where <commands> is a list of "<verb> <which> [options] [--]" sequences. <verb> can be one of features, export, source, filter, or label, and <which> is the actual module. The options can be queried from the help of the different modules (e.g. go-flows <verb>s <which>; e.g. go-flows exporters ipfix). Example: The following list describes all the different things contained in the subdirectories. Features must follow the conventions in https://nta-meta-analysis.readthedocs.io/en/latest/features.html, which state that names must follow the ipfix iana assignments (https://www.iana.org/assignments/ipfix/ipfix.xhtml), or start with an _ for common features or __ for uncommon ones. Feature names must be camelCase. The flow exporter has the full list of ipfix iana assignments already builtin, which means that for these features one needs to only specify the name - all type information is automatically added by the flow extractor. For implementing features, most of the time flows.BaseFeature is a good starting point. Features need to override the needed methods: Start(*EventContext) gets called when a flow starts. Do cleanup here (features might be reused!). MUST call flows.BaseFeature.Start from this function! Event(interface{}, *EventContext, interface{}) gets called for every packet belonging to the current flow. Stop(FlowEndReason, *EventContext) gets called when a flow finishes (before export). SetValue(new interface{}, when *EventContext, self interface{}) Call this one for setting a value. It stores the new value and forwards it to all dependent features. Less commonly used functions: See also the documentation of subpackage flows for more details about which base to choose. A simple example is the protocolIdentifier: This feature doesn't need a Start or Stop (since both functions don't provide a packet). For every packet it checks if the protocolIdentifier has already been set, and if it hasn't been, it sets a new value. The new value provided to Event will always be a packet.Buffer for features that expect a raw packet. For other features, this will be the actual value emitted from other features. E.g. for the specification the min feature will receive the uint8 emitted by this feature. The final component missing from the code is the feature registration. This has to be done in init with one of the Register* functions from the flows package. For the protocolIdentifier this looks like the following: Since protocolIdentifier is one of the iana assigned ipfix features, RegisterStandardFeature can be used, which automatically adds the rest of the ipfix information element specification.
The second argument is what this feature implementation returns, which in this case is a single value per flow - a FlowFeature. The third argument must be a function that returns a new feature instance. The last argument specifies the input to this feature, which is a raw packet. The flows package contains a list of implemented types and Register functions. For more examples have a look at the provided features (a compact sketch of such a feature appears at the end of this package description). Common part of sources/filters/labels/exporters: Sources, filters, labels, and exporters must register themselves with the matching Register* function: where a name and a short description have to be provided. The helpX function gets called if the help for this module is invoked and must write the help to os.Stderr. The newX function must parse the given arguments and return a new X. This function must have the following signature: name can be a provided name for the id, but can be empty. opts holds the parameters from a JSON specification or util.UseStringOption if args need to be parsed. args holds the rest of the arguments in case it is a command line invocation. Needed arguments must be parsed from this array and the remaining ones returned (arguments). If successful, the created module must be returned as ret - otherwise an error. This function must only parse arguments and prepare the state of the module. Opening files etc. must happen in Init(). All modules must fulfill the util.Module interface, which contains an Init and an ID function. ID must return a string for the callgraph representation (most of the time a combination of modulename|parameter). Init will be called during initialization. Side effects like creating files must happen in Init and not during the new function! Examples of the different modules can be found in the modules directory. Sources must implement the packet.Source interface: ReadPacket gets called for reading the next packet. This function must return the layer type, the raw data of a single packet, capture information, how many packets have been skipped and filtered since the last invocation, or an error. Stop might be called asynchronously (be careful with races) to stop an ongoing capture. After or during this happening, ReadPacket must return io.EOF as the error. This function is only called to stop the flow exporter early (e.g. ctrl+c). data is not kept around by the flow exporter, which means the source can reuse the same data buffer for every ReadPacket. Filters must implement the packet.Filter interface: Matches will be called for every packet with the capture info and the raw data as argument. If this function returns false, then the current packet gets filtered out (i.e. processing of this packet stops and the next one is used). Don't hold on to data! This will be reused for the next packet. Labels must implement the packet.Label interface: This function can return an arbitrary value as label for the packet (can also be nil for no label). If the label source is empty, io.EOF must be returned. Exporters must implement the flow.Exporter interface: Fields, Export, Finish will never be called concurrently, are expected to be blocking until finished, and, therefore, don't need to take care of synchronization. The Fields function gets called before processing starts and provides a list of feature names that will be exported (e.g. the csv exporter uses this to create the csv header). Export gets called for every record that must be exported. Arguments are a template for this list of features, the actual feature values, and an export time.
Finish will be called after all packets and flows have been processed. This function must flush data and wait for this process to finish.
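Putting the feature pieces described above together, a protocolIdentifier-style feature might look roughly like the sketch below. The method signatures follow the description in this text; the import paths, the Value accessor, the Proto helper on packet.Buffer, and the exact RegisterStandardFeature arguments are assumptions and should be checked against the actual flows and packet packages:

    package myfeatures

    import (
        "github.com/CN-TU/go-flows/flows"  // assumed import paths
        "github.com/CN-TU/go-flows/packet"
    )

    // protocolIdentifier exports the transport protocol of the first packet seen in a flow.
    type protocolIdentifier struct {
        flows.BaseFeature
    }

    // Event is called for every packet of the flow; the value is only set once.
    func (f *protocolIdentifier) Event(new interface{}, context *flows.EventContext, src interface{}) {
        if f.Value() == nil {
            f.SetValue(new.(packet.Buffer).Proto(), context, f)
        }
    }

    func init() {
        // Register as a standard (iana-assigned) ipfix feature that returns one value
        // per flow (FlowFeature) and takes a raw packet as input (RawPacket).
        flows.RegisterStandardFeature("protocolIdentifier", flows.FlowFeature,
            func() flows.Feature { return &protocolIdentifier{} }, flows.RawPacket)
    }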
Package freeGeoIP or go-freeGeoIP is a Golang client for the Free IP Geolocation information API, with inbuilt cache support to make better use of the 15k per hour rate limit of the application https://freegeoip.app/ By default, the client will cache the IP Geolocation information for 24 hours, but the expiry can be set manually. If you want to set the information cache with no expiration time, set the expiry function to nil. You can use the package using the following command: freegeoip.app provides a free IP geolocation API for software developers. It uses a database of IP addresses that are associated with cities along with other relevant information like time zone, latitude and longitude. You're allowed up to 15,000 queries per hour by default. Once this limit is reached, all of your requests will result in HTTP 403 (Forbidden) until your quota is cleared. The HTTP API takes GET requests in the following schema: Supported formats are: csv, xml, json and jsonp. If no IP or hostname is provided, then your own IP is looked up. Contributors are more than welcome and much appreciated. Please feel free to open a PR to improve anything you don't like, or would like to add. Please make your changes in a specific branch and request to pull into master! If you can, please make sure all the changes work properly and do not affect the existing functioning. No PR is too small! Even the smallest effort is countable. This project is licensed under the MIT license (https://github.com/Shivam010/go-freeGeoIP/blob/master/LICENSE).
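The underlying HTTP API can also be called directly with the standard library, which is useful for understanding what the client caches. The sketch below only assumes the GET schema and formats described above; the response field names are assumptions about the JSON shape:

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    // location captures a subset of the fields the JSON endpoint is assumed to return.
    type location struct {
        IP          string  `json:"ip"`
        CountryName string  `json:"country_name"`
        TimeZone    string  `json:"time_zone"`
        Latitude    float64 `json:"latitude"`
        Longitude   float64 `json:"longitude"`
    }

    func main() {
        // GET https://freegeoip.app/{format}/{ip-or-hostname}; omitting the target looks up the caller's IP.
        resp, err := http.Get("https://freegeoip.app/json/8.8.8.8")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var loc location
        if err := json.NewDecoder(resp.Body).Decode(&loc); err != nil {
            panic(err)
        }
        fmt.Printf("%s is in %s (%f, %f)\n", loc.IP, loc.CountryName, loc.Latitude, loc.Longitude)
    }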
Package ratchet is a library for performing data pipeline / ETL tasks in Go. The main construct in Ratchet is Pipeline. A Pipeline has a series of PipelineStages, which will each perform some type of data processing, and then send new data on to the next stage. Each PipelineStage consists of one or more DataProcessors, which are responsible for receiving, processing, and then sending data on to the next stage of processing. DataProcessors each run in their own goroutine, and therefore all data processing can be executing concurrently. Here is a conceptual drawing of a fairly simple Pipeline: In this example, we have a Pipeline consisting of 3 PipelineStages. The first stage has a DataProcessor that runs queries on a SQL database, the second is doing custom transformation work on that data, and the third stage branches into 2 DataProcessors, one writing the resulting data to a CSV file, and the other inserting into another SQL database. In the example above, Stage 1 and Stage 3 are using built-in DataProcessors (see the "processors" package/subdirectory). However, Stage 2 is using a custom implementation of DataProcessor. By using a combination of built-in processors, and supporting the writing of any Go code to process data, Ratchet makes it possible to write very custom and fast data pipeline systems. See the DataProcessor documentation to learn more. Since each DataProcessor is running in its own goroutine, SQLReader can continue pulling and sending data while each subsequent stage is also processing data. Optimally-designed pipelines have processors that can each run in an isolated fashion, processing data without having to worry about what's coming next down the pipeline. All data payloads sent between DataProcessors are of type data.JSON ([]byte). This provides a good balance of consistency and flexibility. See the "data" package for details and helper functions for dealing with data.JSON. Another good read for handling JSON data in Go is http://blog.golang.org/json-and-go. Note that many of the concepts in Ratchet were taken from the Golang blog's post on pipelines (http://blog.golang.org/pipelines). While the details discussed in that blog post are largely abstracted away by Ratchet, it is still an interesting read and will help explain the general concepts being applied. There are two ways to construct and run a Pipeline. The first is a basic, non-branching Pipeline. For example: This is a 3-stage Pipeline that queries some SQL data in stage 1, does some custom data transformation in stage 2, and then writes the resulting data to a SQL table in stage 3. The code to create and run this basic Pipeline would look something like: The second way to construct a Pipeline is using a PipelineLayout. This method allows for more complex Pipeline configurations that support branching between stages that are running multiple DataProcessors. Here is a (fairly complex) example: This Pipeline consists of 4 stages where each DataProcessor is choosing which DataProcessors in the subsequent stage should receive the data it sends. The SQLReader in stage 2, for example, is sending data to only 2 processors in the next stage, while the Custom DataProcessor in stage 2 is sending its data to 3. The code for constructing and running a Pipeline like this would look like: This example is only conceptual, the main points being to explain the flexibility you have when designing your Pipeline's layout and to demonstrate the syntax for constructing a new PipelineLayout.
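The basic 3-stage Pipeline described above could be assembled roughly as follows. This is a sketch: the constructor names (NewSQLReader, NewSQLWriter, NewPipeline), the DataProcessor method set, and the query/table names are from memory and illustration and should be checked against the ratchet and processors packages:

    package main

    import (
        "database/sql"

        "github.com/dailyburn/ratchet" // assumed import paths
        "github.com/dailyburn/ratchet/data"
        "github.com/dailyburn/ratchet/processors"
    )

    // passthrough stands in for the "custom transformation" stage; it forwards
    // each JSON payload unchanged to the next stage.
    type passthrough struct{}

    func (p *passthrough) ProcessData(d data.JSON, outputChan chan data.JSON, killChan chan error) {
        outputChan <- d
    }

    func (p *passthrough) Finish(outputChan chan data.JSON, killChan chan error) {}

    func main() {
        var db *sql.DB // assume an opened *sql.DB connection

        // Stage 1 reads rows, stage 2 runs the custom DataProcessor, stage 3 writes rows.
        read := processors.NewSQLReader(db, "SELECT id, name FROM users")
        write := processors.NewSQLWriter(db, "users_copy")

        pipeline := ratchet.NewPipeline(read, &passthrough{}, write)
        if err := <-pipeline.Run(); err != nil {
            panic(err)
        }
    }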
Package deepdiff is a structured data differ that aims for near-linear time complexity. It is intended to calculate differences & apply patches to structured data ranging from roughly 0-500MB of encoded JSON. Diffing structured data carries additional complexity when compared to the standard unix diff utility, which operates on lines of text. By using the structure of data itself, deepdiff is able to provide a rich description of changes that maps onto the structure of the data itself. deepdiff ignores semantically irrelevant changes like whitespace, and can isolate changes like column changes to tabular data to only the relevant switches. Most algorithms in this space have quadratic time complexity, which from our testing makes them very slow on 3MB JSON documents and unable to complete on 5MB or more. deepdiff currently hovers around the 0.9 sec/MB range on 4-core processors. Instead of operating on JSON directly, deepdiff operates on document trees consisting of the go types created by unmarshaling from JSON, which are two complex types (map[string]interface{} and []interface{}) and five scalar types (string, int, float64, bool, and nil). By operating on native go types, deepdiff can compare documents encoded in different formats, for example decoded CSV or CBOR. deepdiff is based off an algorithm designed for diffing XML documents outlined in: Detecting Changes in XML Documents by Grégory Cobéna & Amélie Marian https://ieeexplore.ieee.org/document/994696 It has been adapted to fit the purposes of diffing for Qri (https://github.com/qri-io/qri), the guiding use case for this work. deepdiff also includes a tool for applying patches; see the documentation for details.
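A usage sketch under stated assumptions: the part about unmarshaling into native go values before diffing follows the description above, while the constructor and Diff signature are assumptions about the package's API, not verified calls:

    package main

    import (
        "context"
        "encoding/json"
        "fmt"

        "github.com/qri-io/deepdiff" // assumed import path and API
    )

    func main() {
        // deepdiff compares the go values produced by unmarshaling, not raw bytes,
        // so documents decoded from JSON, CSV or CBOR can be compared alike.
        var a, b interface{}
        json.Unmarshal([]byte(`{"rows":[1,2,3]}`), &a) // error handling omitted for brevity
        json.Unmarshal([]byte(`{"rows":[1,2,4]}`), &b)

        // The constructor and Diff signature below are assumptions, not verified API.
        dd := deepdiff.New()
        deltas, err := dd.Diff(context.Background(), a, b)
        if err != nil {
            panic(err)
        }
        fmt.Println(deltas)
    }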
Package decoder unmarshals or decodes values from a consul KV store into a struct. The following types are supported: By default, the decoder package looks for the struct tag "decoder". However, this can be overridden inside the Decoder struct as shown below. For the purposes of examples, we'll stick with the default "decoder" tag. By default, in the absence of a decoder tag, it will look for a consul key name with the same name as the struct field. Only exported struct fields are considered. The name comparison is case-insensitive by default, but this is configurable in the Decoder struct. The tag "-" indicates to skip the field. The modifier ",json" appended to the end signals that the value is to be interpreted as JSON and unmarshaled rather than interpreted. Similarly, the modifier ",csv" allows comma separated values to be read into a slice, and ",ssv" allows space separated values to be read into a slice. For csv and ssv, slices of string, numeric and boolean values are supported.
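To make the tag forms concrete, here is a sketch of a tagged struct. The key names and field layout are invented for illustration, and the nested-struct target for ",json" is an assumption; only the tag syntax follows the description above:

    package config

    // Service illustrates the decoder tag forms.
    type Service struct {
        // No tag: matched against a consul key named like the field (case-insensitive by default).
        Host string
        // Explicit key name.
        Port int `decoder:"service/port"`
        // Skipped by the decoder.
        Secret string `decoder:"-"`
        // The stored value is JSON and is unmarshaled rather than interpreted.
        Limits struct {
            MaxConns int `json:"max_conns"`
        } `decoder:"limits,json"`
        // Comma separated value becomes a []string.
        Tags []string `decoder:"tags,csv"`
        // Space separated value becomes a numeric slice.
        Weights []float64 `decoder:"weights,ssv"`
    }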
Package fmtstruct writes slices of structs as formatted output in JSON, CSV, or tabular format. Everything else is just configuration, flag parsing, and convenience functions. # Anticipated questions: Why only structs? A: that's what I needed.
Package quandl provides easy access to the Quandl API. It provides methods for getting a response from Quandl in several formats. Basic usage looks like this: and will return a native Go object. To use the data in the response, iterate through its Data property: To receive a raw response from Quandl (CSV, JSON, XML) you can use: To pass options to the Quandl API, use something like this:
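A hedged sketch of the basic usage described above; the import path, the APIKey variable, the GetSymbol function, and the symbol code are all assumptions about this package rather than verified API:

    package main

    import (
        "fmt"

        "github.com/DannyBen/quandl" // assumed import path; names below are assumptions
    )

    func main() {
        // An API key is typically required; the variable name is an assumption.
        quandl.APIKey = "YOUR_KEY"

        // Fetch a symbol as a native Go object and iterate over its Data rows.
        data, err := quandl.GetSymbol("WIKI/AAPL", nil)
        if err != nil {
            panic(err)
        }
        for _, row := range data.Data {
            fmt.Println(row)
        }
    }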
The dawa package can be used to de-serialize structures received from "Danmarks Adressers Web API (DAWA)" (Addresses of Denmark Web API). This package allows you to de-serialize JSON responses from the web API into typed structs. The package also allows importing JSON or CSV downloads from the official web page. See the /examples folder for more information. Package home: https://github.com/klauspost/dawa For information about the format and download/API options, see http://dawa.aws.dk/ Description text (from the Danish): Danmarks Adressers Web API (DAWA) exposes data and functionality concerning Denmark's addresses, access addresses, road names, and postal codes. DAWA is used to implement address functionality in IT systems. The target audience of the site is developers who want to build address functionality into their IT systems.
Package freegeoip provides an API for searching the geolocation of IP addresses. It uses a database that can be either a local file or a remote resource from a URL. Local databases are monitored by fsnotify and reloaded when the file is either updated or overwritten. Remote databases are automatically downloaded and updated in the background so you can focus on using the API and not managing the database. Also, the freegeoip package provides http handlers that any Go http server (net/http) can use. These handlers can process IP geolocation lookup requests and return data in multiple formats like CSV, XML, JSON and JSONP. It also has an API for supporting custom formats.
package goetl is a library for performing data pipeline / ETL tasks in Go. The main construct in goetl is Pipeline. A Pipeline has a series of PipelineStages, which will each perform some type of data processing, and then send new data on to the next stage. Each PipelineStage consists of one or more Processors, which are responsible for receiving, processing, and then sending data on to the next stage of processing. DataProcessors each run in their own goroutine, and therefore all data processing can be executing concurrently. Here is a conceptual drawing of a fairly simple Pipeline: In this example, we have a Pipeline consisting of 3 PipelineStages. The first stage has a Processor that runs queries on a SQL database, the second is doing custom transformation work on that data, and the third stage branches into 2 Processors, one writing the resulting data to a CSV file, and the other inserting into another SQL database. In the example above, Stage 1 and Stage 3 are using built-in Processors (see the "processors" package/subdirectory). However, Stage 2 is using a custom implementation of Processor. By using a combination of built-in processors, and supporting the writing of any Go code to process data, goetl makes it possible to write very custom and fast data pipeline systems. See the Processor documentation to learn more. Since each Processor is running in its own goroutine, SQLReader can continue pulling and sending data while each subsequent stage is also processing data. Optimally-designed pipelines have processors that can each run in an isolated fashion, processing data without having to worry about what's coming next down the pipeline. All data payloads sent between Processors implement the etldata.Payload interface. Built-in processors send data flows using the type etldata.JSON. This provides a good balance of consistency and flexibility. See the "data" package for details and helper functions for dealing with etldata.Payload and etldata.JSON. Another good read for handling JSON data in Go is http://blog.golang.org/json-and-go. Note that many of the concepts in goetl were taken from the Golang blog's post on pipelines (http://blog.golang.org/pipelines). While the details discussed in that blog post are largely abstracted away by goetl, it is still an interesting read and will help explain the general concepts being applied. There are two ways to construct and run a Pipeline. The first is a basic, non-branching Pipeline. For example: This is a 3-stage Pipeline that queries some SQL data in stage 1, does some custom data transformation in stage 2, and then writes the resulting data to a SQL table in stage 3. The code to create and run this basic Pipeline would look something like: The second way to construct a Pipeline is using a PipelineLayout. This method allows for more complex Pipeline configurations that support branching between stages that are running multiple DataProcessors. Here is a (fairly complex) example: This Pipeline consists of 4 stages where each Processor is choosing which Processors in the subsequent stage should receive the data it sends. The SQLReader in stage 2, for example, is sending data to only 2 processors in the next stage, while the Custom Processor in stage 2 is sending its data to 3. The code for constructing and running a Pipeline like this would look like: This example is only conceptual, the main points being to explain the flexibility you have when designing your Pipeline's layout and to demonstrate the syntax for constructing a new PipelineLayout.
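A sketch of a custom stage like the "Stage 2" transformation described above. The etldata.Payload usage follows the description in this text, but the import path and the exact method set of goetl's Processor interface are assumptions (modeled on ratchet's DataProcessor):

    package etlstages

    import (
        "github.com/teambenny/goetl/etldata" // assumed import path
    )

    // passthrough sketches a custom Processor stage: it receives a payload,
    // could transform it, and forwards it to the next stage.
    type passthrough struct{}

    // ProcessData is called for each payload flowing into this stage.
    func (p *passthrough) ProcessData(d etldata.Payload, outputChan chan etldata.Payload, killChan chan error) {
        outputChan <- d
    }

    // Finish is called once upstream stages are done sending data.
    func (p *passthrough) Finish(outputChan chan etldata.Payload, killChan chan error) {}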
Package pipes is a library for performing data pipeline / ETL tasks in Go. The main construct in pipes is Pipeline. A Pipeline has a series of PipelineStages, which will each perform some type of data processing, and then send new data on to the next stage. Each PipelineStage consists of one or more DataProcessors, which are responsible for receiving, processing, and then sending data on to the next stage of processing. DataProcessors each run in their own goroutine, and therefore all data processing can be executing concurrently. Here is a conceptual drawing of a fairly simple Pipeline: In this example, we have a Pipeline consisting of 3 PipelineStages. The first stage has a DataProcessor that runs queries on a SQL database, the second is doing custom transformation work on that data, and the third stage branches into 2 DataProcessors, one writing the resulting data to a CSV file, and the other inserting into another SQL database. In the example above, Stage 1 and Stage 3 are using built-in DataProcessors (see the "processors" package/subdirectory). However, Stage 2 is using a custom implementation of DataProcessor. By using a combination of built-in processors, and supporting the writing of any Go code to process data, pipes makes it possible to write very custom and fast data pipeline systems. See the DataProcessor documentation to learn more. Since each DataProcessor is running in its own goroutine, SQLReader can continue pulling and sending data while each subsequent stage is also processing data. Optimally-designed pipelines have processors that can each run in an isolated fashion, processing data without having to worry about what's coming next down the pipeline. All data payloads sent between DataProcessors are of type data.JSON ([]byte). This provides a good balance of consistency and flexibility. See the "data" package for details and helper functions for dealing with data.JSON. Another good read for handling JSON data in Go is http://blog.golang.org/json-and-go. Note that many of the concepts in pipes were taken from the Golang blog's post on pipelines (http://blog.golang.org/pipelines). While the details discussed in that blog post are largely abstracted away by pipes, it is still an interesting read and will help explain the general concepts being applied. There are two ways to construct and run a Pipeline. The first is a basic, non-branching Pipeline. For example: This is a 3-stage Pipeline that queries some SQL data in stage 1, does some custom data transformation in stage 2, and then writes the resulting data to a SQL table in stage 3. The code to create and run this basic Pipeline would look something like: The second way to construct a Pipeline is using a PipelineLayout. This method allows for more complex Pipeline configurations that support branching between stages that are running multiple DataProcessors. Here is a (fairly complex) example: This Pipeline consists of 4 stages where each DataProcessor is choosing which DataProcessors in the subsequent stage should receive the data it sends. The SQLReader in stage 2, for example, is sending data to only 2 processors in the next stage, while the Custom DataProcessor in stage 2 is sending its data to 3. The code for constructing and running a Pipeline like this would look like: This example is only conceptual, the main points being to explain the flexibility you have when designing your Pipeline's layout and to demonstrate the syntax for constructing a new PipelineLayout.
Package peanut writes tagged data structs to disk in a variety of formats. Its primary purpose is to provide a single consistent interface for easy, ceremony-free persistence of record-based struct data. Each distinct struct type is written to an individual file (or table), automatically created, each named according to the name of the struct. Field/column names in each file/table are derived from struct tags. All writers use the same tags. Currently supported formats are CSV, TSV, Excel (.xlsx), JSON Lines (JSONL), and SQLite. Additional writers are also provided to assist with testing and debugging. Multiple writers can be combined using MultiWriter. All writers have the same basic interface: a Write method, that can take any appropriately tagged struct; a Close method, which should be called to successfully complete writing; and a Cancel method, which should be called to abort writing and clean-up, in the event of an error or cancellation. It is safe to make multiple calls to Cancel, and it is safe to call Close after having previously called Cancel. All writers output their files atomically — that is to say: all output is written to a temporary location and only moved to the final output location when Close is called, meaning the output folder never contains any partially written files. Structs to be used with peanut must have appropriately tagged fields, for example: Fields without tags do not get written as output. First create a writer, for example: Next, write some records to it: When successfully completed: Or, to abort the whole operation in the event of an error or cancellation while writing records: Multiple writers can be combined using MultiWriter: Here w will write records to CSV files, Excel files, and a logger. Behaviour is undefined for types with the same name but in different packages, such as package1.Foo and package2.Foo. Supported datatypes for struct fields: string, bool, float32, float64, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64. Pointer following and nested structs are currently unsupported. Tagging a field that has an unsupported datatype will result in an error when Write is called.
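A sketch of the Write/Close/Cancel flow described above. The method set follows this description, but the tag key "peanut", the constructor name NewCSVWriter, and its arguments are assumptions and should be checked against the package:

    package main

    import (
        "log"

        "github.com/jimsmart/peanut" // assumed import path; constructor signature is an assumption
    )

    // Order is written to a file named after the struct; column names come from the tags.
    type Order struct {
        ID     int     `peanut:"id"`
        Item   string  `peanut:"item"`
        Amount float64 `peanut:"amount"`
    }

    func main() {
        // Constructor name and arguments are illustrative.
        w := peanut.NewCSVWriter("./output", "")

        for _, o := range []Order{{1, "pencil", 0.5}, {2, "notebook", 3.25}} {
            if err := w.Write(o); err != nil {
                w.Cancel() // abort and clean up any partially written output
                log.Fatal(err)
            }
        }
        if err := w.Close(); err != nil { // atomically move output into place
            log.Fatal(err)
        }
    }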