github.com/t7a/tada
tada (TAble DAta) is a package that enables test-driven data pipelines in pure Go.
DISCLAIMER: still under development. API subject to breaking changes until v1.
If you still want to use this regardless of the disclaimer, congratulations, you are an alpha tester! Please DM your feedback to me on the Gophers Slack or create an issue.
tada combines concepts from pandas, spreadsheets, R, Apache Spark, and SQL. Its most common use cases are cleaning, aggregating, transforming, and analyzing data.
Some notable features of tada:
The key data types are Series, DataFrames, and groupings of each. A Series is analogous to one column of a spreadsheet, and a DataFrame is analogous to a whole spreadsheet. Printing either data type will render an ASCII table.
Both Series and DataFrames have one or more "label levels". On printing, these appear as the leftmost columns in a table, and typically have values that help identify ("label") specific rows. They are analogous to the "index" concept in pandas.
For more detail and implementation notes, see this doc.
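To make the label-level idea concrete, here is a minimal, standard-library-only sketch of a column with a single label level. The `labeledSeries` type and its `At` method are hypothetical illustrations of the concept, not tada's API; tada's own Series supports multiple label levels and several dtypes.

```go
package main

import "fmt"

// labeledSeries is a hypothetical, simplified stand-in for a Series with a
// single label level: labels[i] identifies the row holding values[i].
type labeledSeries struct {
	labels []string
	values []float64
}

// At returns the value of the first row whose label matches, and false if
// no row has that label.
func (s labeledSeries) At(label string) (float64, bool) {
	for i, l := range s.labels {
		if l == label {
			return s.values[i], true
		}
	}
	return 0, false
}

func main() {
	s := labeledSeries{
		labels: []string{"foo", "bar", "baz"},
		values: []float64{1, 2, 3},
	}
	v, ok := s.At("bar")
	fmt.Println(v, ok)
}
```

Labels play the role that a row index plays in pandas: they let you address a row by meaning ("bar") rather than by position.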
Logo: @egonelbre, licensed under CC0
You start with a CSV. Like most real-world data, it is messy. This one is missing a score in the first row. And since scores must range between 0 and 10, the scores of -100 and 1000 in the second and third rows must also be errors:
var data = `name, score
joe doe,
john doe, -100
jane doe, 1000
john doe, 5
jane doe, 8
john doe, 7
jane doe, 10`
You want to write and validate a function that discards erroneous data, groups by the name column, and returns the mean score of each group.
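Before writing the tada version, it can help to work out the expected result independently. The sketch below applies the same rules using only the Go standard library: drop rows with a missing score, keep scores between 0 and 10, and average the rest per name. `referenceMeans` is a hypothetical helper, not part of tada. Setting `TrimLeadingSpace` on the `encoding/csv` reader is this sketch's own way of handling the space after each comma; it says nothing about how `tada.ReadCSV` treats that space.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

const data = `name, score
joe doe,
john doe, -100
jane doe, 1000
john doe, 5
jane doe, 8
john doe, 7
jane doe, 10`

// referenceMeans is a hypothetical stdlib-only reference implementation: it
// drops rows with a missing score, keeps scores in [0, 10], and returns the
// mean score per name.
func referenceMeans(in string) (map[string]float64, error) {
	r := csv.NewReader(strings.NewReader(in))
	r.TrimLeadingSpace = true // "john doe, 5" parses as ["john doe", "5"]
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	sums := map[string]float64{}
	counts := map[string]int{}
	if len(rows) == 0 {
		return map[string]float64{}, nil
	}
	for _, row := range rows[1:] { // skip the header row
		if row[1] == "" {
			continue // missing score
		}
		score, err := strconv.ParseFloat(row[1], 64)
		if err != nil {
			return nil, err
		}
		if score < 0 || score > 10 {
			continue // outside the valid range
		}
		sums[row[0]] += score
		counts[row[0]]++
	}
	means := make(map[string]float64, len(sums))
	for name, sum := range sums {
		means[name] = sum / float64(counts[name])
	}
	return means, nil
}

func main() {
	means, err := referenceMeans(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(means["jane doe"], means["john doe"])
}
```

It prints `9 6`: jane doe keeps 8 and 10, and john doe keeps 5 and 7, while joe doe has no valid score and drops out entirely.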
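First you write a test. You can test in two ways: compare the result against an expected CSV string, or decode it into a typed struct and compare that.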
func TestDataPipeline(t *testing.T) {
	want := `name, mean_score
jane doe, 9
john doe, 6`
	df, err := tada.ReadCSV(strings.NewReader(data))
	if err != nil {
		t.Fatal(err)
	}
	ret := sampleDataPipeline(df)
	// Compare the pipeline output against the expected CSV.
	eq, diffs, _ := ret.EqualsCSV(true, strings.NewReader(want))
	if !eq {
		t.Errorf("sampleDataPipeline(): got %v, want %v, has diffs: \n%v", ret, want, diffs)
	}
}
func Test_sampleDataPipelineTyped(t *testing.T) {
	// The struct tags map each field to a column of the output DataFrame.
	type output struct {
		Name      []string  `tada:"name"`
		MeanScore []float64 `tada:"mean_score"`
	}
	want := output{
		Name:      []string{"jane doe", "john doe"},
		MeanScore: []float64{9, 6}, // jane doe: (8+10)/2, john doe: (5+7)/2
	}
	df, err := tada.ReadCSV(strings.NewReader(data))
	if err != nil {
		t.Fatal(err)
	}
	out := sampleDataPipeline(df)
	var got output
	out.Struct(&got)
	if !reflect.DeepEqual(got, want) {
		t.Errorf("sampleDataPipeline(): got %v, want %v", got, want)
	}
}
Then you write the data pipeline:
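func sampleDataPipeline(df *tada.DataFrame) *tada.DataFrame {
	// Fail fast if the expected columns are missing.
	err := df.HasCols("name", "score")
	if err != nil {
		log.Fatal(err)
	}
	// Drop rows with null values, such as joe doe's missing score.
	df.InPlace().DropNull()
	// Convert scores to float64 so the type assertions in validScore succeed.
	df.Cast(map[string]tada.DType{"score": tada.Float64})
	// Keep only scores in the valid range [0, 10].
	validScore := func(v interface{}) bool { return v.(float64) >= 0 && v.(float64) <= 10 }
	df.InPlace().Filter(map[string]tada.FilterFn{"score": validScore})
	// Sort rows by name, then take the mean score of each name group.
	df.InPlace().Sort(tada.Sorter{Name: "name", DType: tada.String})
	ret := df.GroupBy("name").Mean("score")
	if ret.Err() != nil {
		log.Fatal(ret.Err())
	}
	return ret
}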
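`validScore` uses unchecked type assertions (`v.(float64)`), so it relies on the preceding `Cast` having converted every score to `float64`; any other value would make it panic. If you would rather have the filter reject such values, the comma-ok form of the assertion does that. The sketch below is plain Go and assumes, as the pipeline does, that the filter function receives each value as an `interface{}`.

```go
package main

import "fmt"

// validScore keeps only float64 values in [0, 10]. The comma-ok assertion
// makes non-float64 inputs (for example, an uncast string or nil) return
// false instead of panicking.
func validScore(v interface{}) bool {
	f, ok := v.(float64)
	return ok && f >= 0 && f <= 10
}

func main() {
	for _, v := range []interface{}{5.0, -100.0, 1000.0, "7", nil} {
		fmt.Println(v, validScore(v))
	}
}
```

The trade-off is visibility: a panic surfaces a missing `Cast` immediately, while the comma-ok version silently filters those rows out.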
More examples
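// A Series from a slice of values.
s := tada.NewSeries([]float64{1, 2, 3})
// The same values with a label level ("foo", "bar", "baz").
labeled := tada.NewSeries([]float64{1, 2, 3}, []string{"foo", "bar", "baz"})
// A DataFrame from one slice per column, with column names set afterwards.
df := tada.NewDataFrame([]interface{}{
	[]string{"a"},
	[]float64{100},
}).SetColNames([]string{"foo", "bar"})
// A DataFrame read from a CSV file.
f, err := os.Open("foo.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
csvDF, err := tada.ReadCSV(f)
if err != nil {
	log.Fatal(err)
}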