github.com/ptiger10/tada
tada (TAble DAta) is a package that enables test-driven data pipelines in pure Go.
DISCLAIMER: still under development. API subject to breaking changes until v1.
If you still want to use this despite the disclaimer, congratulations, you are an alpha tester! Please DM your feedback to me on the Gophers Slack channel (Dave Fort) or create an issue.
tada combines concepts from pandas, spreadsheets, R, Apache Spark, and SQL. Its most common use cases are cleaning, aggregating, transforming, and analyzing data.
Some notable features of tada:
The key data types are Series, DataFrames, and groupings of each. A Series is analogous to one column of a spreadsheet, and a DataFrame is analogous to a whole spreadsheet. Printing either data type will render an ASCII table.
Both Series and DataFrames have one or more "label levels". On printing, these appear as the leftmost columns in a table, and typically have values that help identify ("label") specific rows. They are analogous to the "index" concept in pandas.
For more detail and implementation notes, see this doc.
Logo: @egonelbre, licensed under CC0
You start with a CSV. Like most real-world data, it is messy. This one is missing a score in the first row. And because scores must range between 0 and 10, the scores of -100 and 1000 in the second and third rows must also be erroneous:
var data = `name, score
joe doe,
john doe, -100
jane doe, 1000
john doe, 5
jane doe, 8
john doe, 7
jane doe, 10`
You want to write and validate a function that discards erroneous data, groups by the name column, and returns the mean of the groups.
First you write a test. You can test in two ways:
// Option 1: compare the pipeline's output against an expected CSV.
func TestDataPipeline(t *testing.T) {
	want := `name, mean_score
jane doe, 9
john doe, 6`

	df, _ := tada.ReadCSV(strings.NewReader(data))
	ret := sampleDataPipeline(df)
	eq, diffs, _ := ret.EqualsCSV(true, strings.NewReader(want))
	if !eq {
		t.Errorf("sampleDataPipeline(): got %v, want %v, has diffs: \n%v", ret, want, diffs)
	}
}
// Option 2: compare the pipeline's output against a typed struct.
func Test_sampleDataPipelineTyped(t *testing.T) {
	type output struct {
		Name      []string  `tada:"name"`
		MeanScore []float64 `tada:"mean_score"`
	}
	want := output{
		Name: []string{"jane doe", "john doe"},
		// john doe's valid scores are 5 and 7, so the mean is 6.
		MeanScore: []float64{9, 6},
	}

	df, _ := tada.ReadCSV(strings.NewReader(data))
	out := sampleDataPipeline(df)
	var got output
	out.Struct(&got)
	if !reflect.DeepEqual(got, want) {
		t.Errorf("sampleDataPipelineTyped(): got %v, want %v", got, want)
	}
}
Then you write the data pipeline:
func sampleDataPipeline(df *tada.DataFrame) *tada.DataFrame {
	// Fail early if the expected columns are missing.
	err := df.HasCols("name", "score")
	if err != nil {
		log.Fatal(err)
	}
	// Discard rows with a missing score.
	df.InPlace().DropNull()
	df.Cast(map[string]tada.DType{"score": tada.Float64})
	// Keep only scores within the valid range of 0 to 10.
	validScore := func(v interface{}) bool { return v.(float64) >= 0 && v.(float64) <= 10 }
	df.InPlace().Filter(map[string]tada.FilterFn{"score": validScore})
	df.InPlace().Sort(tada.Sorter{Name: "name", DType: tada.String})
	// Group by name and take the mean score of each group.
	ret := df.GroupBy("name").Mean("score")
	if ret.Err() != nil {
		log.Fatal(ret.Err())
	}
	return ret
}
More examples
// A Series of values only.
s := tada.NewSeries([]float64{1, 2, 3})
// A Series of values with one level of labels (a separate example).
s := tada.NewSeries([]float64{1, 2, 3}, []string{"foo", "bar", "baz"})
df := tada.NewDataFrame([]interface{}{
	[]string{"a"},
	[]float64{100},
}).SetColNames([]string{"foo", "bar"})
f, err := os.Open("foo.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
df, err := tada.ReadCSV(f)
if err != nil {
	log.Fatal(err)
}
InPlace()
.Cast() it to tada.Float64, tada.String, or tada.DateTime, respectively.