
csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
CsvDataset helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
CsvDataset iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
$ pip install csv-dataset
Suppose we have a csv file whose absolute path is filepath:
open_time,open,high,low,close,volume
1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
...
from csv_dataset import (
    Dataset,
    CsvReader
)

dataset = Dataset(
    CsvReader(
        filepath,
        float,
        # Abandon the first column and only pick the following
        indexes=[1, 2, 3, 4, 5],
        header=True
    )
).window(3, 1).batch(2)

for element in dataset:
    print(element)
The following output shows one print.
[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]

 [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
...
Defines the window size, shift and stride.
The default window size is 1, which means the dataset has no window.
Parameter explanation
Suppose we have a raw data set
[ 1 2 3 4 5 6 7 8 9 ... ]
And the following is a window of (size=4, shift=3, stride=2)
          |-------------- size:4 --------------|
          |- stride:2 -|                       |
          |            |                       |
win 0:  [ 1            3           5           7  ] --------|-----
                                                        shift:3
win 1:  [ 4            6           8           10 ] --------|-----

win 2:  [ 7            9           11          13 ]
...
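The window parameters above can be illustrated with a small, library-independent sketch. The function name iter_windows is hypothetical and is not part of csv-dataset; it only reproduces the arithmetic of the diagram, assuming a window is emitted only when all of its elements are available.

```python
from typing import Iterator, List, Sequence


def iter_windows(
    data: Sequence[int],
    size: int,
    shift: int,
    stride: int = 1,
) -> Iterator[List[int]]:
    """Yield windows of `size` elements taken `stride` apart.

    Consecutive windows start `shift` elements after one another.
    Hypothetical helper for illustration, not the library's code.
    """
    # Number of raw elements one window spans, including skipped ones
    span = (size - 1) * stride + 1
    start = 0
    # Assumption: incomplete trailing windows are not emitted
    while start + span <= len(data):
        yield [data[start + i * stride] for i in range(size)]
        start += shift


raw = list(range(1, 17))  # [1, 2, ..., 16]
windows = list(iter_windows(raw, size=4, shift=3, stride=2))
for index, window in enumerate(windows):
    print(f"win {index}: {window}")
```

With 16 raw elements this yields four windows, the first three of which match win 0 to win 2 in the diagram.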
Defines batch size.
The default batch size of the dataset is 1, which means it is single-batch.
If batch is 2:
batch 0: [[ 1 3 5 7 ]
          [ 4 6 8 10 ]]
batch 1: [[ 7 9 11 13 ]
          [ 10 12 14 16 ]]
...
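As a rough sketch of the batching shown above, consecutive windows can be grouped like this. The helper iter_batches is hypothetical and not part of csv-dataset, and dropping an incomplete final batch is an assumption of this sketch rather than documented behavior.

```python
from typing import Iterable, Iterator, List


def iter_batches(
    windows: Iterable[List[int]],
    batch: int,
) -> Iterator[List[List[int]]]:
    """Group consecutive windows into lists of `batch` windows.

    Hypothetical helper for illustration, not the library's code.
    """
    current: List[List[int]] = []
    for window in windows:
        current.append(window)
        if len(current) == batch:
            yield current
            current = []
    # Assumption: an incomplete trailing batch is dropped


# The four windows of (size=4, shift=3, stride=2) over 1..16
windows = [
    [1, 3, 5, 7],
    [4, 6, 8, 10],
    [7, 9, 11, 13],
    [10, 12, 14, 16],
]
batches = list(iter_batches(windows, batch=2))
for index, item in enumerate(batches):
    print(f"batch {index}: {item}")
```

Each batch holds two windows, and the windows are not repeated across batches, as in the example above.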
Gets the data of the next batch.
Resets the dataset.
If reset_buffer is True, the dataset will reset the data of the previous window in the buffer.
Reads multiple batches at a time.
If we reset_buffer, then the next read will not use existing data in the buffer, and the result will have no overlap with the last read.
Resets the buffer, so that the next read will have no overlap with the last one.
Calculates and returns how many lines of the underlying data are needed to perform the given number of reads (the reads argument).
Calculates how many reads max_lines lines can afford.
Calculates how many reads the current reader can afford. If max_lines of the current reader is unset, it returns None.
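The relationship between lines and reads described above can be worked out by hand. The following is a sketch of that arithmetic only, under the assumptions that each read yields one batch, that the buffer is never reset, and that windows advance by shift from one read to the next. The names lines_needed and reads_affordable are hypothetical, and the library's own methods may account for buffer state differently.

```python
def lines_needed(
    reads: int,
    size: int,
    shift: int,
    stride: int = 1,
    batch: int = 1,
) -> int:
    """Lines required to perform `reads` reads, starting from an empty buffer."""
    if reads <= 0:
        return 0
    # Lines spanned by a single window
    span = (size - 1) * stride + 1
    # Assumption: one read produces one batch of `batch` windows
    windows = reads * batch
    # The first window costs `span` lines, each later one costs `shift`
    return span + (windows - 1) * shift


def reads_affordable(
    max_lines: int,
    size: int,
    shift: int,
    stride: int = 1,
    batch: int = 1,
) -> int:
    """Complete reads that `max_lines` lines can supply (inverse of lines_needed)."""
    span = (size - 1) * stride + 1
    if max_lines < span:
        return 0
    windows = (max_lines - span) // shift + 1
    return windows // batch


needed = lines_needed(2, size=4, shift=3, stride=2, batch=2)
affordable = reads_affordable(needed, size=4, shift=3, stride=2, batch=2)
print(needed, affordable)
```

For the (size=4, shift=3, stride=2) window with a batch of 2, two reads consume 16 lines, which agrees with the last element of batch 1 in the batching example.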
str: absolute path of the csv file
Callable: data type. We should only use float or int for this argument.
List[int]: column indexes to pick from the lines of the csv file
bool = False: whether we should skip reading the header line
str = ',': the column splitter of the csv file
List[NormalizerProtocol]: list of normalizers to normalize each column of data. A NormalizerProtocol should contain two methods: normalize(float) -> float to normalize the given datum, and restore(float) -> float to restore the normalized datum.
int = -1: max lines of the csv file to be read. Defaults to -1, which means no limit.
Resets the reader position
Gets max_lines
Changes max_lines
Returns the converted value of the next line
Returns the number of lines that have been read
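The NormalizerProtocol described for the reader only requires a normalize method and a restore method. A minimal sketch of a conforming class is shown below. The NormalizerProtocol definition here is a local stand-in written with typing.Protocol, and MinMaxNormalizer with its bounds is a hypothetical example; neither is taken from the library.

```python
from typing import List, Protocol


class NormalizerProtocol(Protocol):
    """Local stand-in for the protocol described above (an assumption)."""

    def normalize(self, datum: float) -> float: ...

    def restore(self, datum: float) -> float: ...


class MinMaxNormalizer:
    """Hypothetical normalizer that maps [minimum, maximum] onto [0, 1]."""

    def __init__(self, minimum: float, maximum: float) -> None:
        if maximum <= minimum:
            raise ValueError("maximum must be greater than minimum")
        self._minimum = minimum
        self._range = maximum - minimum

    def normalize(self, datum: float) -> float:
        return (datum - self._minimum) / self._range

    def restore(self, datum: float) -> float:
        return datum * self._range + self._minimum


# One normalizer per picked column: four price columns and one volume column.
# The bounds are made-up values for illustration.
price = MinMaxNormalizer(7000.0, 8000.0)
volume = MinMaxNormalizer(0.0, 200.0)
normalizers: List[NormalizerProtocol] = [price, price, price, price, volume]

scaled = price.normalize(7500.0)
restored = price.restore(scaled)
print(scaled, restored)
```

restore is the inverse of normalize, so a value produced by the pipeline can be mapped back to its original scale.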