
csv-dataset helps to read csv files and create descriptive and efficient input pipelines for deep learning in a streaming fashion
CsvDataset helps to read a csv file and create descriptive and efficient input pipelines for deep learning.
CsvDataset iterates the records of the csv file in a streaming fashion, so the full dataset does not need to fit into memory.
$ pip install csv-dataset
Suppose we have a csv file whose absolute path is filepath:
open_time,open,high,low,close,volume
1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283
1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931
1576771320000,7125.76,7134.46,7123.12,7123.12,41.03628
1576771380000,7123.74,7128.06,7117.12,7126.57,39.885367
1576771440000,7127.34,7137.84,7126.71,7134.99,25.138154
1576771500000,7134.99,7144.13,7132.84,7141.64,26.467308
...
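from csv_dataset import (
    Dataset,
    CsvReader
)

# Use the imported name Dataset, so the example does not raise a NameError
dataset = Dataset(
    CsvReader(
        filepath,
        float,
        # Abandon the first column and only pick the following
        indexes=[1, 2, 3, 4, 5],
        header=True
    )
).window(3, 1).batch(2)

# Each element is one batch of two windows, three lines per window
for element in dataset:
    print(element)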
The following output shows one print.
[[[7145.99, 7150.0, 7141.01, 7142.33, 21.094283]
  [7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]]

 [[7142.89, 7142.99, 7120.7, 7125.73, 118.279931]
  [7125.76, 7134.46, 7123.12, 7123.12, 41.03628 ]
  [7123.74, 7128.06, 7117.12, 7126.57, 39.885367]]]
...
The window method defines the window size, shift and stride.
The default window size is 1, which means the dataset has no window.
Parameter explanation
Suppose we have a raw data set
[ 1 2 3 4 5 6 7 8 9 ... ]
And the following is a window of (size=4, shift=3, stride=2)
             |-------------- size:4 --------------|
             |- stride:2 -|                       |
             |            |                       |
win 0:  [    1            3           5           7  ] --------|-----
                                                        shift:3
win 1:  [    4            6           8           10 ] --------|-----

win 2:  [    7            9           11          13 ]

...
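The indexing in the diagram can be reproduced with a short, library-independent sketch. The helper `windows` below is a hypothetical name and not part of csv-dataset; it assumes the raw data is an in-memory list and only illustrates how size, shift and stride interact.

```python
from typing import Iterator, List, Sequence


def windows(
    data: Sequence[int],
    size: int,
    shift: int,
    stride: int
) -> Iterator[List[int]]:
    """Yield windows over `data` (hypothetical helper, not the library API)."""
    # A single window covers (size - 1) * stride + 1 consecutive raw items
    span = (size - 1) * stride + 1
    start = 0
    while start + span <= len(data):
        # Pick every `stride`-th item, `size` items in total
        yield [data[start + i * stride] for i in range(size)]
        # The next window starts `shift` raw items later
        start += shift


raw = list(range(1, 17))  # [1, 2, ..., 16]
result = list(windows(raw, size=4, shift=3, stride=2))
for index, window in enumerate(result):
    print(f"win {index}: {window}")
```

With 16 raw items this sketch yields four windows, the first three of which match win 0 to win 2 in the diagram.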
The batch method defines the batch size.
The default batch size of the dataset is 1, which means it is single-batch.
If batch is 2:
batch 0:  [[  1  3  5  7 ]
           [  4  6  8 10 ]]
batch 1:  [[  7  9 11 13 ]
           [ 10 12 14 16 ]]
...
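Batching simply groups consecutive windows. The sketch below uses a hypothetical helper `batches` (not part of csv-dataset) and assumes that an incomplete trailing batch is dropped, which the text above does not confirm for the library itself.

```python
from typing import Iterable, Iterator, List


def batches(
    windows: Iterable[List[int]],
    batch_size: int
) -> Iterator[List[List[int]]]:
    """Group consecutive windows into batches (hypothetical helper)."""
    current: List[List[int]] = []
    for window in windows:
        current.append(window)
        if len(current) == batch_size:
            yield current
            current = []
    # Assumption: an incomplete trailing batch is dropped in this sketch


# The four windows of (size=4, shift=3, stride=2) over [1, 2, ..., 16]
example_windows = [
    [1, 3, 5, 7],
    [4, 6, 8, 10],
    [7, 9, 11, 13],
    [10, 12, 14, 16],
]
batched = list(batches(example_windows, batch_size=2))
for index, batch in enumerate(batched):
    print(f"batch {index}: {batch}")
```

Note that batch 1 continues from the window after the last one in batch 0, so windows are never repeated across batches.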
Gets the data of the next batch.
Resets the dataset.
reset_buffer: if True, the dataset will reset the data of the previous window in the buffer.
Reads multiple batches at a time.
If we reset_buffer, then the next read will not use existing data in the buffer, and the result will have no overlap with the last read.
Resets the buffer, so that the next read will have no overlap with the last one.
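The effect of resetting the buffer can be sketched without the library. `BufferedWindows` below is a hypothetical stand-in, not the csv-dataset implementation; its method names are illustrative, and it assumes stride 1 and a shift no larger than the window size.

```python
from collections import deque
from typing import Deque, Iterator, List, Optional


class BufferedWindows:
    """Hypothetical stand-in for a windowed dataset with stride 1."""

    def __init__(self, source: Iterator[int], size: int, shift: int) -> None:
        self._source = source
        self._size = size
        self._shift = shift
        self._buffer: Deque[int] = deque()

    def get(self) -> Optional[List[int]]:
        # Reuse whatever is buffered, then pull only the missing items
        while len(self._buffer) < self._size:
            item = next(self._source, None)
            if item is None:
                return None  # source exhausted
            self._buffer.append(item)
        window = list(self._buffer)
        # Drop `shift` items and keep the overlapping tail for the next window
        for _ in range(min(self._shift, len(self._buffer))):
            self._buffer.popleft()
        return window

    def reset_buffer(self) -> None:
        # Forget buffered items, so the next window uses only fresh data
        self._buffer.clear()


dataset = BufferedWindows(iter(range(1, 100)), size=3, shift=1)
first = dataset.get()    # [1, 2, 3]
second = dataset.get()   # [2, 3, 4], overlaps with the first window
dataset.reset_buffer()
third = dataset.get()    # [5, 6, 7], no overlap with the second window
print(first, second, third)
```

Without the reset, the third window would have been [3, 4, 5]; clearing the buffer trades the skipped overlap for a read that starts from unseen lines only.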
Calculates and returns how many lines of the underlying data are needed to read the dataset reads times, where reads is the requested number of reads.
Calculates how many reads max_lines lines can afford.
Calculates how many reads the current reader can afford. If max_lines of the current reader is unset, then it returns None.
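The relation between lines and reads follows from the window and batch parameters. The functions below are a hypothetical sketch with assumed names and an assumed formula, not the library's own code: one read is taken to be one batch, a window spans (size - 1) * stride + 1 lines, and each further window starts shift lines later.

```python
from typing import Optional


def lines_needed(
    reads: int, size: int, shift: int, stride: int, batch: int
) -> int:
    """Raw lines consumed by `reads` reads (assumed formula)."""
    if reads <= 0:
        return 0
    # Raw lines covered by a single window
    span = (size - 1) * stride + 1
    # Each read yields `batch` windows, each starting `shift` lines later
    total_windows = reads * batch
    return (total_windows - 1) * shift + span


def max_reads(
    max_lines: Optional[int], size: int, shift: int, stride: int, batch: int
) -> Optional[int]:
    """Reads affordable with `max_lines` lines, or None when unlimited."""
    # Assumption: None or a negative value both mean "no limit"
    if max_lines is None or max_lines < 0:
        return None
    span = (size - 1) * stride + 1
    if max_lines < span:
        return 0
    total_windows = (max_lines - span) // shift + 1
    # Only complete batches count as a read
    return total_windows // batch


# window(3, 1).batch(2) from the usage example: one read spans 4 lines
print(lines_needed(1, size=3, shift=1, stride=1, batch=2))
# (size=4, shift=3, stride=2) with batch 2: 16 lines afford 2 reads
print(max_reads(16, size=4, shift=3, stride=2, batch=2))
```

The first call agrees with the printed output in the usage example, where one batch of two windows covers four consecutive csv lines.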
CsvReader arguments, listed by type:
str: absolute path of the csv file.
Callable: data type. We should only use float or int for this argument.
List[int]: column indexes to pick from the lines of the csv file.
bool = False: whether we should skip reading the header line.
str = ',': the column splitter of the csv file.
List[NormalizerProtocol]: list of normalizers to normalize each column of data. A NormalizerProtocol should contain two methods: normalize(float) -> float to normalize the given datum, and restore(float) -> float to restore the normalized datum.
int = -1: max lines of the csv file to be read. Defaults to -1, which means no limit.
Reader methods and properties:
Resets the reader position.
Gets max_lines.
Changes max_lines.
Returns the converted value of the next line.
Returns the number of lines that have been read.
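To make the reader options concrete, here is a minimal sketch of a streaming line reader with a normalizer. Everything in it is hypothetical: `MinMaxNormalizer`, `read_rows`, and the parameter names `dtype`, `splitter` and `normalizers` are illustrative choices, not names confirmed by the text above, and the sketch assumes that max_lines does not count the header line.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence


class MinMaxNormalizer:
    """Hypothetical normalizer: scales a value from [low, high] to [0, 1]."""

    def __init__(self, low: float, high: float) -> None:
        self._low = low
        self._range = high - low

    def normalize(self, value: float) -> float:
        return (value - self._low) / self._range

    def restore(self, value: float) -> float:
        return value * self._range + self._low


def read_rows(
    lines: Iterable[str],
    dtype: Callable[[str], float],
    indexes: Sequence[int],
    header: bool = False,
    splitter: str = ',',
    normalizers: Optional[Sequence[MinMaxNormalizer]] = None,
    max_lines: int = -1,
) -> Iterator[List[float]]:
    """Stream converted rows one line at a time (illustrative only)."""
    iterator = iter(lines)
    if header:
        next(iterator, None)  # skip the header line
    for count, line in enumerate(iterator):
        # A negative max_lines means no limit
        if 0 <= max_lines <= count:
            break
        cells = line.strip().split(splitter)
        row = [dtype(cells[i]) for i in indexes]
        if normalizers is not None:
            # One normalizer per picked column
            row = [n.normalize(v) for n, v in zip(normalizers, row)]
        yield row


sample = [
    "open_time,open,high,low,close,volume",
    "1576771200000,7145.99,7150.0,7141.01,7142.33,21.094283",
    "1576771260000,7142.89,7142.99,7120.7,7125.73,118.279931",
]
rows = list(read_rows(sample, float, indexes=[1, 2, 3, 4, 5], header=True))
print(rows[0])
```

Because `read_rows` is a generator over any iterable of lines, passing an open file object instead of a list keeps only one line in memory at a time, which is the streaming property the package description refers to.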