LocustDB is an experimental analytics database aiming to set a new standard for query performance and storage efficiency on commodity hardware. See "How to Analyze Billions of Records per Second on a Single Desktop PC" and "How to Read 100s of Millions of Records per Second from a Single Disk" for an overview of current capabilities.
Download the latest binary release, which can be run from the command line on most x64 Linux systems, including Windows Subsystem for Linux. For example, to load the file test_data/nyc-taxi.csv.gz in this repository and start the REPL, run:
./locustdb --load test_data/nyc-taxi.csv.gz --trips
When loading .csv or .csv.gz files with --load, the first line of each file is assumed to be a header containing the names for all columns. The type of each column will be derived automatically, but this might break for columns that contain a mixture of numbers/strings/empty entries.
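Since the first line is treated as the header, it can help to look at it before loading a file. The following is a minimal sketch, not part of LocustDB: `csv_header` is a hypothetical helper name, it assumes a bash-like shell with `gzip` available, and it splits naively on commas, so quoted fields that contain commas will be split incorrectly.

```shell
# Print the header line of a .csv or .csv.gz file, one column name per line.
# Hypothetical helper for inspection only; LocustDB does not ship this.
csv_header() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;   # decompress to stdout
    *)    cat -- "$1" ;;
  esac | head -n 1 | tr ',' '\n'  # naive split: ignores CSV quoting
}
```

Running `csv_header test_data/nyc-taxi.csv.gz` should list the names that will be used for the columns of that file.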
To persist data to disk in LocustDB's internal storage format (which allows fast queries from disk after the initial load), specify the storage location with --db-path.
When creating or opening a persistent database, LocustDB will open a lot of files and might crash if the limit on the number of open files is too low.
On Linux, you can check the current limit with ulimit -n and set a new limit with e.g. ulimit -n 4096.
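The check and the adjustment can be combined in one small shell function. This is a sketch under assumptions: `ensure_nofile_limit` is a hypothetical name, the default of 4096 simply mirrors the example value above (it is not a documented requirement), and the shell is assumed to be bash or another shell whose `ulimit` builtin supports `-n`.

```shell
# Raise the open-file limit of the current shell if it is below a minimum.
# Hypothetical helper; 4096 is only the example value, not a LocustDB requirement.
ensure_nofile_limit() {
  local wanted="${1:-4096}"
  local current
  current="$(ulimit -n)"
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$wanted" ]; then
    echo "open file limit is $current, no change needed"
  elif ulimit -n "$wanted" 2>/dev/null; then
    echo "raised open file limit from $current to $wanted"
  else
    # Raising can fail when the hard limit is lower than the requested value.
    echo "could not raise open file limit above $current" >&2
    return 1
  fi
}
```

Call the function directly in the shell that will start LocustDB (not inside `$(...)`), since a limit changed in a subshell does not carry over to the parent shell.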
The --trips flag will configure the ingestion schema for loading the 1.46 billion taxi ride dataset, which can be downloaded here.
For additional usage info, invoke with --help:
$ ./locustdb --help
LocustDB 0.2.1
Clemens Winter <clemenswinter1@gmail.com>
Massively parallel, high performance analytics database that will rapidly devour all of your data.

USAGE:
    locustdb [FLAGS] [OPTIONS]

FLAGS:
    -h, --help             Prints help information
        --mem-lz4          Keep data cached in memory lz4 encoded. Decreases memory usage and query speeds.
        --reduced-trips    Set ingestion schema for select set of columns from nyc taxi ride dataset
        --seq-disk-read    Improves performance on HDD, can hurt performance on SSD.
        --trips            Set ingestion schema for nyc taxi ride dataset
    -V, --version          Prints version information

OPTIONS:
        --db-path <PATH>           Path to data directory
        --load <FILES>             Load .csv or .csv.gz files into the database
        --mem-limit-tables <GB>    Limit for in-memory size of tables in GiB [default: 8]
        --partition-size <ROWS>    Number of rows per partition when loading new data [default: 65536]
        --readahead <MB>           How much data to load at a time when reading from disk during queries in MiB
                                   [default: 256]
        --schema <SCHEMA>          Comma separated list specifying the types and (optionally) names of all columns in
                                   files specified by `--load` option.
                                   Valid types: `s`, `string`, `i`, `integer`, `ns` (nullable string), `ni` (nullable
                                   integer)
                                   Example schema without column names: `int,string,string,string,int`
                                   Example schema with column names: `name:s,age:i,country:s`
        --table <NAME>             Name for the table populated with --load [default: default]
        --threads <INTEGER>        Number of worker threads. [default: number of cores (12)]
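When a file has many columns, the value for --schema can be assembled from individual column specifications. The sketch below is illustrative only: `build_schema` is a hypothetical helper, not part of LocustDB, and it accepts only the type names listed under "Valid types" in the help text. Whether LocustDB also accepts other spellings, such as the `int` used in the help text's own example, is not something this helper knows about.

```shell
# Join column specs ("type" or "name:type") into a --schema value.
# Hypothetical helper; rejects type names not listed in the --help output.
build_schema() {
  local schema="" col ctype
  for col in "$@"; do
    ctype="${col##*:}"   # text after the last ':' (whole spec if there is none)
    case "$ctype" in
      s|string|i|integer|ns|ni) ;;
      *) echo "unknown column type: $ctype" >&2; return 1 ;;
    esac
    schema="${schema:+$schema,}$col"   # add a comma only between entries
  done
  printf '%s\n' "$schema"
}
```

For example, `build_schema name:s age:i country:s` prints `name:s,age:i,country:s`, which could then be passed along the lines of `./locustdb --load people.csv --schema "$(build_schema name:s age:i country:s)"`, where `people.csv` is a made-up file name.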
A vision for LocustDB.
Query performance for analytics workloads is best-in-class on commodity hardware, both for data cached in memory and for data read from disk.
LocustDB automatically achieves spectacular compression ratios, has minimal indexing overhead, and requires fewer machines to store the same amount of data than any other system. The trade-off between performance and storage efficiency is configurable.
New data is available for queries within seconds.
LocustDB scales seamlessly from a single machine to large clusters.
LocustDB should be usable with minimal configuration or schema-setup as:
Until LocustDB is production ready, these are distractions at best, if not wholly incompatible with the main goals.
LocustDB does not efficiently execute queries inserting or operating on small amounts of data.
LocustDB does not run on GPUs.
git clone https://github.com/cswinter/LocustDB.git
cd LocustDB
Compile with --release for optimal performance:
cargo run --release --bin repl -- --load test_data/nyc-taxi.csv.gz --reduced-trips
cargo test
cargo bench