better-sqlite3-schema
Advanced tools
Comparing version 0.2.0 to 0.3.0
{ | ||
"name": "better-sqlite3-schema", | ||
"version": "0.2.0", | ||
"version": "0.3.0", | ||
"description": "Migrate (nested and multi-dimensional) json data to/from sqlite database with better-sqlite3-helper", | ||
@@ -17,4 +17,4 @@ "keywords": [ | ||
"license": "BSD-2-Clause", | ||
"main": "dist/index.js", | ||
"types": "dist/index.d.ts", | ||
"main": "index.js", | ||
"types": "index.d.ts", | ||
"scripts": { | ||
@@ -27,17 +27,9 @@ "test": "npm run rawtest", | ||
"build": "tsc -p tsconfig.build.json", | ||
"postbuild": "mk-package", | ||
"pretest": "npm run format && npm run build", | ||
"rawtest": "ts-node test/index.ts && jest", | ||
"prepublishOnly": "npm run rawtest && npm run build" | ||
"rawtest": "ts-node test/index.ts && jest" | ||
}, | ||
"husky": { | ||
"hook": { | ||
"pre-commit": "npm run format" | ||
} | ||
}, | ||
"directories": { | ||
"test": "test" | ||
}, | ||
"files": [ | ||
"dist" | ||
], | ||
"dependencies": { | ||
@@ -48,19 +40,2 @@ "better-sqlite3": "^7.1.0", | ||
}, | ||
"devDependencies": { | ||
"@beenotung/tslib": "^16.35.0", | ||
"@types/better-sqlite3": "^5.4.0", | ||
"@types/jest": "^26.0.14", | ||
"@types/node": "*", | ||
"husky": "^4.3.0", | ||
"jest": "^26.4.2", | ||
"prettier": "^2.1.2", | ||
"rimraf": "^3.0.2", | ||
"ts-jest": "^26.4.1", | ||
"ts-node": "^9.0.0", | ||
"tslint": "^6.1.3", | ||
"tslint-config-prettier": "^1.18.0", | ||
"tslint-eslint-rules": "^5.4.0", | ||
"tslint-etc": "^1.13.7", | ||
"typescript": "^4.0.3" | ||
}, | ||
"repository": { | ||
@@ -74,2 +49,2 @@ "type": "git", | ||
"homepage": "https://github.com/beenotung/better-sqlite3-schema#readme" | ||
} | ||
} |
@@ -70,2 +70,97 @@ # better-sqlite3-schema | ||
Detail example see `makePredefinedInsertRowFn()` and `makeGeneralInsertRowFn()` in [sample-test.ts](./test/sample-test.ts) | ||
### Functional Approach (compose at runtime) | ||
The functional approach allows one to compose customizable helper functions at runtime. | ||
Explore the dataset and auto built schema with | ||
- `makeSchemaScanner()` | ||
Compose insert functions with | ||
- `makeInsertRowFnFromSchema()` | ||
- `makeDeduplicatedInsertRowFnFromSchema()` | ||
Compose select functions with | ||
- `makeSelectRowFnFromSchema()` | ||
- `makeSelectRefFieldArray()` | ||
- `makeGetRefValueFnFromSchema()` | ||
Detail example see `makePredefinedInsertRowFn()` and `makeGeneralInsertRowFn()` in [functional-test.ts](./example/functional-test.ts) | ||
### Code Generation Approach (compose at build-time) | ||
The code generation approach allows one to compose customizable helper functions at build-time. Which can archive ~50% speed up compared to the runtime composing. | ||
## Benchmark | ||
### Sample 1: HTTP Proxy Log | ||
8GiB of HTTP proxy server log. | ||
Each line is a compact json text. | ||
Sample text: | ||
```json | ||
{"timestamp":1600713130016,"type":"request","userAgent":"Mozilla/5.0 (Linux; Android 10; LIO-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Mobile Safari/537.36","referer":"https://www.example.net/sw.js","protocol":"https","host":"www.example.net","method":"GET","url":"/build/p-7794655c.js"} | ||
``` | ||
When stored into sqlite3, the data are normalized into multiple tables to avoid duplication, e.g. only storing the full text of each type of user agent and url once. | ||
File size in varies format: | ||
| storage | size | size compared with plain text | Remark | | ||
|---|---|---|---| | ||
| plain text | 8256M | - | | | ||
| sqlite without index | 920M | 11.1% | | | ||
| zip of non-indexed sqlite file | 220M | 2.7% | 23.9% of sqlite3 file | | ||
| sqlite with indices | 1147M | 13.9% | +24% of sqlite file | | ||
| zip of indexed sqlite file | 268M | 3.2% | 23.4% of indexed sqlite3 file | | ||
Time used to import: | ||
- 6 minutes 10 seconds: with inlined helper functions with code generation | ||
- 14 minutes: with runtime-composed helper functions | ||
Optimization used: | ||
- code generation from schema | ||
- bulk insert (batch each 8K items with a transaction) | ||
- cache id of normalized, repeatable values (with js object) | ||
- create unique index on normalized values | ||
- `PRAGMA synchronous = OFF` | ||
- `PRAGMA journal_mode = MEMORY` | ||
- `PRAGMA cache_size = ${(200 * 1000 ** 2) / 4}` | ||
(default page size is 4K, we largely increase the cache_size to avoid massive tedious disk write) | ||
**Remark**: | ||
Using index increases the file size by 1/4, but hugely speeds up the import process. | ||
To archive the best of both aspects, create indices during import; | ||
and remove indices (then VACUUM) for archive file. | ||
It takes 4.9s to build the indices; | ||
and 16.3s to vacuum the database after removal of indices. | ||
### Sample 2: Online Forum Data | ||
291119 sample json data crawled from online forum (threads and posts) | ||
Total size: 843M | ||
The objects have consistent shape. | ||
Some data are duplicated, e.g. user name, and some common comments. | ||
Same as the dataset used in [binary-object](https://github.com/beenotung/binary-object) | ||
File size in varies format: | ||
| storage | size | | ||
|---|---| | ||
| json text | 843M | | ||
| sqlite3 with index | 669M | | ||
| sqlite3 without index | 628M | | ||
| zip of sqlite3 without index | 171M | | ||
**Remark**: | ||
The data in sqlite3 are normalized to avoid duplication |
Major refactor
Supply chain riskPackage has recently undergone a major refactor. It may be unstable or indicate significant internal changes. Use caution when updating to versions that include significant changes.
Found 1 instance in 1 package
70062
0
30
996
166
1