frictionless.js
`frictionless.js` is a lightweight, standardized "stream-plus-metadata" interface for accessing files and datasets, especially tabular ones (CSV, Excel).
frictionless.js follows the "Frictionless Data Lib Pattern".
It provides an open method for data on disk, online and inline.

A line of code is worth a thousand words ...
const {open} = require('frictionless.js')
const file = open('path/to/ons-mye-population-totals.xls')

file.descriptor
// => {
//   path: '/path/to/ons-mye-population-totals.xls',
//   pathType: 'local',
//   name: 'ons-mye-population-totals',
//   format: 'xls',
//   mediatype: 'application/vnd.ms-excel',
//   encoding: 'windows-1252'
// }

file.size
// => 67584

// rows() returns a promise that resolves to an object stream of rows;
// with keyed: true each row is an object keyed by the header row ...
const rows = await file.rows({keyed: true})
// { 'col1': 1, 'col2': 2, ... }
// { 'col1': 10, 'col2': 20, ... }
frictionless.js is motivated by the following use cases:
A simple open method whether you are accessing data on disk, from a URL or inline data from a string, buffer or array.

To install:
npm install frictionless.js
If you want to use it in the browser, first you need to build the bundle.
Run the following command to generate the bundle for the necessary JS targets:
yarn build
This will create two bundles in the dist folder: the node sub-folder contains the build for the Node environment, while the browser sub-folder contains the build for the browser. In a simple HTML file you can use it like this:
<head>
<script src="./dist/browser/bundle.js"></script>
<script>
// Global data lib is available here...
const file = data.open('path/to/file')
// ...
</script>
</head>
<body></body>
With a simple file:
const data = require('frictionless.js')

async function main(path) {
  // path can be local or remote
  const file = data.open(path)
  // descriptor with metadata e.g. name, path, format, (guessed) mimetype etc
  console.log(file.descriptor)
  // returns promise with raw stream
  const stream = await file.stream()
  // let's get an object stream of the rows
  // (assuming it is tabular i.e. csv, xls etc)
  const rows = await file.rows()
  // entire file as a buffer (file.buffer returns a promise)
  const buffer = await file.buffer
  // for large files you can read the file in chunks
  await file.bufferInChunks((chunk, progress) => {
    console.log(progress, chunk)
  })
}
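The bufferInChunks call above only exposes a callback. As a rough, hypothetical sketch of the underlying idea (reading a local file piece by piece and reporting progress), the helper below uses only Node's fs module. The name readInChunks, the 64 KiB default chunk size and reporting progress as a 0-100 percentage are all assumptions, not frictionless.js internals.

```javascript
const fs = require('fs')

// Hypothetical helper (not part of frictionless.js): reads a local file in
// fixed-size chunks and reports progress as a percentage of the total size.
function readInChunks(filePath, onChunk, chunkSize = 64 * 1024) {
  const total = fs.statSync(filePath).size
  const fd = fs.openSync(filePath, 'r')
  let offset = 0
  try {
    while (offset < total) {
      const buf = Buffer.alloc(Math.min(chunkSize, total - offset))
      const bytesRead = fs.readSync(fd, buf, 0, buf.length, offset)
      if (bytesRead === 0) break // file shrank while we were reading
      offset += bytesRead
      // progress = share of bytes read so far, from 0 to 100
      onChunk(buf.subarray(0, bytesRead), (offset / total) * 100)
    }
  } finally {
    fs.closeSync(fd)
  }
  // total number of bytes delivered to onChunk
  return offset
}
```

For example, readInChunks('/path/to/big.csv', (chunk, progress) => console.log(progress.toFixed(1), chunk.length)) logs the running percentage and each chunk's size. Reading synchronously keeps the sketch short; a stream-based version would avoid blocking the event loop.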
With a Dataset:
const { Dataset } = require('frictionless.js')
// the directory must currently contain a datapackage.json
const path = '/path/to/directory/'
Dataset.load(path).then(async dataset => {
  // get a data file in this dataset
  const file = dataset.resources[0]
  // stream() returns a promise that resolves to a readable stream
  const stream = await file.stream()
})
Load a file from a path or descriptor.
open(pathOrDescriptor, {basePath, format}={})
There are 3 types of file source we support: local paths, remote URLs and inline data.
const data = require('frictionless.js')

// a local file path
const localFile = data.open('/path/to/file.csv')

// a remote URL
const remoteFile = data.open('https://example.com/data.xls')

// loading raw (inline) data
const inlineFile = data.open({
  name: 'mydata',
  data: { // can be any javascript - an object, an array or a string or ...
    a: 1,
    b: 2
  }
})

// Loading with a descriptor - this allows more fine-grained configuration
// The descriptor should follow the Frictionless Data Resource model
// http://specs.frictionlessdata.io/data-resource/
const describedFile = data.open({
  // file or url path
  path: 'https://example.com/data.csv',
  // a Table Schema - https://specs.frictionlessdata.io/table-schema/
  schema: {
    fields: [
      // ...
    ]
  },
  // CSV dialect - https://specs.frictionlessdata.io/csv-dialect/
  dialect: {
    // this is tab separated CSV/DSV
    delimiter: '\t'
  }
})
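The three kinds of source above can be told apart from the shape of the argument alone. A minimal sketch, assuming that a string starting with http:// or https:// is remote, any other string is a local path, and an object is inline when it carries a data property; the function name sourceType is hypothetical, and this is not necessarily how frictionless.js decides.

```javascript
// Hypothetical classifier for the argument passed to open():
// returns 'remote', 'local' or 'inline'.
function sourceType(pathOrDescriptor) {
  if (typeof pathOrDescriptor === 'string') {
    return /^https?:\/\//i.test(pathOrDescriptor) ? 'remote' : 'local'
  }
  if (pathOrDescriptor && typeof pathOrDescriptor === 'object') {
    // a descriptor carrying inline data needs no I/O at all
    if ('data' in pathOrDescriptor) return 'inline'
    // otherwise classify the descriptor's own path
    if (typeof pathOrDescriptor.path === 'string') {
      return sourceType(pathOrDescriptor.path)
    }
  }
  throw new TypeError('Expected a path string or a descriptor object')
}
```

Recursing on descriptor.path means a descriptor such as { path: 'https://example.com/data.csv', dialect: {...} } is classified the same way as the bare URL.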
basePath: use in cases where you want to create a File with a path that is relative to a base directory / path, e.g.
const file = data.open('data.csv', {basePath: '/my/base/path'})
This will open the file /my/base/path/data.csv
This functionality is primarily useful when using Files as part of Datasets where it can be convenient for a File to have a path relative to the directory of the Dataset. (See also Data Package and Data Resource in the Frictionless Data specs).
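Resolving a File's path against basePath essentially means joining the two, with URLs needing URL-aware joining. A sketch under that assumption, using Node's path module and the global URL class; resolvePath is a hypothetical name and the real library may normalise paths differently.

```javascript
const path = require('path')

// Hypothetical helper: combine a relative file path with an optional basePath.
function resolvePath(filePath, basePath) {
  if (!basePath) return filePath
  if (/^https?:\/\//i.test(basePath)) {
    // add a trailing slash so the last segment of basePath is kept
    const base = basePath.endsWith('/') ? basePath : basePath + '/'
    return new URL(filePath, base).href
  }
  // local base directory: platform-specific join
  return path.join(basePath, filePath)
}
```

With a remote Dataset, resolvePath('data.csv', 'https://example.com/datasets/co2') gives https://example.com/datasets/co2/data.csv, which is the kind of Dataset-relative path the paragraph above describes.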
A File is a single data file - local or remote.
The older load method is DEPRECATED. Use the simple open instead.
Main metadata is available via the descriptor:
file.descriptor
This metadata is a combination of the metadata passed in at File creation (if you created the File with a descriptor object) and auto-inferred information from the File path. This is the info that is auto-inferred:
path: path this was instantiated with - may not be the same as file.path (depending on basePath)
pathType: remote | local
name: file name (without extension)
format: the extension
mediatype: mimetype based on file name and extension
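The auto-inferred fields listed above can be derived from the path string alone. The following sketch shows one way to compute path, pathType, name, format and mediatype. It assumes forward-slash paths, a lowercased extension as the format, and a small hand-written extension-to-mediatype table (the real library may use a full MIME database); inferDescriptor and MEDIATYPES are hypothetical names.

```javascript
const path = require('path')

// Hypothetical, deliberately tiny extension -> mediatype table.
const MEDIATYPES = {
  csv: 'text/csv',
  tsv: 'text/tab-separated-values',
  json: 'application/json',
  xls: 'application/vnd.ms-excel',
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
}

// Hypothetical helper: infer basic descriptor fields from a local path or URL.
function inferDescriptor(filePath) {
  const isRemote = /^https?:\/\//i.test(filePath)
  // for URLs, use only the pathname so query strings don't leak into the name
  const pathname = isRemote ? new URL(filePath).pathname : filePath
  const ext = path.posix.extname(pathname)     // e.g. '.xls'
  const format = ext.slice(1).toLowerCase()    // e.g. 'xls'
  return {
    path: filePath,
    pathType: isRemote ? 'remote' : 'local',
    name: path.posix.basename(pathname, ext),  // file name without extension
    format,
    mediatype: MEDIATYPES[format]              // undefined if not in the table
  }
}
```

For the spreadsheet in the opening example this yields name 'ons-mye-population-totals', format 'xls' and mediatype 'application/vnd.ms-excel'. Encoding is left out because it cannot be guessed from the path; it requires reading the file's bytes.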
In addition to this metadata there are certain properties which are computed on demand:
// the full path to the file (using basePath)
const path = file.path
const size = file.size
// md5 hash of the file (awaited in case hash() returns a promise)
const hash = await file.hash()
// sha256 hash of the file
const hash256 = await file.hash('sha256')
// file encoding
const encoding = file.encoding
Note: size and hash are not available for remote Files (those created from URLs).
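For a local file, md5 and sha256 values are ordinary digests of the file's bytes. As an illustration only (not frictionless.js code), Node's crypto module computes them as below; hashFile is a hypothetical helper, and hex output is an assumption about the format file.hash() returns.

```javascript
const crypto = require('crypto')
const fs = require('fs')

// Hypothetical helper: hex digest of a local file's contents.
// hashType is any algorithm Node's crypto supports, e.g. 'md5' or 'sha256'.
function hashFile(filePath, hashType = 'md5') {
  const contents = fs.readFileSync(filePath)
  return crypto.createHash(hashType).update(contents).digest('hex')
}
```

This reads the whole file into memory; for very large files, piping fs.createReadStream into the hash object would keep memory use flat.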
stream()
Get a readable stream
@returns: a Promise that resolves to a readable stream object
file.buffer
Get this file as a buffer (async)
@returns: a Promise that resolves to the buffer
rows({keyed}={})
Get the rows for this file as a Node object stream (assumes the underlying data is tabular!)
@returns: a Promise with the rows as parsed JS objects (depends on file format)
keyed: if false (default), returns rows as arrays. If true, returns rows as objects.
TODO: casting (does data get cast automatically for you or not ...)
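The difference between keyed and unkeyed rows comes down to whether the header row is used to turn each array into an object. A minimal sketch, assuming the first row is the header; keyRows is a hypothetical name, and duplicate or missing header names are not handled.

```javascript
// Hypothetical helper: turn array rows (first row = header) into keyed objects.
function keyRows(rows) {
  const [header, ...body] = rows
  return body.map(row => {
    const obj = {}
    // pair each header name with the cell at the same position
    header.forEach((name, i) => {
      obj[name] = row[i]
    })
    return obj
  })
}
```

Applied to [['col1', 'col2'], [1, 2], [10, 20]] it produces the keyed rows shown in the opening example.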
What formats are supported?
The rows functionality is currently available for CSV and Excel files. The tabular support incorporates support for Table Schema and CSV Dialect, e.g. you can do:
// load a CSV with a non-standard dialect e.g. tab separated or semi-colon separated
const tsvFile = data.open({
  path: 'mydata.tsv',
  // Full support for http://specs.frictionlessdata.io/csv-dialect/
  dialect: {
    delimiter: '\t' // for tabs or ';' for semi-colons etc
  }
})

// open a CSV with a Table Schema
const csvFile = data.open({
  path: 'mydata.csv',
  // Full support for Table Schema https://specs.frictionlessdata.io/table-schema/
  schema: {
    fields: [
      {
        name: 'Column 1',
        type: 'integer'
      },
      // ...
    ]
  }
})
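To make the two options concrete, here is a toy sketch of what they control: the dialect's delimiter decides how a line is split, and the schema's field types decide how each cell is cast. It is deliberately simplistic: no quoting, escaping or validation, and only the integer, number and string types are handled. parseDelimited is hypothetical and far simpler than a real CSV parser; whether frictionless.js casts values for you is still an open TODO in the rows() notes.

```javascript
// Hypothetical toy parser (not frictionless.js code): splits lines on the
// dialect's delimiter and casts cells using Table Schema field types.
// No support for quoting, escaping or embedded newlines.
function parseDelimited(text, { dialect = {}, schema = {} } = {}) {
  const delimiter = dialect.delimiter || ','
  const fields = schema.fields || []
  // only three Table Schema types are handled; anything else stays a string
  const casters = {
    integer: v => Number.parseInt(v, 10),
    number: v => Number(v),
    string: v => v
  }
  const lines = text.split(/\r?\n/).filter(line => line.length > 0)
  if (lines.length === 0) return []
  const header = lines[0].split(delimiter)
  return lines.slice(1).map(line => {
    const cells = line.split(delimiter)
    const row = {}
    header.forEach((name, i) => {
      const field = fields.find(f => f.name === name)
      const cast = (field && casters[field.type]) || casters.string
      row[name] = cast(cells[i])
    })
    return row
  })
}
```

With dialect { delimiter: '\t' } and a schema declaring 'Column 1' as integer, the line "1\ta" becomes { 'Column 1': 1, name: 'a' }; columns missing from the schema stay strings.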
A Dataset is a collection of data files with optional metadata.
Under the hood it heavily uses the Data Package format and natively supports it, including loading from datapackage.json files. However, it does not require knowledge or use of Data Packages.
A Dataset has four primary properties:
descriptor: key metadata. The descriptor follows the Data Package spec
resources: an array of the Files contained in this Dataset
identifier: the identifier encapsulates the location (or origin) of this Dataset
readme: the README for this Dataset (if it exists). The readme content is taken from the README.md file located in the Dataset root directory or, if that does not exist, from the readme property on the descriptor. If neither of those exists, the readme will be undefined or null.
In addition we provide the convenience attributes:
path: the path (remote or local) to this Dataset
dataPackageJsonPath: the path to the datapackage.json for this Dataset (if it exists)
To create a new Dataset object use Dataset.load. It takes a descriptor Object or an identifier string:
async Dataset.load(pathOrDescriptor, {owner = null} = {})
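pathOrDescriptor: can be a descriptor Object or an identifier string (a local path or a remote URL)
@returns: a Dataset loaded from its datapackage.json (and README.md, if a README exists)
For example: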
const data = require('frictionless.js')
const pathOrDescriptor = 'https://raw.githubusercontent.com/datasets/co2-ppm/master/datapackage.json'
const dataset = await data.Dataset.load(pathOrDescriptor)
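The readme rule described for Dataset (README.md in the Dataset root first, then the descriptor's readme property, otherwise nothing) can be sketched for a local directory as follows. resolveReadme is a hypothetical helper that handles local paths only and returns undefined, not null, when neither source exists.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: pick the README text for a local Dataset directory.
function resolveReadme(datasetDir, descriptor = {}) {
  const readmePath = path.join(datasetDir, 'README.md')
  if (fs.existsSync(readmePath)) {
    // 1. README.md in the Dataset root wins
    return fs.readFileSync(readmePath, 'utf8')
  }
  // 2. fall back to the descriptor's readme property; 3. otherwise undefined
  return descriptor.readme
}
```

A remote Dataset would need an HTTP request for README.md instead of fs.existsSync, but the order of precedence stays the same.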
Add a resource to the Dataset:
addResource(resource)
resource: may be an already instantiated File object or a resource descriptor.

// seeks to guess whether a given path is the path to a Dataset or a File
// (i.e. a directory or datapackage.json)
data.isDataset(path)
// parses dataset path and returns identifier dictionary
// handles local paths, remote URLs as well as DataHub and GitHub specific URLs
// (e.g., https://datahub.io/core/finance-vix or https://github.com/datasets/finance-vix)
const identifier = data.parseDatasetIdentifier(path)
console.log(identifier)
which prints out:
{
name: <name>,
owner: <owner>,
path: <path>,
type: <type>,
original: <path>,
version: <version>
}
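The identifier above can be approximated for the DataHub and GitHub URL styles mentioned by splitting the URL path into owner and name. A sketch under these assumptions: only https://datahub.io/owner/name and https://github.com/owner/name are recognised, version is left null, path and original both hold the input, and the type labels 'datahub', 'github', 'url' and 'local' are hypothetical rather than the values frictionless.js returns. looksLikeDataset is a similarly rough take on the isDataset guess (a datapackage.json file, or a path without an extension).

```javascript
const path = require('path')

// Hypothetical heuristic in the spirit of isDataset(): a datapackage.json
// file, or a path without an extension (assumed to be a directory).
function looksLikeDataset(p) {
  const pathname = /^https?:\/\//i.test(p) ? new URL(p).pathname : p
  return path.posix.basename(pathname) === 'datapackage.json' ||
    path.posix.extname(pathname) === ''
}

// Hypothetical, simplified identifier parser.
function parseIdentifier(p) {
  const isUrl = /^https?:\/\//i.test(p)
  const url = isUrl ? new URL(p) : null
  // path segments, dropping empty strings from leading/trailing slashes
  const segments = (isUrl ? url.pathname : p).split('/').filter(Boolean)
  let type = isUrl ? 'url' : 'local'
  let owner = null
  if (isUrl && (url.hostname === 'datahub.io' || url.hostname === 'github.com')) {
    type = url.hostname === 'github.com' ? 'github' : 'datahub'
    owner = segments[0] || null
  }
  // a trailing datapackage.json points at its containing directory
  const nameSegments = segments[segments.length - 1] === 'datapackage.json'
    ? segments.slice(0, -1)
    : segments
  const name = nameSegments[nameSegments.length - 1] || null
  return { name, owner, path: p, type, original: p, version: null }
}
```

For https://github.com/datasets/finance-vix this gives owner 'datasets' and name 'finance-vix'. Deeper GitHub URLs (for example /tree/master/...) would need extra rules that this sketch does not attempt.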
Requirements:
We have two types of tests: Karma-based tests for the browser, and Mocha with Chai for Node. All Node tests are in the datajs/test folder. Since Mocha is sensitive to test naming, we keep a separate folder, /browser-test, used only by Karma.
Browser tests need the bundle in the dist/browser folder. Run yarn build:browser to build it, then use yarn test:browser to run the Karma tests.
Node tests only: yarn test:node
All tests: yarn test
Watch mode for Node tests: yarn test:node:watch
Git clone the repo
Install dependencies: yarn
To make the browser and Node tests work, first run the build: yarn build
Run tests: yarn test
Do some dev work
Once done, make sure tests are passing. Then build the distribution version of the app: yarn build
Running yarn build compiles the code with webpack and babel for the different Node and web targets. To watch the build, run: yarn build:watch
Now proceed to the "Deployment" stage
Update the version number in package.json.
git commit -m "some message, eg, version"
git tag -a v0.12.0 -m "some message"
git push origin master --tags
npm publish
We found that frictionless.js demonstrated an unhealthy version release cadence and project activity, because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.