gather-cli

Merge JSON files, with a twist: optionally add metadata from the filename or the files' stats to each dataset.


Gather

Gather is a command-line tool that merges JSON files, with a twist: gather can optionally add metadata from the filename or the file's stats to each dataset. Because sometimes filenames are just meaningless descriptors, but often they're not.

Install with NPM (bundled with node.js):

npm install gather-cli -g

Examples

Combine all of last month's analytics data into a single file, without losing track of when those analytics were recorded:

gather 'analytics/{date}.json' > metrics.json

Convert your Markdown blogposts with YAML frontmatter into JSON, bundle them together with Gather and then render them:

yaml2json posts \
    --output posts \
    --prose \
    --convert markdown
gather 'posts/{year}-{month}-{day}-{permalink}.json' \
    --annotate \
    --output posts/all.json
render post.jade \
    --input posts/all.json \
    --output 'build/{year}/{permalink}.html' \
    --many

Reorganize your data with a gather-and-groupby one-two punch:

gather 'staff/{department}/{username}.json' | \
groupby 'staff/{office}/{firstName}-{lastName}.json' --unique

Path metadata

By default, filled-in filename placeholders will get added to the data.

With this gather command...

gather 'analytics/{date}.json' > metrics.json

... each dataset in the resulting metrics.json file will contain a date key:

[
    {
        "date": "2014-10-01", 
        ...
    }, 
    {
        "date": "2014-10-02", 
        ...
    }, 
    ...
]
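
How a template such as 'analytics/{date}.json' might be turned into metadata can be sketched as follows. This is only an illustration, not gather's actual implementation: the helper names templateToRegExp and extractPlaceholders are hypothetical, and the rule that a placeholder matches anything except a path separator is an assumption.

```javascript
// Hypothetical helper, not part of gather's API: turn a template such as
// 'analytics/{date}.json' into a regular expression with one named group
// per placeholder.
function templateToRegExp(template) {
    // Split the template into literal text and {placeholder} tokens.
    const parts = template.split(/(\{[A-Za-z_][A-Za-z0-9_]*\})/);
    const pattern = parts.map(function (part) {
        const match = /^\{([A-Za-z_][A-Za-z0-9_]*)\}$/.exec(part);
        if (match) {
            // Assumption: a placeholder never spans a path separator.
            return '(?<' + match[1] + '>[^/]+?)';
        }
        // Escape regular expression metacharacters in the literal parts.
        return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }).join('');
    return new RegExp('^' + pattern + '$');
}

// Returns an object with one key per placeholder, or null when the path
// does not fit the template.
function extractPlaceholders(template, filePath) {
    const match = templateToRegExp(template).exec(filePath);
    return match ? Object.assign({}, match.groups) : null;
}

console.log(extractPlaceholders('analytics/{date}.json', 'analytics/2014-10-01.json'));
// { date: '2014-10-01' }
```

The returned object is what would then be merged into the dataset loaded from that file.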

File metadata

File metadata includes:

  • an extended JSON representation of the file's creation, modification and access dates
  • if the file path contains {year}, {month} and {day} placeholders, a date inferred from these variables in the same extended JSON format
  • the file's absolute and relative path, basename and extension

While path metadata is enabled by default, file metadata is not. Use the --annotate flag to enable file metadata.

Here's an example of file metadata:

{
    "origin": {
        "relative": "...", 
        "absolute": "...", 
        "basename": "...", 
        "extension": "..."
    }, 
    "date": {
        "accessed": {
            "iso": "...", 
            "year": ..., 
            "month": ..., 
            "day": ...,
            ...
        }, 
        "modified": ..., 
        "created": ..., 
        "inferred": ...
    }
}
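
For readers who want to see where such values can come from, here is a minimal sketch that builds a similar structure with Node's fs and path modules. It is not gather's own code: describeFile and describeDate are hypothetical names, using UTC for the date parts is an assumption, and the inferred date is left out.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: an "extended" date carries an ISO string plus numeric parts.
function describeDate(date) {
    return {
        iso: date.toISOString(),
        year: date.getUTCFullYear(),
        month: date.getUTCMonth() + 1, // JavaScript months are zero-based
        day: date.getUTCDate()
    };
}

// Hypothetical helper: collect origin and date metadata for a single file.
// `baseDir` is the directory that relative paths are calculated against.
function describeFile(filePath, baseDir) {
    const stats = fs.statSync(filePath);
    const absolute = path.resolve(filePath);
    return {
        origin: {
            relative: path.relative(baseDir || process.cwd(), absolute),
            absolute: absolute,
            basename: path.basename(absolute),
            extension: path.extname(absolute)
        },
        date: {
            accessed: describeDate(stats.atime),
            modified: describeDate(stats.mtime),
            created: describeDate(stats.birthtime)
        }
    };
}
```

Whether gather's basename includes the extension, and whether its extension includes the leading dot, is not specified above; the sketch simply follows what path.basename and path.extname return.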

Compact, underscored and extended metadata naming schemes

Metadata from the filename or from the file's stats can conflict with keys already present in the data. If you are concerned about naming clashes, there are two ways to avoid them:

  • ask gather to underscore any metadata with the --scheme underscored option
  • put the original data under data and the path metadata under metadata with --scheme extended, instead of merging both in at the root.

An example of the extended naming scheme:

{
    "origin": "file path, extension et cetera", 
    "date": "created, modified, accessed and inferred dates", 
    "metadata": "metadata extracted from path placeholders", 
    "data": "the original data"
}
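
The difference between the three schemes can be sketched with a small function. This is an illustration under assumptions, not gather's implementation: applyScheme is a hypothetical name, the '_' prefix for the underscored scheme is a guess at what "underscore" means, and letting metadata win a clash in the compact scheme is also an assumption.

```javascript
// Hypothetical helper: combine one dataset with its path metadata.
function applyScheme(data, metadata, scheme) {
    if (scheme === 'extended') {
        // Data and metadata live under separate keys, so they cannot clash.
        return { metadata: metadata, data: data };
    }
    const result = Object.assign({}, data);
    Object.keys(metadata).forEach(function (key) {
        // Assumption: "underscored" prefixes each metadata key with '_'.
        const name = scheme === 'underscored' ? '_' + key : key;
        // In the compact scheme a clashing key is overwritten (assumption).
        result[name] = metadata[key];
    });
    return result;
}

const data = { date: 'unknown', visits: 3 };
const metadata = { date: '2014-10-01' };

console.log(applyScheme(data, metadata, 'compact'));
console.log(applyScheme(data, metadata, 'underscored'));
console.log(applyScheme(data, metadata, 'extended'));
```

In this sketch only the compact scheme loses the original date value; the other two keep both.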

Partial rebuilds

When adding metadata with the --annotate option, the origin of each piece of data that makes up the merged dataset becomes part of the output. This metadata makes it possible, on subsequent gathering operations, to update or remove only the data that has changed rather than redoing the entire merge from scratch.

For example, say you've added a new staff member at /staff/smith.json and would like to update the staff.json file, which contains thousands of staff members. For every staff member in /staff, gather first checks whether it can get up-to-date information from the existing staff.json file. Only smith.json is missing there, so only smith.json needs to be loaded and parsed from disk.

Especially when merging thousands of files, these partial rebuilds dramatically speed up gathering operations. Because the caching mechanism is generally safe (it will never use stale data, it will remove data for files that are no longer there, et cetera) it is enabled by default.

Nevertheless, it is possible to disable partial rebuilds: use --force to force a full redo of the merge. Alternatively, just rm the output file before using gather.
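
The decision behind a partial rebuild can be sketched as a small planning step. This is a simplified model and not gather's actual caching code: planRebuild is a hypothetical name, and comparing origin.absolute and date.modified.iso from an earlier annotated output against the files currently on disk is an assumption about how freshness could be checked.

```javascript
// Hypothetical planning step for a partial rebuild.
// `previous` is an earlier annotated output (an array of datasets).
// `files` lists what is on disk now, as { path, modified } objects where
// `modified` is an ISO timestamp.
function planRebuild(previous, files) {
    // Index the earlier output by the file each dataset came from.
    const cached = new Map();
    previous.forEach(function (entry) {
        cached.set(entry.origin.absolute, entry);
    });

    const reuse = [];   // datasets that can be copied from the earlier output
    const reload = [];  // paths that have to be read and parsed again
    files.forEach(function (file) {
        const entry = cached.get(file.path);
        if (entry && entry.date.modified.iso === file.modified) {
            reuse.push(entry);
        } else {
            reload.push(file.path);
        }
    });

    // Datasets whose file no longer exists appear in neither list,
    // so they drop out of the new output.
    return { reuse: reuse, reload: reload };
}
```

In the staff example, every existing staff member would land in reuse and only /staff/smith.json in reload.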

Use from node.js

var gather = require('gather-cli');
var source = 'examples/staff';
var options = {
    "extended": true, 
    "scheme": "underscored"
};
// The callback receives an error, if any, and the merged datasets.
gather(source, options, function(err, staffMembers) {
    if (err) throw err;
    staffMembers.forEach(function(staff){
        console.log(staff.name);
    });
});

Last updated on 16 Feb 2015
