littlefork

A sequential data processing pipeline.

Version 0.1.1

Littlefork

A modular pipeline for sequential batch processing. We use it for iterative data retrieval and transformation.

About

Installation

npm install --save littlefork

Usage

The application consists of a plugin runner and a set of plugins. To use them, install Littlefork and any plugins you want into your Node.js project:

npm init
npm install --save littlefork

This gives you access to the Littlefork command:

$(npm bin)/littlefork

By itself, littlefork does not do much. You have to install one or more plugins to make it do anything:

npm install --save littlefork-plugin-twitter littlefork-plugin-mongodb

Configuration

Littlefork accepts configuration options from environment variables, command line arguments and a configuration file.

PROFILE=profile-name PLUGINS=ddg DDG_TERMS=term $(npm bin)/littlefork

is equivalent to

$(npm bin)/littlefork -i profile-name -p ddg --ddg.terms term

is equivalent to

$(npm bin)/littlefork -c pipeline.json

where pipeline.json is a file in JSON format:

{
  "plugins": "ddg",
  "profile": "profile-name",
  "ddg": {
    "terms": "term"
  }
}

The base pipeline must be configured with a profile id and the plugins that form the pipeline: profile and plugins are required configuration options.
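The three invocations above suggest a naming rule: an environment variable such as DDG_TERMS corresponds to the command line option --ddg.terms and to the nested key ddg.terms in the configuration file. The sketch below assumes that rule (the first underscore separates the plugin namespace from the option name) and uses two hypothetical helpers, `envToConfig` and `assertRequired`, to rebuild the pipeline.json object and enforce the two required options. Littlefork's real option parsing may differ; `--help` remains the authoritative list.

```javascript
// Hypothetical helpers illustrating the configuration rules above;
// envToConfig and assertRequired are NOT part of Littlefork's API.
// Assumption: PROFILE and PLUGINS map to top-level keys, and
// NAMESPACE_OPTION maps to { namespace: { option: value } }.
const TOP_LEVEL = new Set(['PROFILE', 'PLUGINS']);

function envToConfig(env, namespaces) {
  const config = {};
  for (const [name, value] of Object.entries(env)) {
    if (TOP_LEVEL.has(name)) {
      config[name.toLowerCase()] = value;
      continue;
    }
    const idx = name.indexOf('_');
    if (idx === -1) continue; // not a namespaced option
    const ns = name.slice(0, idx).toLowerCase();
    if (!namespaces.includes(ns)) continue; // skip unrelated variables
    const option = name.slice(idx + 1).toLowerCase();
    config[ns] = Object.assign({}, config[ns], { [option]: value });
  }
  return config;
}

// profile and plugins are required, so fail early if either is missing.
function assertRequired(config) {
  for (const key of ['profile', 'plugins']) {
    if (config[key] === undefined) {
      throw new Error(`Missing required option: ${key}`);
    }
  }
  return config;
}

// PROFILE=profile-name PLUGINS=ddg DDG_TERMS=term yields the pipeline.json object.
const config = assertRequired(envToConfig(
  { PROFILE: 'profile-name', PLUGINS: 'ddg', DDG_TERMS: 'term', HOME: '/home/me' },
  ['ddg']
));
console.log(JSON.stringify(config));
// {"profile":"profile-name","plugins":"ddg","ddg":{"terms":"term"}}
```

Restricting namespaces to installed plugins keeps unrelated variables such as HOME or PATH out of the configuration.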

Every installed plugin can add its own configuration options. Print the usage help of the command line tool to get a complete list of options:

$(npm bin)/littlefork --help

Plugins

A plugin takes a piece of data and returns a transformed version of this data. Littlefork starts with a profile and a pipeline configuration and sequentially uses the output of a plugin as the input for the next plugin.

A plugin resembles a mathematical function: it maps over profile data to produce a new version of that profile data. But we are cheating: a plugin in Littlefork is not pure. It has side effects, which are managed using promises.
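The sequencing described above can be sketched as a promise chain: each plugin receives the envelope produced by the previous one and may return either a plain value or a promise. `runPipeline` and the two toy plugins are hypothetical illustrations, not Littlefork's internal runner.

```javascript
// Hypothetical sketch of a sequential plugin runner (not Littlefork's own code).
// Each plugin takes an envelope and returns a new envelope, or a promise of one.
function runPipeline(plugins, envelope) {
  return plugins.reduce(
    // Wait for the previous step, then feed its output into the next plugin.
    (previous, plugin) => previous.then((data) => plugin(data)),
    Promise.resolve(envelope)
  );
}

// Toy plugin with a (simulated) asynchronous side effect: "fetches" one unit.
const fetchTerms = (envelope) =>
  new Promise((resolve) =>
    setTimeout(() => resolve(Object.assign({}, envelope, {
      data: envelope.data.concat([{ _lf_source: 'ddg', term: 'littlefork' }]),
    })), 10)
  );

// Toy synchronous transformation: returns a new envelope, leaving the input intact.
const upcaseTerms = (envelope) => Object.assign({}, envelope, {
  data: envelope.data.map((unit) => Object.assign({}, unit, { term: unit.term.toUpperCase() })),
});

const result = runPipeline(
  [fetchTerms, upcaseTerms],
  { profile: { name: 'demo' }, data: [], stats: {} }
);
result.then((envelope) => console.log(envelope.data));
// [ { _lf_source: 'ddg', term: 'LITTLEFORK' } ]
```

Reducing over a promise chain guarantees that a step starts only after the previous one has resolved, so synchronous and asynchronous plugins can be mixed freely.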

A search on npm lists all available plugins.

Development

Data format

The envelope of the profile data is a nested object with values of various types. This is a simplified version of the profile data:

{
  "profile": {
    "name": "Some story name",
    "profileId": "some profileId",
    "twitter_handle": "twitter-id"
  },
  "data": [
    {"_lf_source": "twitter_tweets", "tweet": "bahh"},
    {"_lf_source": "twitter_tweets", "tweet": "bahh"}
  ],
  "stats": {}
}

Every data unit is an atomic piece of data whose shape depends on the plugin that fetched it. Transformation plugins can extend the data format with _lf_ prefixed attributes. The list below is the basic set of littlefork data attributes:

  • _lf_source (String)

    The name of the plugin.

  • _lf_title (String)

    The name of the attribute that functions as title attribute for the data unit. This is a pointer to the real title attribute.

  • _lf_pubdates (Object)

    Plugins register the publishing dates of the data unit. Different plugins can determine different publishing dates.

  • _lf_links (Array)

    A list of links that were found in the data unit.

  • _lf_images (Array)

    A list of images that were found in the data unit.

  • _lf_created (String)

    The timestamp at which the data unit was created.

  • _lf_profile (String)

    The name of the profile.

  • _lf_id_hash (String)

    A SHA-1 hash of the attributes that identify the data unit.

  • _lf_content_hash (String)

    A SHA-1 hash of the data unit's content.

  • _lf_meta (Object)

    Meta information stored by transformations.
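To make the list above concrete, here is a sketch of a data unit carrying the basic _lf_ attributes. Which fields feed _lf_id_hash and _lf_content_hash is not specified here, so the choice below is an assumption: a hypothetical `idFields` list for identity, and every non-_lf_ attribute for content. `annotateUnit` and `titleOf` are illustrative helpers, not Littlefork functions; hashing uses Node's built-in crypto module.

```javascript
const crypto = require('crypto');

// SHA-1 hex digest of a JSON-serializable value.
const sha1 = (value) =>
  crypto.createHash('sha1').update(JSON.stringify(value)).digest('hex');

// Hypothetical helper. Assumption: identity = the fields named in idFields;
// content = every attribute without the _lf_ prefix, with keys sorted so the
// hash does not depend on attribute order.
function annotateUnit(unit, source, profileName, idFields) {
  const contentKeys = Object.keys(unit).filter((k) => !k.startsWith('_lf_')).sort();
  return Object.assign({}, unit, {
    _lf_source: source,
    _lf_profile: profileName,
    _lf_created: new Date().toISOString(),
    _lf_id_hash: sha1(idFields.map((k) => unit[k])),
    _lf_content_hash: sha1(contentKeys.map((k) => [k, unit[k]])),
  });
}

// _lf_title is a pointer: it names the attribute that holds the title.
const titleOf = (unit) => unit[unit._lf_title];

const tweet = annotateUnit(
  { id_str: '1', tweet: 'bahh', _lf_title: 'tweet' },
  'twitter_tweets',
  'Some story name',
  ['id_str']
);
console.log(titleOf(tweet)); // bahh
console.log(tweet._lf_id_hash.length); // 40
```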

Debugging

Littlefork supports the excellent debug library. Set DEBUG=* to print all debug messages and to see which debug targets exist:

DEBUG=* $(npm bin)/littlefork -c config.json

Furthermore, littlefork can write the whole data set to files between transformation steps. If the VERBOSE_LOG environment variable is set to a path, the data after each transformation step is stored in that location.

VERBOSE_LOG=/tmp $(npm bin)/littlefork -p ddg,mongodb_store
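The VERBOSE_LOG behavior can be pictured as a small function that dumps the envelope to a file after each step. This is a sketch under assumptions: the function name `logStep` and the `<index>-<plugin>.json` file naming are hypothetical, and Littlefork's actual hook may name and format its files differently.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sketch of verbose logging: write the whole envelope as JSON
// into the directory named by VERBOSE_LOG. File naming is an assumption.
function logStep(envelope, index, pluginName, dir = process.env.VERBOSE_LOG) {
  if (!dir) return null; // verbose logging disabled
  const file = path.join(dir, `${index}-${pluginName}.json`);
  fs.writeFileSync(file, JSON.stringify(envelope, null, 2));
  return file;
}

// Usage: log into a fresh temporary directory rather than /tmp itself.
const logDir = fs.mkdtempSync(path.join(os.tmpdir(), 'littlefork-'));
const written = logStep({ profile: { name: 'demo' }, data: [], stats: {} }, 1, 'ddg', logDir);
console.log(written);
```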

Plugins

Plugins can extend the functionality of littlefork in three ways:

  • transformations are functions that take data in the littlefork data format and return data in the same format.
  • hooks are called before each transformation step in a pipeline. The verbose logging mentioned above is implemented using hooks.
  • profiles define sources to look up profile data. Each pipeline probably needs at least one profile source.

A plugin is a simple npm module. It can export either a single function or an object with functions.

Most plugins implement a single, minor transformation step. If a module exports a single function, that function must be a transformation.

If a module wants to export more than one transformation step, or wants to provide a pipeline hook or a profile source, it must export an object where each value is a function.

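import _ from 'lodash';
// openRun, closeRun and get are the plugin's own helpers (not shown).
// openRun is assumed to return a Bluebird promise: .disposer() is a Bluebird method.

export default {
  // Pipeline hook: acquire a resource for the run and release it afterwards.
  hook: (profile) => openRun(profile).disposer(closeRun),
  // Profile source: look up a profile by id and fail if it does not exist.
  profile: (id) => get(id).then((result) => {
    if (_.isNil(result)) {
      throw new Error(`Profile ${id} not found.`);
    }
    return result;
  }),
  // Transformations
  twitter_feed: (data) => { /* ... */ },
  twitter_timeline: (data) => { /* ... */ },
};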

hook and profile are special functions. Every other item is treated as a transformation.
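The export rules above (a single exported function is one transformation; in an exported object, hook and profile are special and every other function is a transformation) can be sketched as a small classifier. `classifyPlugin` is a hypothetical helper, not part of Littlefork, and registering a single-function plugin under its module name is an assumption.

```javascript
// Hypothetical sketch of how a runner could interpret a plugin module's exports.
const SPECIAL = ['hook', 'profile'];

function classifyPlugin(moduleName, exported) {
  // A module exporting a single function is one transformation.
  // Assumption: it is registered under the module's name.
  if (typeof exported === 'function') {
    return { transformations: { [moduleName]: exported }, hook: null, profile: null };
  }
  const transformations = {};
  for (const [name, fn] of Object.entries(exported)) {
    // Every exported value must be a function.
    if (typeof fn !== 'function') {
      throw new TypeError(`${moduleName}.${name} must be a function`);
    }
    // hook and profile are special; everything else is a transformation.
    if (!SPECIAL.includes(name)) transformations[name] = fn;
  }
  return {
    transformations,
    hook: exported.hook || null,
    profile: exported.profile || null,
  };
}

const twitter = classifyPlugin('littlefork-plugin-twitter', {
  profile: (id) => Promise.resolve({ profileId: id }),
  twitter_feed: (data) => data,
  twitter_timeline: (data) => data,
});
console.log(Object.keys(twitter.transformations)); // [ 'twitter_feed', 'twitter_timeline' ]
```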

Contributing

Package last updated on 29 Jun 2016
