littlefork

A sequential data processing pipeline.

  • 0.5.1
Littlefork

A modular pipeline for sequential batch processing. We use it for iterative data retrievals and transformations.

About

Installation

npm install --save littlefork

Usage

The application consists of a plugin runner and a set of plugins. To use it, install Littlefork and any plugins you need into your Node.js project.

npm init
npm install --save littlefork

This gives you access to the Littlefork command:

$(npm bin)/littlefork

By itself, littlefork does not do much. You have to install one or more plugins to make littlefork do anything.

npm install --save littlefork-plugin-twitter littlefork-plugin-mongodb

Configuration

Littlefork accepts configuration options as environment variables, command line arguments and from a configuration file.

PROFILE=profile-name PLUGINS=ddg DDG_TERMS=term $(npm bin)/littlefork

is equivalent to

$(npm bin)/littlefork -i profile-name -p ddg --ddg.terms term

is equivalent to

$(npm bin)/littlefork -c pipeline.json

with pipeline.json being a file in JSON format:

{
  "plugins": "ddg",
  "profile": "profile-name",
  "ddg": {
    "terms": "term"
  }
}

The base pipeline must be configured with a profile id and the plugins that form the pipeline. Both profile and plugins are required configuration options.
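The three equivalent forms above suggest a naming convention in which an environment variable such as DDG_TERMS corresponds to the nested option ddg.terms. The sketch below illustrates that mapping under this assumption. envToConfig is a hypothetical helper, not part of littlefork, and littlefork's own option parsing may differ.

```javascript
// Hypothetical helper, not part of littlefork: maps environment-style keys
// onto the nested shape used by pipeline.json.
const envToConfig = (env) => {
  const config = {};
  Object.keys(env).forEach((key) => {
    const value = env[key];
    const [head, ...rest] = key.toLowerCase().split('_');
    if (rest.length === 0) {
      // PROFILE -> profile, PLUGINS -> plugins
      config[head] = value;
    } else {
      // DDG_TERMS -> { ddg: { terms: ... } } (assumed convention)
      config[head] = Object.assign({}, config[head], { [rest.join('_')]: value });
    }
  });
  return config;
};

const config = envToConfig({
  PROFILE: 'profile-name',
  PLUGINS: 'ddg',
  DDG_TERMS: 'term',
});
console.log(JSON.stringify(config, null, 2));
```

For the sample input, the printed object has the same shape as the pipeline.json file shown above.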

Every plugin that is installed can add additional configuration options. Print the usage help of the command line tool to get a complete list of command line options:

$(npm bin)/littlefork --help

Plugins

A plugin takes a piece of data and returns a transformed version of this data. Littlefork starts with a profile and a pipeline configuration and sequentially uses the output of a plugin as the input for the next plugin.

A plugin resembles a mathematical function: it maps profile data to a new version of that profile data. Strictly speaking this is cheating, because a plugin in Littlefork is not pure. It has side effects, which are managed using promises.
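The idea of feeding one plugin's output into the next can be sketched with plain promises. This is an illustration only, not littlefork's runner: fetchTweets, countUnits and runPipeline are hypothetical names, and the envelope shape is borrowed from the data format described in the Development section.

```javascript
// Two stand-in plugins (hypothetical): each takes the envelope and returns a
// promise of a new envelope instead of mutating its input.
const fetchTweets = (envelope) =>
  Promise.resolve(Object.assign({}, envelope, {
    data: envelope.data.concat([{ _lf_source: 'twitter_tweets', tweet: 'bahh' }]),
  }));

const countUnits = (envelope) =>
  Promise.resolve(Object.assign({}, envelope, {
    stats: { units: envelope.data.length },
  }));

// Run the plugins in order; each one waits for the result of the previous one.
const runPipeline = (plugins, envelope) =>
  plugins.reduce((acc, plugin) => acc.then(plugin), Promise.resolve(envelope));

const result = runPipeline(
  [fetchTweets, countUnits],
  { profile: { profileId: 'some profileId' }, data: [], stats: {} }
);

result.then((envelope) => console.log(envelope.stats));
```

Reducing over a promise chain keeps the execution strictly sequential, which matches the description of a pipeline in which each step depends on the previous one.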

A search on npm lists all available plugins.

Development

Data format

The envelope of the profile data is a nested object with values of various types. This is a simplified version of the profile data:

{
  "profile": {
    "name": "Some story name",
    "profileId": "some profileId",
    "twitter_handle": "twetter-id"
  },
  "data": [
    {"_lf_source": "twitter_tweets", "tweet": "bahh"},
    {"_lf_source": "twitter_tweets", "tweet": "bahh"}
  ],
  "stats": {}
}

Every data unit is an atomic piece of data. Its shape depends on the plugin that fetched it. Transformation plugins can extend the data format with attributes prefixed with _lf_. The list below is the basic set of attributes that littlefork sets on data units:

  • _lf_source (String)

    The name of the plugin.

  • _lf_title (String)

    The name of the attribute that functions as title attribute for the data unit. This is a pointer to the real title attribute.

  • _lf_pubdates (Object)

    Plugins register the publishing dates of the data unit. Different plugins can determine different publishing dates.

  • _lf_links (Array)

    A list of links that were found in the data unit.

  • _lf_images (Array)

    A list of images that were found in the data unit.

  • _lf_created (String)

    The timestamp at which the data unit was created.

  • _lf_profile (String)

    The name of the profile.

  • _lf_id_hash (String)

    A sha1 hash of the data unit identities.

  • _lf_content_hash (String)

    A sha1 hash of the data unit content.

  • _lf_meta (Object)

    Meta information stored by transformations.
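As an illustration of how some of these attributes could be attached to a raw data unit, here is a minimal sketch using Node's built-in crypto module. It rests on several assumptions: annotate is a hypothetical helper, the choice of identity and content fields is made by the caller, the hash input is a JSON serialization, and the timestamp is an ISO 8601 string. littlefork's own rules for these values may differ.

```javascript
const crypto = require('crypto');

// sha1 over a JSON serialization of the value (the serialization is an assumption).
const sha1 = (value) =>
  crypto.createHash('sha1').update(JSON.stringify(value)).digest('hex');

// Hypothetical helper: annotate a raw unit with some of the _lf_ attributes.
// Which attributes count as "identity" or "content" is decided by the caller.
const annotate = (unit, source, profile, idFields, contentFields) => {
  const pick = (fields) => fields.map((field) => unit[field]);
  return Object.assign({}, unit, {
    _lf_source: source,
    _lf_profile: profile,
    _lf_created: new Date().toISOString(), // timestamp format is an assumption
    _lf_id_hash: sha1(pick(idFields)),
    _lf_content_hash: sha1(pick(contentFields)),
  });
};

const unit = annotate(
  { tweet_id: '1', tweet: 'bahh' },
  'twitter_tweets',
  'some profileId',
  ['tweet_id'],
  ['tweet']
);
console.log(unit);
```

Keeping the identity hash separate from the content hash makes it possible to recognize the same unit again even when its content has changed.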

API

Plugin Runner
runner

Create a runnable littlefork object.

Upon calling this function, you receive a runnable littlefork pipeline and an observable stream. The littlefork pipeline is a function that can be called without any arguments. It will return a promise that resolves to the result of the pipeline run. The stream object is used to receive messages during the pipeline run. It's currently mainly used for logging purposes, but can be used for more as well.

The stream sends messages with the following types:

  • log_info
  • log_debug
  • log_error
  • plugin_start
  • plugin_end

Parameters

  • config Object Configuration for a littlefork run.
  • queryIds Array<String> A list of ids to query.

Examples

const [run, stream] = runner(config, queryIds);

stream.onValue(msg => {
  switch (msg.type) {
    case 'log_info': console.log(msg.msg); break;
    // ... other cases ...
    default: break;
  }
});

run();

Returns Array<Function, Observable> A tuple with a function that runs the pipeline and an observable that receives messages during the pipeline run. The observable is a BaconJS stream and has the full BaconJS API available.
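One way to cover all five message types is a lookup table instead of a switch statement. The sketch below is an assumption-laden illustration: only msg.type and msg.msg appear in the example above, so the handlers for plugin_start and plugin_end do not read any payload, and formatMessage is a hypothetical name.

```javascript
// Hypothetical handler table keyed by the message types listed above.
// Only msg.type and msg.msg are known from the example; other fields are not used.
const handlers = {
  log_info: (msg) => `info: ${msg.msg}`,
  log_debug: (msg) => `debug: ${msg.msg}`,
  log_error: (msg) => `error: ${msg.msg}`,
  plugin_start: () => 'plugin started',
  plugin_end: () => 'plugin finished',
};

// Returns a printable line, or null for message types without a handler.
const formatMessage = (msg) =>
  Object.prototype.hasOwnProperty.call(handlers, msg.type)
    ? handlers[msg.type](msg)
    : null;

console.log(formatMessage({ type: 'log_info', msg: 'pipeline started' }));
```

With a stream obtained from runner, the function could be attached with stream.onValue(msg => { const line = formatMessage(msg); if (line) console.log(line); }).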

Plugin
id

A promised identity function.

id :: a -> Future a

Parameters

  • Any The value to return.

Returns Promise<Any> A promise of the value that was supplied.

fmap

Map a function over a Functor

fmap :: Functor f => (a -> Future b) -> f (Future a) -> Future b
fmap :: Functor f => (a -> b) -> f a -> Future b

Parameters

  • f Function The function to apply to the Functor.
  • p (Promise | Any) The functor value to map.

Examples

const p = Promise.resolve(1);
const f = v => v + 1;
fmap(f, p);  // Returns a promise resolving to 2.

Returns Promise A promise resolving to the result of mapping f over the value of p.
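To make the two signatures concrete, here is a minimal sketch of how id and fmap could be written with native promises. It is not littlefork's source and the actual implementation may differ. It only mirrors the documented behaviour that both the mapped value and the function's result may be either a plain value or a promise.

```javascript
// Sketch only; littlefork's own implementation may differ.
const id = (a) => Promise.resolve(a);

// Promise.resolve accepts plain values and promises alike, and .then
// flattens a promise returned by f, so both signatures are covered.
const fmap = (f, p) => Promise.resolve(p).then(f);

// A plain function mapped over a plain value.
fmap((v) => v + 1, 1).then((v) => console.log(v));

// A promise-returning function mapped over a promise.
fmap((v) => id(v + 1), Promise.resolve(1)).then((v) => console.log(v));
```

Both calls resolve to 2, which shows why a single function can serve both type signatures.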

pure

Lift a value into an applicative.

pure :: Applicative f => a -> f (Future a)

Parameters

  • a Any The value to lift.

Returns Promise A promise that resolves to a.

apply

Apply a function wrapped in a promise to a promisified value.

apply :: Applicative f => f (a -> Future b) -> f (Future a) -> f (Future b)
apply :: Applicative f => f (a -> b) -> f a -> f (Future b)

Parameters

Examples

const pf = Promise.resolve(v => v + 1);
const p = Promise.resolve(1);
apply(pf, p); // Returns a promise resolving to 2.

Returns Promise<Any> A promise resolving to the result of calling the function that pf resolves to with the value of p.

liftA2

Lift a binary function over two Applicatives.

liftA2 :: Applicative f => f (a -> b -> Future c) -> f (Future a) -> f (Future b) -> f (Future c)
liftA2 :: Applicative f => f (a -> b -> Future c) -> f a -> f b -> f (Future c)

Parameters

  • f Function<Any, Any> A binary function.
  • a Promise<Any> A promise that resolves to a value.
  • b Promise<Any> A promise that resolves to a value.

Examples

const a = Promise.resolve(envelope);
const b = Promise.resolve(env);
liftA2(plugin, a, b); // Calls plugin with the values that a and b resolve to.

Returns Promise<Any> The value that f returns when applied to a and b.
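The applicative helpers can be sketched on top of native promises as well. The definitions below are assumptions written to match the documented parameters and examples, not littlefork's source. merge is a hypothetical stand-in for a binary plugin.

```javascript
// Sketches only; littlefork's own implementations may differ.
const pure = (a) => Promise.resolve(a);

// Wait for both the wrapped function and the wrapped value, then call.
const apply = (pf, p) => Promise.all([pf, p]).then(([f, v]) => f(v));

// Wait for both arguments, then call the binary function with them.
const liftA2 = (f, a, b) => Promise.all([a, b]).then(([x, y]) => f(x, y));

apply(pure((v) => v + 1), pure(1)).then((v) => console.log(v));

// Hypothetical binary "plugin" that merges the data of two envelopes.
const merge = (x, y) => pure({ data: x.data.concat(y.data) });
liftA2(merge, pure({ data: [1] }), pure({ data: [2] })).then((v) => console.log(v));
```

Promise.all waits for both inputs, and a promise returned by the function is flattened by .then, so the result is always a single promise.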

Plugin Loading
loadPlugins

Load all plugins available for this littlefork installation.

Examples

const [transformations, queries] = loadPlugins();

Returns Array<Object, Object> A tuple containing two Objects. The first element is a list of transformation plugins, the second contains all query plugins.

Debugging

There is support for the excellent debug library. Use * to print all debug messages and to see which debug targets exist.

DEBUG=* $(npm bin)/littlefork -c config.json

Furthermore, littlefork can store the whole data set between transformation steps in files. If the VERBOSE_LOG environment variable is set to a path, the data after each transformation step is stored in that location.

VERBOSE_LOG=/tmp $(npm bin)/littlefork -p ddg,mongodb_store

Plugins

Plugins can extend the functionality of littlefork in three ways:

  • transformations are functions that take data in the littlefork data format and return data of the same format again.
  • hooks are called before each step of transformation in a pipeline. The above mentioned verbose logging is implemented using hooks.
  • profiles define sources to look up profile data. Each pipeline probably needs at least one profile source.

A plugin is a simple npm module. It can export either a single function or an object with functions.

Most plugins are a minor transformation step. If a module is exporting a single function it must be a transformation plugin.
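A single-function transformation plugin might look like the following sketch. The transformation itself (tweetLength) is hypothetical. It assumes the envelope shape from the data format section and stores its result under _lf_meta, which that section describes as the place for meta information stored by transformations.

```javascript
// Hypothetical transformation: records the length of each tweet under _lf_meta.
// It returns a promise of a new envelope and leaves its input untouched.
const tweetLength = (envelope) =>
  Promise.resolve(Object.assign({}, envelope, {
    data: envelope.data.map((unit) =>
      Object.assign({}, unit, {
        _lf_meta: Object.assign({}, unit._lf_meta, {
          tweet_length: (unit.tweet || '').length,
        }),
      })),
  }));

// In a real plugin module this function would be the module's single export.
tweetLength({
  profile: { profileId: 'some profileId' },
  data: [{ _lf_source: 'twitter_tweets', tweet: 'bahh' }],
  stats: {},
}).then((envelope) => console.log(envelope.data));
```

Returning a new envelope instead of mutating the input keeps the plugin close to the function-like behaviour described earlier.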

If a module wants to export more than one transformation step, or to provide a pipeline hook or a profile source, it must export an object where each value is a function.

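// _ is assumed to be lodash.
import _ from 'lodash';

// openRun, closeRun and get are the plugin's own helpers (not shown here).
export default {
  // Called before each transformation step of a pipeline.
  hook: (profile) => openRun(profile).disposer(closeRun),
  // Look up the profile data for an id; fail if nothing is found.
  profile: (id) => get(id).then(result => {
    if (_.isNil(result)) {
      throw new Error(`Profile ${id} not found.`);
    }
    return result;
  }),
  // Transformations (bodies elided)
  twitter_feed: (data) => ....,
  twitter_timeline: (data) => ....,
};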

hook and profile are special functions. Every other item is treated as a transformation.
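The rule above can be sketched as a small classification step. classifyExports is a hypothetical function, not littlefork's loader, and the key used for a single-function export is an arbitrary choice made for this illustration.

```javascript
// Hypothetical loader step, not littlefork's actual code: split a plugin's
// exports into hook, profile, and transformations.
const classifyExports = (exported) => {
  if (typeof exported === 'function') {
    // A single exported function is a transformation; the key name "default"
    // is an arbitrary choice for this sketch.
    return { hook: null, profile: null, transformations: { default: exported } };
  }
  // hook and profile are special; every other entry is a transformation.
  const { hook = null, profile = null, ...transformations } = exported;
  return { hook, profile, transformations };
};

const sorted = classifyExports({
  hook: (profile) => profile,
  profile: (id) => Promise.resolve({ profileId: id }),
  twitter_feed: (data) => Promise.resolve(data),
  twitter_timeline: (data) => Promise.resolve(data),
});
console.log(Object.keys(sorted.transformations));
```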

Contributing

Package last updated on 01 Mar 2017
