datom

standardized immutable objects in the spirit of datomic, especially suited for use in data pipelines

  • 2.0.0
Datom ⚛

NOTE: Documentation is still fragmentary. WIP.

Export Bound Methods

If you plan on using methods like new_datom() or select() a lot, consider using .export():

DATOM         = require 'datom'
{ new_datom
  select }    = DATOM.export()

Now new_datom() and select() are methods bound to DATOM. (Observe that because of the JavaScript 'tear-off' effect, when you do method = DATOM.method, then method() will likely fail as its reference to this has been lost.)
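The tear-off effect mentioned above can be reproduced without the library. The following is a minimal sketch in plain JavaScript; the object `lib`, its `prefix` property and its `make_key()` method are hypothetical stand-ins, and the `export()` shown here only illustrates the general binding technique, not how datom implements it.

```javascript
// Hypothetical stand-in for a library object; NOT the actual datom source.
const lib = {
  prefix: '^',
  // Relies on `this`, so it misbehaves when called as a bare function.
  make_key(name) { return this.prefix + name; },
  // Sketch of an export() that hands out methods bound to `lib`.
  export() { return { make_key: this.make_key.bind(this) }; },
};

const torn_off = lib.make_key;       // tear-off: the link to `lib` is lost
const { make_key } = lib.export();   // bound: `this` remains `lib`

// Depending on strict mode, the torn-off call either throws a TypeError
// or silently builds a wrong key; in neither case does it yield '^x'.
let torn_off_result;
try { torn_off_result = torn_off('x'); } catch (error) { torn_off_result = error; }

console.log(make_key('x') === '^x');     // bound method works as intended
console.log(torn_off_result === '^x');   // torn-off method does not
```

Binding once at export time means callers can destructure the methods they need without having to remember which ones depend on `this`.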

Creation of Bespoke Library Instances

In order to configure a copy of the library, pass in a settings object:

_DATOM        = require 'datom'
settings      = { merge_values: false, }
DATOM         = new _DATOM.Datom settings
{ new_datom
  select }    = DATOM.export()

Or, more idiomatically:

DATOM         = new ( require 'datom' ).Datom { merge_values: false, }
{ new_datom
  select }    = DATOM.export()

The second form also helps to avoid accidental usage of the result of require 'datom', which is of course the same library with a different configuration.

Configuration Parameters

  • merge_values (boolean, default: true)—Whether to merge attributes of the second argument to new_datom() into the resulting value. When set to false, new_datom '^somekey', somevalue will always result in a datom { $key: '^somekey', $value: somevalue, }; when left to the default, and if somevalue is an object, then its attributes will become attributes of the datom, which may result in name clashes in case any attribute name should start with a $ (dollar sign).

  • freeze (boolean, default: true)—Whether to freeze datoms. When set to false, no freezing will be performed, which may entail slightly improved performance.
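To make the effect of the two settings concrete, here is a minimal sketch in plain JavaScript that mimics the behaviour described above. The factory `make_new_datom()` is hypothetical and is not the library's code; in particular, treating arrays and null as non-mergeable values and omitting `$value` when no value is given are assumptions of this sketch.

```javascript
// Hypothetical re-implementation of the documented behaviour, for illustration only.
function make_new_datom(settings = {}) {
  const { merge_values = true, freeze = true } = settings;
  return function new_datom($key, $value) {
    // Assumption: only plain (non-array, non-null) objects are merged.
    const is_object = $value !== null && typeof $value === 'object' && !Array.isArray($value);
    let d;
    if (merge_values && is_object) d = { ...$value, $key };   // attributes become datom attributes
    else if ($value === undefined) d = { $key };
    else d = { $key, $value };                                 // value is kept under $value
    return freeze ? Object.freeze(d) : d;
  };
}

const merging = make_new_datom();
const wrapping = make_new_datom({ merge_values: false });

console.log(merging('^point', { x: 1, y: 2 }));    // { x: 1, y: 2, $key: '^point' }
console.log(wrapping('^point', { x: 1, y: 2 }));   // { $key: '^point', $value: { x: 1, y: 2 } }
console.log(merging('^count', 123));               // { $key: '^count', $value: 123 }
```

With `merge_values: false` the shape of the result no longer depends on the type of the value, which is what avoids the name clashes mentioned above.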

Methods

Freezing & Thawing

  • @freeze = ( d ) ->
  • @thaw = ( d ) ->
  • @lets = ( original, modifier ) ->
  • @set = ( d, k, P... ) ->
  • @unset = ( d, k ) ->

Stamping

  • @stamp = ( d, P... ) ->
  • @unstamp = ( d ) ->

Type Testing

  • @is_system = ( d ) ->
  • @is_stamped = ( d ) ->
  • @is_fresh = ( d ) ->
  • @is_dirty = ( d ) ->

Value Creation

  • @new_datom = ( $key, $value, other... ) ->
  • @new_single_datom = ( $key, $value, other... ) ->
  • @new_open_datom = ( $key, $value, other... ) ->
  • @new_close_datom = ( $key, $value, other... ) ->
  • @new_system_datom = ( $key, $value, other... ) ->
  • @new_text_datom = ( $value, other... ) ->
  • @new_end_datom = ->
  • @new_warning = ( ref, message, d, other... ) ->

Selecting

  • @select = ( d, selector ) ->

System Properties

  • d.$key: key (i.e., type) of a datom.
  • d.$value: 'the' proper value of a datom. This is always used in case new_datom() was called with a non-object in the value slot (as in new_datom '^mykey', 123), or when the library was configured with { merge_values: false, }. In case there is no d.$value, the datom's proper value is the object that would result from deleting all properties whose names start with a $ (dollar sign).
  • d.$dirty: whether the object has been (thawed, then) changed (and then frozen again) since its $dirty property was last cleared or set to false.
  • d.$stamped: whether the object has been marked as 'stamped' (i.e., processed).
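The rule for finding a datom's proper value can be sketched as follows. This is plain JavaScript under stated assumptions: the helper name `proper_value_of()` is hypothetical (the documentation above does not name such a method), and "there is no d.$value" is read here as "d has no own property called $value".

```javascript
// Hypothetical helper; illustrates the rule given for d.$value above.
function proper_value_of(d) {
  // An explicit $value always wins.
  if (Object.prototype.hasOwnProperty.call(d, '$value')) return d.$value;
  // Otherwise: a copy of the datom without its $-prefixed (system) properties.
  const value = {};
  for (const [name, attribute] of Object.entries(d)) {
    if (!name.startsWith('$')) value[name] = attribute;
  }
  return value;
}

console.log(proper_value_of({ $key: '^mykey', $value: 123 }));              // 123
console.log(proper_value_of({ $key: '^point', $stamped: true, x: 1, y: 2 })); // { x: 1, y: 2 }
```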

WIP

The text below was copied from the PipeDreams docs and is yet to be updated.

PipeDreams Datoms (Data Events)

Data streams, of which pull-streams, PipeStreams, and NodeJS Streams are examples, do their work by sending pieces of data (that originate from a data source) through a number of transforms (to finally end up in a data sink).

Note: I will ignore here alternative ways of dealing with streams, especially the EventEmitter way of dealing with streamed data. When I say 'streams', I also implicitly mean 'pipelines'; when I say 'pipelines', I also implicitly mean 'pipelines to stream data' and 'streams' in general.

When NodeJS streams started out, the thinking about those streams was pretty much confined to saying that 'a stream is a series of bytes'. Already back then, an alternative view took hold (I'm slightly paraphrasing here):

The core interpretation was that stream could be buffers or strings - but the userland interpretation was that a stream could be anything that is serializeable [...] it was a sequence of buffers, bytes, strings or objects. Why not use the same api?

I will not repeat here what I've written about perceived shortcomings of NodeJS streams; instead, let me list a few observations:

  • In streaming, data is just data. There's no need for having a separate 'Object Mode' or somesuch.

  • There's a single exception to the above rule, and that is when the data item being sent down the line is null. This has historically—by both NodeJS streams and pull-streams—been interpreted as a termination signal, and I'm not going to change that (although at some point I might as well).

  • When starting out with streams and building fairly simple-minded pipelines, sending down either raw pieces of business data or else null to indicate termination is enough to satisfy most needs. However, when one transitions to more complex environments, raw data is not sufficient any more: When processing text from one format to another, how could a downstream transform tell whether a given piece of text is raw data or the output of an upstream transform?

    Another case where raw data becomes insufficient is that of circular pipelines, that is, pipelines that re-compute (some or all) output values in a recursive manner. An example which outputs the integer sequences of the Collatz Conjecture is in the tests folder. There, whenever we see an even number n, we send down that even number n along with half its value, n/2; whenever we see an odd number n, we send it on, followed by its value tripled plus one, 3*n+1. No matter whether you put the transform for even numbers in front of that for odd numbers or the other way round, there will be numbers that come out at the bottom that need to be re-input into the top of the pipeline. Since there's no telling in advance how long a Collatz sequence will be for a given integer, it is, in the general case, insufficient to build a pipeline from a (necessarily finite) repetitive sequence of copies of those individual transforms. Thus, classical streams cannot easily model this kind of processing.
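The need to re-input values can be illustrated without any stream library. The sketch below is plain JavaScript and is not the PipeDreams sample from the tests folder; the function `collatz()` and its explicit `queue` are hypothetical, and stopping at 1 is the usual convention rather than something stated above.

```javascript
// Minimal sketch: a single processing step plus an explicit recycle queue.
function collatz(start) {
  const seen = [];
  const queue = [start];          // values waiting to enter the top of the "pipeline"
  while (queue.length > 0) {
    const n = queue.shift();
    seen.push(n);                 // n itself is sent on
    if (n === 1) continue;        // conventional end of a Collatz sequence
    const next = n % 2 === 0 ? n / 2 : 3 * n + 1;
    queue.push(next);             // the derived value must be re-input at the top
  }
  return seen;
}

console.log(collatz(6).join(' '));   // 6 3 10 5 16 8 4 2 1
```

The number of passes through the loop is not known in advance, which is exactly why a fixed, finite chain of transforms cannot cover the general case.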

The idea of datoms (short for data atoms, a term borrowed from Rich Hickey's Datomic) is simply to wrap each piece of raw data in a higher-level structure. This is of course an old idea, but not one that is very prevalent in NodeJS streams, the fundamental assumption (of classical stream processing) being that all stream transforms get to process each piece of data, and that all pieces of data are of equal status (with the exception of null).

The PipeDreams sample implementation of Collatz Sequences uses datoms to (1) wrap the numerical pieces of data, which allows data to be marked as processed (a.k.a. 'stamped'), (2) mark data as 'to be recycled', and (3) inject system-level synchronization signals into the data stream to make sure that recycled data gets processed before new data is allowed into the stream.

In PipeDreams datoms, each piece of data is explicitly labelled for its type; each datom may have a different status: there are system-level datoms that serve to orchestrate the flow of data within the pipeline; there are user-level datoms which originate from the application; there are datoms to indicate the opening and closing of regions (phases) in the data stream; there are stream transforms that listen to and act on specific system-level events.

Datoms are JS objects that must minimally have a key property, a string that specifies the datom's category, namespace and name; in addition, they may have a value property with the payload (where desired), and any number of other attributes. The property $ is used to carry metadata (e.g. the line in a source file from which a given datom was generated). Thus, we may give the outline of a datom as (in a rather informal notation) d := { key, ?value, ?stamped,..., ?$, }.

The key of a datom must be a string that consists of at least two parts, the sigil and the name. The sigil, a single punctuation character, indicates the 'category' of each datom; there are two levels and three elementary categories, giving six types of datoms:

  • Application level:

    • ^ for data datoms (a.k.a. 'singletons'),
    • < for start-of-region datoms,
    • > for end-of-region datoms.
  • System level:

    • ~ for data datoms,
    • [ for start-of-region datoms,
    • ] for end-of-region datoms.
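The six sigils listed above lend themselves to a small lookup table. The following plain JavaScript sketch splits a key into sigil and name and reports level and category; the names `SIGILS` and `parse_key()` are hypothetical and not part of the library as documented here.

```javascript
// Lookup table built directly from the list of sigils above.
const SIGILS = {
  '^': { level: 'application', category: 'data' },
  '<': { level: 'application', category: 'start-of-region' },
  '>': { level: 'application', category: 'end-of-region' },
  '~': { level: 'system', category: 'data' },
  '[': { level: 'system', category: 'start-of-region' },
  ']': { level: 'system', category: 'end-of-region' },
};

// Hypothetical helper: a key needs at least a sigil and a one-character name.
function parse_key(key) {
  if (typeof key !== 'string' || key.length < 2) {
    throw new Error(`invalid datom key: ${String(key)}`);
  }
  const sigil = key[0];
  const entry = SIGILS[sigil];
  if (entry === undefined) throw new Error(`unknown sigil in datom key: ${key}`);
  return { sigil, name: key.slice(1), ...entry };
}

console.log(parse_key('<div'));    // { sigil: '<', name: 'div', level: 'application', category: 'start-of-region' }
console.log(parse_key('~sync'));   // { sigil: '~', name: 'sync', level: 'system', category: 'data' }
```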

Normally, one will probably want to send around business data inside (the value property of) application-level data datoms (hence their name, also shortened to D-datoms); however, one can also set other properties of datom objects, or send data around using properties of start- or end-of-region datoms.

Region events are intended to be used e.g. when parsing text with markup; say you want to turn a snippet of HTML like this:

<document><div>Helo <em>world!</em></div></document>

into another textual representation, you may want to turn that into a sequence of datoms similar to these, in the order of sending and regions symbolized by boxes (see the note after the diagram):

--------------------------------------------------------+
  { key: '<document',                   }   # d1        |
------------------------------------------------------+ |
  { key: '<div',                        }   # d2      | |
  { key: '^text',     value: "Helo ",   }   # d3      | |
----------------------------------------------------+ | |
  { key: '<em',                         }   # d4    | | |
  { key: '^text',     value: "world!",  }   # d5    | | |
  { key: '>em',                         }   # d6    | | |
----------------------------------------------------+ | |
  { key: '>div',                        }   # d7      | |
------------------------------------------------------+ |
  { key: '>document',                   }   # d8        |
--------------------------------------------------------+
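As a rough illustration of how region datoms carry structure, the sequence d1 to d8 can be turned back into markup with a few lines of plain JavaScript. The function `render()` is hypothetical, and the sketch uses the `key` / `value` property names of this (older) section rather than `$key` / `$value`.

```javascript
// The eight datoms from the diagram, in the order of sending.
const datoms = [
  { key: '<document' },
  { key: '<div' },
  { key: '^text', value: 'Helo ' },
  { key: '<em' },
  { key: '^text', value: 'world!' },
  { key: '>em' },
  { key: '>div' },
  { key: '>document' },
];

// Hypothetical renderer: region datoms become tags, text datoms become text.
function render(sequence) {
  let out = '';
  for (const d of sequence) {
    const sigil = d.key[0];
    const name = d.key.slice(1);
    if (sigil === '<') out += `<${name}>`;
    else if (sigil === '>') out += `</${name}>`;
    else if (d.key === '^text') out += d.value;
    // any other datom is ignored in this sketch
  }
  return out;
}

console.log(render(datoms));   // <document><div>Helo <em>world!</em></div></document>
```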

Note: by 'in the order of sending' I mean that you would have to send datom d1 first, then d2, and so on. This seems trivial until you write a pipeline and then picture how the events will travel down that pipeline:

pipeline.push $do_this() # s1, might be processing d3 right now
pipeline.push $do_that() # s2, might be processing d2 right now
pipeline.push $do_something_else() # s3, might be processing d1 right now

Although there's really no telling whether step s3 will really process datom d1 at the 'same point in time' that step s2 processes datom d2 and so on (in the strict sense, this is hardly possible in a single-threaded language anyway), the visualization still holds a grain of truth: stream transforms that come 'later' (further down) in the pipeline will see events near the top of your to-do list first, and vice versa. This can be mildly confusing.

select = ( d, selector ) ->

The select method can be used to determine whether a given event d matches a set of conditions; typically, one will want to use select d, selector to decide whether a given event is suitable for processing by the stream transform at hand, or whether it should be passed on unchanged.

The current implementation of select() is much dumber and faster than its predecessors; where previously, it was possible to match datoms with multiple selectors that contained multiple sigils and so forth, the new version does little more than check whether the single selector allowed equals the given datom's key value. That's about it, except that one can still use select d, '^somekey#stamped' to match both unstamped and stamped datoms.
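The matching rule just described might be sketched like this in plain JavaScript. The function `select_sketch()` is hypothetical and is not the library's select(); it assumes that a selector without the `#stamped` suffix matches only unstamped datoms, which the text above implies but does not state outright.

```javascript
// Hypothetical sketch of the documented matching rule; not datom's own code.
function select_sketch(d, selector) {
  const suffix = '#stamped';
  const accepts_stamped = selector.endsWith(suffix);
  const key = accepts_stamped ? selector.slice(0, -suffix.length) : selector;
  if (d.$key !== key) return false;
  // Assumption: without the suffix, stamped datoms are skipped.
  return accepts_stamped || d.$stamped !== true;
}

const fresh = { $key: '^somekey', $value: 1 };
const stamped = { $key: '^somekey', $value: 1, $stamped: true };

console.log(select_sketch(fresh, '^somekey'));             // true
console.log(select_sketch(stamped, '^somekey'));           // false (under the assumption above)
console.log(select_sketch(stamped, '^somekey#stamped'));   // true
console.log(select_sketch(fresh, '^otherkey'));            // false
```

A transform would typically call such a predicate first and pass the datom on unchanged whenever it returns false.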


To Do

  • implement piecemeal structural validation such that on repeated calls to a validator instance's validate() method an error will be thrown as soon as unbalanced regions (delimited by { $key: '<token', ..., } and { $key: '>token', ..., }) are encountered.
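One possible shape for such a validator is a stack of open regions that is updated on every call. The sketch below is plain JavaScript and only an idea of how the item above could be approached; the class name `RegionValidator` is hypothetical, and nothing here exists in the library yet.

```javascript
// Hypothetical incremental validator: feed it one datom per call.
class RegionValidator {
  constructor() {
    this.open = [];   // names of regions opened but not yet closed
  }

  validate(d) {
    const sigil = d.$key[0];
    const name = d.$key.slice(1);
    if (sigil === '<') {
      this.open.push(name);
    } else if (sigil === '>') {
      const expected = this.open.pop();
      if (expected !== name) {
        throw new Error(`unbalanced region: got '>${name}' while innermost open region is ${expected}`);
      }
    }
    return d;   // pass the datom through so validate() can sit in a pipeline
  }
}

const validator = new RegionValidator();
validator.validate({ $key: '<div' });
validator.validate({ $key: '^text', $value: 'Helo ' });
validator.validate({ $key: '>div' });
console.log(validator.open.length);   // 0, all regions are balanced so far
```

Because the error is raised on the first offending datom, a stream can be aborted right away instead of after the whole input has been consumed.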

Package last updated on 12 Nov 2019
