DataSource
Library to power streamig data visualisations as used in Facebook's Flipper
Installation
yarn add flipper-data-source
Links
flipper-data-source powering the ADB logs view in Flipper
Contstraints & Benefits
This library builds a map-reduce inspired data processing pipeline that stores data, and can incrementally update existing visualizations when new data arrives or existing data is updated. It achieves this by emitting events that describe how a visualisation should be changed over time, rather than computing & providing a fresh immutable dataset when the stored data is updated. Some benefits:
- Appending or updating records is roughly
O(update_size)
instead of O(dataset_size)
, while still respecting filtering, sorting and windowing - Virtualization (windowing) is built in.
- Dynamic row wrapping is supported when rendering tables.
- Any stable JS function can be used for sorting and filtering, without impacting performance significantly.
This library is designed with the following constraints in mind:
- The full dataset is kept in memory (automatically removing old items is supported to prevent unlimited growth)
- New data or data updates arrive 'streaming' over time. For rendering a large fixed dataset that doesn't evolve over time this abstraction offers no benefits.
After applying this abstraction (right two sections), in Flipper we saw a 10-20 fold framerate increase while displaying 100K log items, to which new records where added at a rate of ~50/sec. The above image is taken while tailing the logs (so it continuesly scrolls with the arrival of new data), while a search filter is also active. In the first two sections, scripting ate away most of the CPU, result in just a couple of frames per second. After applying these changes CPU time is primarily spend on cranking out more frames resulting in a smooth rather than stuttering scroll (see the lightning talk below for a demo). On top of that the base situation uses a fixed row-height, while the new situation supports text wrapping.
More detailed explanation
DataSource
Many visualization and underlying storage abstractions are optimised for large but fixed datasets.
This abstractions is optimised for visualization that need to present datasets that are continuesly updated / expanded.
The significant difference to many other solutions is that DataSource doesn't produce an immutable dataset that is swapped out every time the data is changed.
Instead, it keeps internally a mutable dataset (the records stored themselves are still immutable but can be replaced) to which new entries are added.
However, instead of propagating the dataset to the rendering layer, events are emitted instead.
DataSourceView
Conceptually, DataSourceView
is a materialized view of a DataSource
.
For visualizations, typically the following transformations need to be applied: filter/search, sorting and windowing.
Where many libraries applies these transformations as part of the rendering, DataSourceView applies these operations directly when updates to the dataset are received.
As a result the transformations need to be applied only to the newly arriving data.
For example, if a new record arrives for a sorted dataset, we will apply a binary inseration sort for the new entry, avoiding the need for a full re-sort of the dataset during Rendering.
Once the dataset is updated, the DataSource will emit events to the DataSourceRenderer
, rather than providing it a new dataset.
The events will describe how the current view should be updated to reflect the data changes.
DataSourceRendererVirtual
DataSourceRendererVirtual
is one of the possible visualizations of a DataSourceView.
It takes care of subscribing to the events emitted by the DataSourceView
, and applies them when they are relevant (e.g. within the visible window).
Beyond that, it manages virtualizations (using the react-virtual
library), so that for example scroll interactions are used to move the window of theDataSourceView
.
Typically this component is used as underlying abstraction for a Table representation.
DataSourceRendererStatic
A simplified (and not very efficient) render for DataSource that doens't use virtualization. Use this as basic for a natural growing representaiton.
Example usage
Using DataSource in a table
Excerpt from https://codesandbox.io/s/flipper-datasource-demo-iy0tq?file=/src/CBDataSource.tsx:
export type RowData = {
product_id: string;
price: number;
};
export function CBDataSource() {
const [dataSource] = useState(() => new DataSource<RowData>(undefined));
const [search, setSearch] = useState("");
const [sorted, setSorted] = useState(false);
const [sticky, setSticky] = useState(false);
useEffect(() => {
return streamCoinbase((event) => {
dataSource.append({
product_id: event.product_id,
price: parseFloat(event.price)
});
});
}, []);
useEffect(() => {
dataSource.view.setFilter(
search ? (r) => r.product_id.includes(search) : undefined
);
}, [search]);
useEffect(() => {
dataSource.view.setSortBy(sorted ? "price" : undefined);
}, [sorted]);
return (
<div>
{/* toolbar omitted */}
<div className="table">
<DataSourceRendererVirtual
dataSource={dataSource}
itemRenderer={rowRenderer}
autoScroll={sticky}
/>
</div>
</div>
);
}
function rowRenderer(row: RowData) {
return <Row row={row} />;
}
function Row({ row }: { row: RowData }) {
return (
<div className="row">
<div>{row.product_id}</div>
<div>{row.price}</div>
</div>
);
}
Using DataSource in a Chart
Experimental. See: https://codesandbox.io/s/flipper-datasource-demo-iy0tq?file=/src/DataSourceChart.tsx
Future work
Project setup:
Features: