Timeseer.AI Client
The Timeseer.AI Client is a Python SDK to access the functionality of Timeseer.AI.
Built on Apache Arrow,
the SDK integrates natively with the Pandas or Polars ecosystems.
Installing
The Timeseer.AI Client is available on PyPI.
(venv) $ pip install timeseer
Connecting
The Timeseer.AI Client communicates through the Timeseer REST API and uses Apache Arrow where possible to keep data transfers efficient.
Communications are protected by an API key.
An API key can be generated within Timeseer under Configure > API keys.
Each API key has a name and a secret value that is shown only once.
The API key is used to create a connection to a Timeseer instance running at a specific host and port:
>>> from timeseer_client import *
>>> api_key = ('<api-key-name>', '<api-key>')
>>> client = Client(api_key, host='localhost', port=8081)
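Because the secret value of an API key is shown only once, it may be convenient to keep it out of source code. The following is a minimal sketch, assuming the key is stored in two environment variables. The variable names TIMESEER_API_KEY_NAME and TIMESEER_API_KEY and the helper load_api_key are hypothetical and are not defined by the Timeseer.AI Client.

```python
import os


def load_api_key(environ=os.environ):
    """Return an (api key name, api key secret) tuple read from the environment.

    The variable names are illustrative assumptions, not part of the SDK.
    """
    try:
        return (environ["TIMESEER_API_KEY_NAME"], environ["TIMESEER_API_KEY"])
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
```

The returned tuple has the same shape as api_key above, so it could be passed to the Client in the same way:
>>> client = Client(load_api_key(), host='localhost', port=8081)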
Functionality Overview
In Timeseer,
time series data is available through two concepts:
- Sources contain a varying number of time series that are constantly updated with new data.
- Data Sets contain a fixed number of time series in a specific time range.
Sources are typically used for continuous monitoring of data,
while Data Sets are the starting point for a data science project.
Time series data from Sources and Data Sets is processed by Flows.
Flows analyze data or create derived Data Sets.
Insights and data generated by Flows are made available through Data Services.
The Timeseer.AI Client represents each of these concepts as a separate class that exposes the functionality that is specific to that concept.
Each concept class is created by passing the Client to the constructor.
Full documentation is available in the code by running:
>>> import timeseer_client
>>> help(timeseer_client)
Usage
This usage sample generates a sine wave using Pandas and NumPy.
Values of the sine wave below 0 are assumed to be the result of a faulty sensor reading.
The sample shows how Timeseer can be used to analyze this data and how it automatically creates a derived data set.
First install Pandas:
(venv) $ pip install pandas
Generate the sine wave data:
>>> import numpy as np
>>> import pandas as pd
>>> ts = pd.date_range("2022-01-01T00:00:00Z", "2022-02-01T00:00:00Z", freq="h")
>>> values = np.round(10 * np.sin(2 * np.pi * ((ts.astype(np.int64) // 10**9) - ts[0].timestamp()) / (24*60*60)), decimals=2)
>>> df = pd.DataFrame(dict(ts=ts, value=values))
>>> df.head(20)
                          ts  value
0  2022-01-01 00:00:00+00:00   0.00
1  2022-01-01 01:00:00+00:00   2.59
2  2022-01-01 02:00:00+00:00   5.00
3  2022-01-01 03:00:00+00:00   7.07
4  2022-01-01 04:00:00+00:00   8.66
5  2022-01-01 05:00:00+00:00   9.66
6  2022-01-01 06:00:00+00:00  10.00
7  2022-01-01 07:00:00+00:00   9.66
8  2022-01-01 08:00:00+00:00   8.66
9  2022-01-01 09:00:00+00:00   7.07
10 2022-01-01 10:00:00+00:00   5.00
11 2022-01-01 11:00:00+00:00   2.59
12 2022-01-01 12:00:00+00:00  -0.00
13 2022-01-01 13:00:00+00:00  -2.59
14 2022-01-01 14:00:00+00:00  -5.00
15 2022-01-01 15:00:00+00:00  -7.07
16 2022-01-01 16:00:00+00:00  -8.66
17 2022-01-01 17:00:00+00:00  -9.66
18 2022-01-01 18:00:00+00:00 -10.00
19 2022-01-01 19:00:00+00:00  -9.66
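The expression used to generate the values divides the seconds elapsed since the first timestamp by the number of seconds in a day, so the sine completes one period every 24 hours with an amplitude of 10. The same calculation can be sketched in plain Python using only the standard library; the name sine_value is illustrative.

```python
import math
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60
start = datetime(2022, 1, 1, tzinfo=timezone.utc)


def sine_value(ts):
    """Sine with amplitude 10 and a period of one day, rounded to 2 decimals."""
    elapsed = (ts - start).total_seconds()
    return round(10 * math.sin(2 * math.pi * elapsed / SECONDS_PER_DAY), 2)


# The first quarter period: the wave rises from 0 to its peak at 06:00.
first_hours = [sine_value(start + timedelta(hours=h)) for h in range(7)]
print(first_hours)  # [0.0, 2.59, 5.0, 7.07, 8.66, 9.66, 10.0]
```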
Define a Timeseer API key in Configure > API keys and use it to create a Client:
>>> from timeseer_client import *
>>> client = Client(("<api key name>", "<api key>"), host='timeseer.example.org', port=8081)
Timeseer uses metadata to automatically profile a time series.
In this case, only the physical lower limit of the sensor that measured the time series is known,
which is 0.
>>> from timeseer_client.metadata import fields
>>> series = SeriesSelector("Sines", {"function": "sine", "amplitude": "10"})
>>> metadata = Metadata(series, {fields.LimitLowPhysical: 0})
Each time series in Timeseer is identified by a SeriesSelector.
Each SeriesSelector has a source ("Sines"),
which will become the data set name,
as well as tags and a field.
This time series has the "function" and "amplitude" tags and the (default) "value" field.
For time series where additional structure is not available,
a SeriesSelector can also be created using a single "series name" tag:
>>> SeriesSelector("Sines", "sine-10") == SeriesSelector("Sines", {"series name": "sine-10"})
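The shorthand can be pictured as a normalization step: a plain string is treated as the value of a single "series name" tag. The sketch below uses a hypothetical stand-in class, SelectorSketch, purely to illustrate that idea. It is not the SeriesSelector from the Timeseer.AI Client, and how the real class handles the shorthand internally is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class SelectorSketch:
    """Hypothetical stand-in for SeriesSelector, for illustration only."""

    source: str
    tags: Union[str, Dict[str, str]]
    field: str = "value"  # the default field mentioned in the text

    def __post_init__(self):
        # Normalize the string shorthand to a single "series name" tag.
        if isinstance(self.tags, str):
            self.tags = {"series name": self.tags}


shorthand = SelectorSketch("Sines", "sine-10")
explicit = SelectorSketch("Sines", {"series name": "sine-10"})
print(shorthand == explicit)  # True
```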
Profiling this time series can be done using the profile convenience function:
>>> profile(client, "Sines", [(metadata, df)])
[{'type': 'flow', 'name': 'Sines'}, {'type': 'data service', 'name': 'Sines'}, {'type': 'data set', 'name': 'Sines'}]
The profile function creates a Data Set, a Data Service and a Flow with the given name,
in this case "Sines".
It also evaluates the Flow.
Data should be provided as a pyarrow.Table or a Pandas DataFrame.
A Data Service summarizes the profiling results as Statistics and Event Frames.
Event Frames define a time range where something interesting has been detected.
>>> data_services = DataServices(client)
>>> data_service = DataServiceSelector('Sines', 'Sines')
>>> event_frames = data_services.get_event_frames(data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    61
Out of bounds (lower, physical)          31
Values below zero                        31
Upper limit is present                    1
Interpolation type is present             1
Compression - flat archival rate          1
Description is present                    1
Unit is present                           1
Name: type, dtype: int64
Not all profiling results are issues.
In this case we can safely ignore the 'linear undercompression' events.
The 'Out of bounds (lower, physical)' event frames cannot be ignored, though:
as mentioned earlier, values below 0 are assumed to come from a faulty sensor reading.
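One plausible reading of the count of 31 for both 'Out of bounds (lower, physical)' and 'Values below zero' is that each of the 31 days contains one contiguous run of negative values. The sketch below counts such runs in plain Python. It only illustrates that interpretation and is not how Timeseer detects Event Frames; the function name count_runs_below is made up for this example.

```python
import math

HOURS = 31 * 24 + 1  # hourly samples from 2022-01-01 through 2022-02-01 inclusive

# Same signal as the generated data: amplitude 10, one period per day.
values = [round(10 * math.sin(2 * math.pi * hour / 24), 2) for hour in range(HOURS)]


def count_runs_below(samples, limit):
    """Count contiguous runs of samples that are strictly below limit."""
    runs = 0
    in_run = False
    for sample in samples:
        below = sample < limit
        if below and not in_run:
            runs += 1  # a new run starts at this sample
        in_run = below
    return runs


print(count_runs_below(values, 0))  # 31
```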
Statistics can be used to gain high-level insight into the data and explain the Event Frames:
>>> data_services.get_statistics(data_service, series)
[... Statistic(name='Value statistics', data_type='table', result=[['Min', -10.0], ['Max', 10.0], ['Mean', 4.775152794086695e-18], ['Median', 0], ['Std', 7.073308943835715]]) ...]
It is clear (and expected based on the data generation) that the Out of bounds (lower, physical) Event Frames occur because the minimum value is -10.0.
Timeseer can automatically correct the data to be within bounds using various strategies.
To create derived data in periods where an Event Frame is detected,
a "filter" Block for that Event Frame type needs to be inserted in a Flow.
The derived data can be stored in a few ways.
It is possible to create another Data Set, for example.
Storing it in a Data Service instead allows verifying that the problem has been resolved,
as data is stored there alongside quality indicators.
There is no shorthand for data cleaning,
as each case will require different action.
The most readable way to define the Flow that will create the derived data is in YAML.
Create sine-derive.yml:
---
- type: data service
  name: Derived sine results
  kpiSet: Data quality fundamentals
  range:
    start: "2022-01-01T00:00:00Z"
    end: "2022-02-01T00:00:00Z"
- type: flow
  name: Create derived sine
  dataSet: Sines
  blocks:
    - name: Analyze time series
      type: analysis
    - name: Hold last value when out of bounds
      type: filter
      augmentationStrategy: hold last value
      filters:
        - type: univariate
          filter: "Out of bounds (lower, physical)"
          series: ALL
    - name: Analyze derived time series
      type: analysis
    - name: Keep results for derived series in Derived sine results data service
      type: data_service_contribute
      dataServiceName: Derived sine results
      contributionBlockNames: [Analyze derived time series]
The Resources and Flows classes allow creating resources and evaluating flows, respectively.
>>> resources = Resources(client)
>>> resources.create(path="sine-derive.yml")
>>> flows = Flows(client)
>>> flows.evaluate("Create derived sine")
The derived data has been profiled by the Flow.
Profiling results are available in the "Derived sine results" Data Service:
>>> derived_data_service = DataServiceSelector('Derived sine results', 'Sines')
>>> event_frames = data_services.get_event_frames(derived_data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    31
Compression - flat archival rate          1
Interpolation type is present             1
Unit is present                           1
Description is present                    1
Upper limit is present                    1
Name: type, dtype: int64
The derived data no longer contains values below 0:
>>> derived_data = data_services.get_data(derived_data_service, series)
>>> derived_data.to_pandas().head(26)
                           value
ts
2022-01-01 00:00:00+00:00   0.00
2022-01-01 01:00:00+00:00   2.59
2022-01-01 02:00:00+00:00   5.00
2022-01-01 03:00:00+00:00   7.07
2022-01-01 04:00:00+00:00   8.66
2022-01-01 05:00:00+00:00   9.66
2022-01-01 06:00:00+00:00  10.00
2022-01-01 07:00:00+00:00   9.66
2022-01-01 08:00:00+00:00   8.66
2022-01-01 09:00:00+00:00   7.07
2022-01-01 10:00:00+00:00   5.00
2022-01-01 11:00:00+00:00   2.59
2022-01-01 12:00:00+00:00  -0.00
2022-01-01 13:00:00+00:00  -0.00
2022-01-01 14:00:00+00:00  -0.00
2022-01-01 15:00:00+00:00  -0.00
2022-01-01 16:00:00+00:00  -0.00
2022-01-01 17:00:00+00:00  -0.00
2022-01-01 18:00:00+00:00  -0.00
2022-01-01 19:00:00+00:00  -0.00
2022-01-01 20:00:00+00:00  -0.00
2022-01-01 21:00:00+00:00  -0.00
2022-01-01 22:00:00+00:00  -0.00
2022-01-01 23:00:00+00:00  -0.00
2022-01-02 00:00:00+00:00   0.00
2022-01-02 01:00:00+00:00   2.59
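The output above is consistent with a simple reading of the "hold last value" strategy: each out-of-bounds value is replaced by the most recent value that was within bounds, which here is the -0.00 recorded at 12:00. A minimal sketch of that idea in plain Python follows. The function name and the handling of leading out-of-bounds samples are assumptions, and the actual behavior of Timeseer may differ.

```python
def hold_last_value(samples, lower_limit):
    """Replace samples below lower_limit with the last in-bounds sample.

    Samples before the first in-bounds value have nothing to hold, so this
    sketch leaves them as None.
    """
    result = []
    last_good = None
    for sample in samples:
        if sample >= lower_limit:
            last_good = sample  # in bounds: remember it and keep it as-is
        result.append(last_good)
    return result


# A few samples around the part of the sine wave that dips below 0.
print(hold_last_value([5.0, 2.59, 0.0, -2.59, -5.0, 0.0, 2.59], 0))
# [5.0, 2.59, 0.0, 0.0, 0.0, 0.0, 2.59]
```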
This only scratches the surface of the functionality in Timeseer.
Learn more in the Help menu in the user interface.
All resources, blocks in Flows and event frame types are thoroughly documented.