fdcodepy

A codebook solution for time series data compression and feature extraction that considers the rebound effect

0.1.1

Introduction

FD_codepy is an open-source Python package for extracting interpretable features from time series and using them for compression.
The key idea was proposed specifically for metered data in the energy sector, but it can also be applied to smart sensors and edge computing.
Inspired by the codebook method, it breaks time series data down into its constituent parts: the unique sub-patterns, called codewords, and the indices of those codewords, called representations. This allows efficient compression and analysis.
Compared with resampling the data to a lower resolution, this lossy compression method requires similar storage and transmission bandwidth, while preserving high-frequency information and the cumulative/average metered values.
For a high-level overview of the problem, see our article published by The Conversation:
- Smart meters haven’t delivered the promised benefits to electricity users. Here’s a way to fix the problems
The FD_codepy source code is on GitHub: https://github.com/abc123yuanrui/FD_codepy/
- An example notebook is provided: FD_codepy\examples\Codebook_processing_for_energy_porfile.ipynb

Key method for time series compression

Codebook: the key class for decomposing long energy time series into unique partitions (codewords) and representations. See the examples for details.
- It takes a time series, a window size, and a distance metric type as inputs.
- The four built-in distance methods are:
  - Euclidean Distance (default, or 'euclidean')
  - Dynamic Time Warping ('DTW')
  - Wasserstein Distance ('Wasserstein')
  - Flexibility Distance ('flexibilityD')
- The pre_processing method normalises the data; the normalised series is stored as the attribute normalized_arr and the scaler as the attribute scaler_average.
- The get_distance_matrix method performs a statistical analysis that computes the distance matrix of a long time series (assuming historical data is available). It returns the matrix and quantiles of the distances, which can be used to set a similarity threshold (alternatively, the threshold can be set to an empirical value).
- The desolve_time_series_thre method processes the time series into codewords and representations, using the given similarity threshold, and returns them.
- The post_processing method reconstructs the time series from the codewords and representations; the result is stored as the attribute recovered_series.
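The decomposition steps above can be illustrated with a small NumPy sketch. It is not fdcodepy's implementation: it assumes non-overlapping windows, normalisation by each window's average (loosely mirroring scaler_average), Euclidean distance, a low quantile of the pairwise distances as the threshold, and a greedy nearest-codeword assignment. All function names here are hypothetical. In this sketch each window keeps its average alongside its codeword index, so the reconstruction reproduces every window's mean exactly.

```python
import numpy as np


def window_normalise(series, window):
    """Split a series into complete, non-overlapping windows and divide each
    window by its own average (an assumed normalisation scheme)."""
    series = np.asarray(series, dtype=float)
    windows = series[: len(series) // window * window].reshape(-1, window)
    averages = windows.mean(axis=1)
    safe = np.where(averages == 0, 1.0, averages)  # avoid dividing by zero
    return windows / safe[:, None], averages


def pairwise_euclidean(windows):
    """Full distance matrix between every pair of normalised windows."""
    diff = windows[:, None, :] - windows[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def decompose(normalised, threshold):
    """Greedy codebook: reuse the nearest stored codeword if it lies within
    the threshold, otherwise store the window as a new codeword."""
    codewords, representations = [], []
    for w in normalised:
        if codewords:
            d = np.linalg.norm(np.asarray(codewords) - w, axis=1)
            best = int(np.argmin(d))
            if d[best] <= threshold:
                representations.append(best)
                continue
        codewords.append(w)
        representations.append(len(codewords) - 1)
    return np.asarray(codewords), np.asarray(representations)


def reconstruct(codewords, representations, averages):
    """Rescale each referenced codeword by its window average and concatenate."""
    return (codewords[representations] * averages[:, None]).ravel()


rng = np.random.default_rng(0)
series = rng.uniform(0, 30, 365 * 24)  # one year of hourly values
normalised, averages = window_normalise(series, 24)
dist = pairwise_euclidean(normalised)
# Use a low quantile of the off-diagonal distances as the similarity threshold
threshold = np.quantile(dist[np.triu_indices(len(dist), k=1)], 0.05)
codewords, representations = decompose(normalised, threshold)
recovered = reconstruct(codewords, representations, averages)
print(len(codewords), len(representations), recovered.shape)
```

A threshold of 0 turns every distinct window into its own codeword, which makes the sketch lossless; raising the threshold trades reconstruction accuracy for a smaller codebook.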
Flexibility distance: a novel distance metric that measures the similarity between time series while taking into account both temporal and amplitude distances, as well as the rebound effect in the data.
- Code_book.flex_distance is a static method that computes the FD between two given time series.
  - The default usage is fd = methods.Code_book.flex_distance(series_a, series_b), which returns the FD with the default settings.
  - Additional routing information is available via fd, row_index, col_index = methods.Code_book.flex_distance(series_a, series_b, route = True). It provides the reshaping strategy from series a to series b.
  - Users can also customise the weighting matrices with fd = methods.Code_book.flex_distance(series_a, series_b, weighted_matix_a, weighted_matix_b, route = False)
- Given any two time series, users can generate a sample dashboard with the default settings as follows:
  - from fdcodepy import methods
  - from fdcodepy.utils.helpers import distance_method_routing_analysis
  - distance_method_routing_analysis(series_a, series_b, methods.Code_book, report = True)
  - The report is generated in the working directory; users can change this by modifying the export_dir variable of the distance_method_routing_analysis function.
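The FD computation itself is not detailed here. For comparison, the sketch below implements classic Dynamic Time Warping, one of the four metrics listed above, and returns its alignment path as row and column index lists, loosely analogous to the routing output of route = True. It is plain DTW with an absolute-difference cost, not the flexibility distance, and the function name dtw_with_path is hypothetical.

```python
import numpy as np


def dtw_with_path(a, b):
    """Classic DTW with |a_i - b_j| cost. Returns (distance, rows, cols),
    where (rows[k], cols[k]) are the aligned index pairs of the warping path."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): acc[i - 1, j - 1],
            (i - 1, j): acc[i - 1, j],
            (i, j - 1): acc[i, j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    rows = [p[0] for p in path]
    cols = [p[1] for p in path]
    return float(acc[n, m]), rows, cols


distance, rows, cols = dtw_with_path([0, 1, 0, 0], [0, 0, 1, 0])
print(distance)  # 0.0: the shifted peak is aligned rather than penalised
```

For the same pair, the Euclidean distance is about 1.41 (the square root of 2), which shows why time-shift-tolerant metrics matter for load profiles whose peaks move by an hour or two.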

Installation

Install using pip: pip install fdcodepy

Usage

Import the package: import fdcodepy
Step-by-step example: Codebook processing of an hourly time series with a window size of 24. The window size determines the compression ratio (here the same as resampling the data from hourly to daily).
- import numpy as np
- from fdcodepy import methods
- sample_series = np.random.uniform(0, 30, 365*24)
- series_codebook = methods.Code_book(sample_series, 24, 'flexibilityD')
- series_codebook.pre_processing()
- distance_matrix, quantiles = series_codebook.get_distance_matrix()
- codewords, representations = series_codebook.desolve_time_series_thre(quantiles[0])
- series_codebook.post_processing()
- series_codebook.recovered_series is the reconstructed data. It is rebuilt from the codewords and only len(representations) representation entries, compared with len(sample_series) values in the original series.
The representations are the data that need to be communicated to the data center; their length equals the size of the downsampled data, in this case 365.
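The storage figures in this example follow from simple arithmetic. The sketch below assumes one representation entry per complete 24-hour window; whether codewords must also be transmitted depends on the deployment, so it counts them as a separate, hypothetical overhead term (zero when the codebook is already shared). The function name transmitted_values is hypothetical.

```python
def transmitted_values(series_length: int, window: int, new_codewords: int = 0) -> int:
    """Values sent for one series: one representation per complete window,
    plus a window-length vector for each codeword the receiver does not yet
    hold (a hypothetical overhead; zero if the codebook is shared)."""
    return series_length // window + new_codewords * window


n = 365 * 24                      # one year of hourly readings
sent = transmitted_values(n, 24)  # representations only
print(n, sent, n / sent)          # 8760 365 24.0
```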
Use the Codebook processing result to generate a report
- from fdcodepy.utils.helpers import code_book_processing_analysis
- code_book_processing_analysis(series_codebook, time_index, report = True, export_dir = '.')
Use the flexibility distance to compute the distance between two time series (with default settings):
- from fdcodepy import methods
- methods.Code_book.flex_distance(time_series_1, time_series_2)

Reference

Yuan, R., Pourmousavi, S. A., Soong, W. L., Black, A. J., Liisberg, J. A. R., & Lemos-Vinasco, J. (2024). Unleashing the benefits of smart grids by overcoming the challenges associated with low-resolution data. Cell Reports Physical Science, 5(2), 101830. https://doi.org/10.1016/j.xcrp.2024.101830
Yuan, R., Pourmousavi, S. A., Soong, W. L., Black, A. J., Liisberg, J. A. R., & Lemos-Vinasco, J. (2023). A new time series similarity measure and its smart grid applications. arXiv:2310.12399. https://arxiv.org/abs/2310.12399