Welcome to PyPOTS
a Python toolbox for machine learning on Partially-Observed Time Series
⦿ Motivation
: Sensor failures, communication errors, and unexpected malfunctions make missing values common
in time series collected from real-world environments.
This makes partially-observed time series (POTS) a pervasive problem in open-world modeling and hinders advanced
data analysis. Although this problem is important, the area of machine learning on POTS still lacks a dedicated toolkit.
PyPOTS is created to fill this gap.
⦿ Mission
: PyPOTS (pronounced "Pie Pots") aims to be a handy toolbox that makes machine learning on
POTS easy rather than tedious, helping engineers and researchers focus on the core problems at hand rather
than on how to deal with the missing parts in their data. PyPOTS will keep integrating classical and the latest
state-of-the-art machine learning algorithms for partially-observed multivariate time series. Besides various
algorithms, PyPOTS will provide unified APIs together with detailed documentation and interactive examples across
algorithms as tutorials.
🤗 Please star this repo to help others notice PyPOTS if you think it is a useful toolkit.
Please kindly cite PyPOTS in your publications if it helps with
your research.
This really means a lot to our open-source research. Thank you!
The rest of this readme file is organized as follows:
❖ Available Algorithms,
❖ PyPOTS Ecosystem,
❖ Installation,
❖ Usage,
❖ Citing PyPOTS,
❖ Contribution,
❖ Community.
❖ Available Algorithms
PyPOTS supports imputation, classification, clustering, forecasting, and anomaly detection tasks on multivariate
partially-observed time series. The table below shows the availability of each algorithm
(sorted by year) in PyPOTS for different tasks. The symbol ✅ indicates the algorithm is available for the
corresponding task. Models will be continuously updated to handle tasks that are not
currently supported. Stay tuned❗️
🌟 Since v0.2, all neural-network models in PyPOTS have supported hyperparameter optimization.
This functionality is implemented with the Microsoft NNI framework. You may want to
refer to our time-series imputation survey repo Awesome_Imputation
to see how to configure and tune the hyperparameters.
🔥 Note that all models whose names are marked with 🧑🔧
in the table (e.g. Transformer, iTransformer, Informer, etc.) were not
originally proposed as algorithms for POTS data in their papers, and they cannot directly accept time series with
missing values as input, let alone impute them. To make them applicable to POTS data, we apply the same
embedding strategy and training approach (ORT+MIT) as in
the SAITS paper [1].
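To make the ORT+MIT idea more concrete, here is a minimal NumPy sketch. It assumes that the embedding strategy feeds the model zero-filled values concatenated with the missing mask, that ORT (observed reconstruction task) scores the reconstruction on the values the model can see, and that MIT (masked imputation task) scores it on observed values that were artificially hidden. The helper `masked_mae`, the 20% hiding rate, and the all-zero `reconstruction` placeholder are illustrative choices, not PyPOTS code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 2 samples, 4 time steps, 3 features, with a few missing values (NaN).
X = rng.normal(size=(2, 4, 3))
X[0, 1, 2] = np.nan
X[1, 3, 0] = np.nan

observed_mask = ~np.isnan(X)  # True where a value was actually observed

# MIT: artificially hide roughly 20% of the observed values to use as targets.
hide = (rng.random(X.shape) < 0.2) & observed_mask
input_mask = observed_mask & ~hide  # values the model is allowed to see
indicating_mask = hide              # held-out values, known only to the loss

# Embedding strategy (assumed): zero-fill unseen values, then append the mask
# as extra feature channels so the model knows which inputs are real.
X_filled = np.where(input_mask, np.nan_to_num(X), 0.0)
model_input = np.concatenate([X_filled, input_mask.astype(float)], axis=-1)


def masked_mae(pred, target, mask):
    """Mean absolute error restricted to positions where mask is True."""
    mask = mask.astype(float)
    return float(np.sum(np.abs(pred - target) * mask) / (np.sum(mask) + 1e-12))


# Placeholder for a model output with the same shape as the data.
reconstruction = np.zeros_like(X_filled)

target = np.nan_to_num(X)
ort_loss = masked_mae(reconstruction, target, input_mask)       # visible values
mit_loss = masked_mae(reconstruction, target, indicating_mask)  # hidden values
total_loss = ort_loss + mit_loss  # equal weighting is an assumption here
```

Because the mask travels with the data, a model that was designed for complete inputs never has to handle NaN directly, which is what lets forecasting-oriented architectures be reused for imputation.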
The task types are abbreviated as follows:
IMPU
: Imputation;
FORE
: Forecasting;
CLAS
: Classification;
CLUS
: Clustering;
ANOD
: Anomaly Detection.
The paper references and links are all listed at the bottom of this file.
| Type | Algo | IMPU | FORE | CLAS | CLUS | ANOD | Year - Venue |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LLM | Time-Series.AI [2] | ✅ | ✅ | ✅ | ✅ | ✅ | Later in 2024 |
| Neural Net | TEFN🧑🔧 [3] | ✅ | | | | | 2024 - arXiv |
| Neural Net | TimeMixer [4] | ✅ | | | | | 2024 - ICLR |
| Neural Net | iTransformer🧑🔧 [5] | ✅ | | | | | 2024 - ICLR |
| Neural Net | ModernTCN [6] | ✅ | | | | | 2024 - ICLR |
| Neural Net | ImputeFormer🧑🔧 [7] | ✅ | | | | | 2024 - KDD |
| Neural Net | SAITS [1] | ✅ | | | | | 2023 - ESWA |
| Neural Net | FreTS🧑🔧 [8] | ✅ | | | | | 2023 - NeurIPS |
| Neural Net | Koopa🧑🔧 [9] | ✅ | | | | | 2023 - NeurIPS |
| Neural Net | Crossformer🧑🔧 [10] | ✅ | | | | | 2023 - ICLR |
| Neural Net | TimesNet [11] | ✅ | | | | | 2023 - ICLR |
| Neural Net | PatchTST🧑🔧 [12] | ✅ | | | | | 2023 - ICLR |
| Neural Net | ETSformer🧑🔧 [13] | ✅ | | | | | 2023 - ICLR |
| Neural Net | MICN🧑🔧 [14] | ✅ | | | | | 2023 - ICLR |
| Neural Net | DLinear🧑🔧 [15] | ✅ | | | | | 2023 - AAAI |
| Neural Net | TiDE🧑🔧 [16] | ✅ | | | | | 2023 - TMLR |
| Neural Net | SCINet🧑🔧 [17] | ✅ | | | | | 2022 - NeurIPS |
| Neural Net | Nonstationary Tr.🧑🔧 [18] | ✅ | | | | | 2022 - NeurIPS |
| Neural Net | FiLM🧑🔧 [19] | ✅ | | | | | 2022 - NeurIPS |
| Neural Net | RevIN_SCINet🧑🔧 [20] | ✅ | | | | | 2022 - ICLR |
| Neural Net | Pyraformer🧑🔧 [21] | ✅ | | | | | 2022 - ICLR |
| Neural Net | Raindrop [22] | | | ✅ | | | 2022 - ICLR |
| Neural Net | FEDformer🧑🔧 [23] | ✅ | | | | | 2022 - ICML |
| Neural Net | Autoformer🧑🔧 [24] | ✅ | | | | | 2021 - NeurIPS |
| Neural Net | CSDI [25] | ✅ | ✅ | | | | 2021 - NeurIPS |
| Neural Net | Informer🧑🔧 [26] | ✅ | | | | | 2021 - AAAI |
| Neural Net | US-GAN [27] | ✅ | | | | | 2021 - AAAI |
| Neural Net | CRLI [28] | | | | ✅ | | 2021 - AAAI |
| Probabilistic | BTTF [29] | | ✅ | | | | 2021 - TPAMI |
| Neural Net | StemGNN🧑🔧 [30] | ✅ | | | | | 2020 - NeurIPS |
| Neural Net | Reformer🧑🔧 [31] | ✅ | | | | | 2020 - ICLR |
| Neural Net | GP-VAE [32] | ✅ | | | | | 2020 - AISTATS |
| Neural Net | VaDER [33] | | | | ✅ | | 2019 - GigaSci. |
| Neural Net | M-RNN [34] | ✅ | | | | | 2019 - TBME |
| Neural Net | BRITS [35] | ✅ | | ✅ | | | 2018 - NeurIPS |
| Neural Net | GRU-D [36] | ✅ | | ✅ | | | 2018 - Sci. Rep. |
| Neural Net | TCN🧑🔧 [37] | ✅ | | | | | 2018 - arXiv |
| Neural Net | Transformer🧑🔧 [38] | ✅ | | | | | 2017 - NeurIPS |
| Naive | Lerp [39] | ✅ | | | | | |
| Naive | LOCF/NOCB | ✅ | | | | | |
| Naive | Mean | ✅ | | | | | |
| Naive | Median | ✅ | | | | | |
💯 Contribute your model right now to increase your research impact! PyPOTS downloads are increasing rapidly
(300K+ in total and 1K+ daily on PyPI so far),
and your work will be widely used and cited by the community.
Refer to the contribution guide to see how to include your model in
PyPOTS.
❖ PyPOTS Ecosystem
At PyPOTS, things are related to coffee, which we're familiar with. Yes, this is a coffee universe!
As you can see, there is a coffee pot in the PyPOTS logo. And what else? Please read on ;-)
👈 Time series datasets are taken as coffee beans at PyPOTS, and POTS datasets are incomplete coffee beans with missing
parts that have their own meanings. To make various public time-series datasets readily available to users,
Time Series Data Beans (TSDB) is created to make loading time-series datasets super easy!
Visit TSDB to learn more about this handy tool 🛠, which now supports a
total of 172 open-source datasets!
👉 To simulate the real-world data beans with missingness, the ecosystem library
PyGrinder, a toolkit helping grind your coffee beans into incomplete ones, is
created. Missing patterns fall into three categories according to Rubin's theory [40]:
MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random).
PyGrinder supports all of them and additional functionalities related to missingness.
With PyGrinder, you can introduce synthetic missing values into your datasets with a single line of code.
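As a rough illustration of what MCAR means in practice, the sketch below drops each value independently with the same probability, regardless of the data. The function `mcar_sketch` is a hypothetical stand-in written in plain NumPy for this README; it is not PyGrinder's implementation, and PyGrinder's own `mcar` should be preferred in real code.

```python
import numpy as np


def mcar_sketch(X, p, seed=None):
    """Return a copy of X where each entry is set to NaN with probability p.

    Illustrative only: every entry is dropped independently of its value and
    position, which is the defining property of MCAR.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float, copy=True)  # never modify the caller's array
    drop = rng.random(X.shape) < p           # one independent draw per entry
    X[drop] = np.nan
    return X


# Toy data shaped like (n_samples, n_steps, n_features).
X = np.arange(24, dtype=float).reshape(2, 4, 3)
X_missing = mcar_sketch(X, p=0.5, seed=42)

# Entries that survive are unchanged; the rest are NaN.
kept = ~np.isnan(X_missing)
print(f"kept {kept.sum()} of {X.size} values")
```

MAR and MNAR differ in that the drop probability depends on other observed values or on the missing value itself, so they cannot be simulated with a single uniform draw like this.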
👈 To fairly evaluate the performance of PyPOTS algorithms, the benchmarking suite
BenchPOTS is created, which provides standard and unified data-preprocessing
pipelines to prepare datasets for measuring the performance of different POTS algorithms on various tasks.
👉 Now the beans, grinder, and pot are ready, please have a seat on the bench and let's think about how to brew us a cup
of coffee. Tutorials are necessary! Considering the future workload, PyPOTS tutorials are released in a single repo,
and you can find them in BrewPOTS.
Take a look at it now, and learn how to brew your POTS datasets!
☕️ Welcome to the universe of PyPOTS. Enjoy it and have fun!
❖ Installation
Refer to the installation instructions in the PyPOTS documentation
for more detailed guidelines.
PyPOTS is available on both PyPI
and Anaconda.
You can install PyPOTS as shown below. The same approach works for
TSDB, PyGrinder,
BenchPOTS, and AI4TS:
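# via pip
pip install pypots            # first-time installation
pip install pypots --upgrade  # update pypots to the latest version
# install from the latest source code, which may include features not yet officially released
pip install https://github.com/WenjieDu/PyPOTS/archive/main.zip
# via conda
conda install conda-forge::pypots  # first-time installation
conda update conda-forge::pypots   # update pypots to the latest version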
❖ Usage
Besides BrewPOTS, you can also find a simple quick-start tutorial notebook
on Google Colab. If you have further questions, please refer to the PyPOTS documentation at docs.pypots.com.
You can also raise an issue or ask in our community.
Below is a usage example that applies SAITS to PhysioNet2012 to impute missing values in time series:
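# Data preprocessing. Tedious, but PyPOTS can help.
import numpy as np
from sklearn.preprocessing import StandardScaler
from pygrinder import mcar
from pypots.data import load_specific_dataset
data = load_specific_dataset('physionet_2012')  # load the PhysioNet-2012 dataset
X = data['X']
num_samples = len(X['RecordID'].unique())
X = X.drop(['RecordID', 'Time'], axis=1)
X = StandardScaler().fit_transform(X.to_numpy())
X = X.reshape(num_samples, 48, -1)  # (samples, 48 time steps, features)
X_ori = X  # keep X_ori for validation
X = mcar(X, 0.1)  # randomly hold out 10% of the observed values as ground truth
dataset = {"X": X}  # X for model input
print(X.shape)  # (n_samples, n_steps, n_features)

# Model training.
from pypots.imputation import SAITS
from pypots.utils.metrics import calc_mae
saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=10)
# The whole dataset is used for training here because the ground truth is hidden from the model;
# you can also split it into train/val/test sets.
saits.fit(dataset)  # train the model on the dataset
imputation = saits.impute(dataset)  # impute both the originally-missing and the artificially-missing values
indicating_mask = np.isnan(X) ^ np.isnan(X_ori)  # marks only the artificially held-out values
mae = calc_mae(imputation, np.nan_to_num(X_ori), indicating_mask)  # mean absolute error on the held-out values
saits.save("save_it_here/saits_physionet2012.pypots")  # save the model for future use
saits.load("save_it_here/saits_physionet2012.pypots")  # reload the serialized model for later imputation or training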
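The evaluation step above relies on the expression `np.isnan(X) ^ np.isnan(X_ori)`, which is easy to misread. The following sketch walks through it on a tiny array. It assumes that the error is averaged only over the positions selected by the mask; `masked_mae_sketch` is a hypothetical helper for illustration and is not the `calc_mae` implementation from PyPOTS.

```python
import numpy as np

# Original data: NaN marks values that were missing from the start.
X_ori = np.array([[1.0, np.nan, 3.0],
                  [4.0, 5.0, np.nan]])

# Hold out two observed values as ground truth for evaluation.
X = X_ori.copy()
X[0, 0] = np.nan
X[1, 1] = np.nan

# XOR keeps positions that are NaN in X but not in X_ori,
# i.e. exactly the artificially held-out values.
indicating_mask = np.isnan(X) ^ np.isnan(X_ori)

# A made-up imputation result with the same shape as the data.
imputation = np.array([[1.5, 0.0, 3.0],
                       [4.0, 4.0, 9.0]])


def masked_mae_sketch(pred, target, mask):
    """Mean absolute error over the positions where mask is True."""
    mask = mask.astype(float)
    n = mask.sum()
    if n == 0:
        return float("nan")
    return float(np.sum(np.abs(pred - target) * mask) / n)


# nan_to_num replaces the originally-missing NaNs with 0 so the subtraction
# is defined everywhere; those positions are masked out anyway.
mae = masked_mae_sketch(imputation, np.nan_to_num(X_ori), indicating_mask)
print(mae)  # 0.75, from errors |1.5 - 1.0| and |4.0 - 5.0| averaged over 2 values
```

Originally-missing positions never contribute to the score, because no ground truth exists for them.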
❖ Citing PyPOTS
[!TIP]
[Updates in Jun 2024] 😎 The first comprehensive time-series imputation benchmark paper,
TSI-Bench: Benchmarking Time Series Imputation, is now publicly available.
The code is open source in the repo Awesome_Imputation.
With nearly 35,000 experiments, we provide a comprehensive benchmarking study on 28 imputation methods, 3 missing
patterns (points, sequences, blocks),
various missing rates, and 8 real-world datasets.
[Updates in Feb 2024] 🎉 Our survey
paper Deep Learning for Multivariate Time Series Imputation: A Survey has been
released on arXiv.
We comprehensively review the literature of the state-of-the-art deep-learning imputation methods for time series,
provide a taxonomy for them, and discuss the challenges and future directions in this field.
The paper introducing PyPOTS is available on arXiv,
and a short version of it was accepted by the 9th SIGKDD International Workshop on Mining and Learning from Time
Series (MiLeTS'23).
Additionally, PyPOTS has been included as a PyTorch Ecosystem project.
We are working to publish it in prestigious academic venues, e.g. JMLR (track for
Machine Learning Open Source Software). If you use PyPOTS in your work,
please cite it as below and 🌟star this repository to make others notice this library. 🤗
There are scientific research projects that use PyPOTS and reference it in their papers.
Here is an incomplete list of them.
@article{du2023pypots,
title = {{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
author = {Wenjie Du},
journal = {arXiv preprint arXiv:2305.18811},
year = {2023},
}
or
Wenjie Du.
PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.
arXiv, abs/2305.18811, 2023.
❖ Contribution
You're very welcome to contribute to this exciting project!
By committing your code, you'll
- make your well-established model out-of-the-box for PyPOTS users to run,
  and help your work obtain more exposure and impact.
  Take a look at our inclusion criteria.
  You can utilize the template folder in each task package (e.g.
  pypots/imputation/template) to quickly start;
- become one of PyPOTS contributors and
  be listed as a volunteer developer on the PyPOTS website;
- get mentioned in PyPOTS release notes.
You can also contribute to PyPOTS by simply starring🌟 this repo to help more people notice it.
Your star is your recognition of PyPOTS, and it matters!
👏 Click here to view PyPOTS stargazers and forkers.
We're so proud to have more and more awesome users, as well as more bright ✨stars.
👀 Check out a full list of our users' affiliations on PyPOTS website here!
❖ Community
We care about the feedback from our users, so we're building the PyPOTS community on
- Slack. General discussion,
  Q&A, and our development team are here;
- LinkedIn. Official announcements and news are here;
- WeChat official account (微信公众号). We also run a group chat on WeChat,
  and you can get the QR code from the official account after following it.
If you have any suggestions or want to contribute ideas or share time-series related papers, join us and tell us.
The PyPOTS community is open, transparent, and surely friendly. Let's work together to build and improve PyPOTS!