Benchmark dataset for learning dynamical systems from data

Dynabench: A benchmark dataset for learning dynamical systems from low-resolution data

This repository contains the data generation algorithms and all baseline models for the paper "Dynabench: A benchmark dataset for learning dynamical systems from low-resolution data" (accepted at ECML-PKDD 2023).

Note: the documentation on how to use this package is available at dynabench.github.io

DynaBench is a benchmark dataset for learning dynamical systems from data. Dynamical systems are physical systems that are typically modelled by partial differential equations (e.g. numerical weather prediction, climate models, fluid simulation, electromagnetic field simulation). Learning to predict the evolution of these systems from data poses two main challenges: the chaotic behaviour of the systems (a small deviation in the initial conditions leads to very different predictions) and limited data availability. In real-world settings only low-resolution data is available, with measurements sparsely scattered over the simulation domain (see the following figure, which illustrates the distribution of weather monitoring stations in Europe).

[Figure: weather stations in Europe]

In this benchmark we try to simulate this setting using synthetic data, which makes it easier to train and evaluate different machine learning models. To this end we generated simulation data by solving six different PDE systems, which were then postprocessed to create low-resolution snapshots of the simulation.
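
As a rough illustration of the post-processing idea, the sketch below samples a high-resolution grid at a few randomly chosen locations. The function name `sample_low_resolution`, the 64x64 toy grid and the uniform random choice of locations are all assumptions made for this example; they are not taken from the benchmark's generation code.

```python
import math
import random


def sample_low_resolution(field, num_points, seed=0):
    """Pick `num_points` random grid cells and return (coords, values).

    `field` is a 2D list indexed as field[row][col]. Coordinates are
    normalized to [0, 1) so they do not depend on the grid size.
    """
    rows, cols = len(field), len(field[0])
    rng = random.Random(seed)
    # Sample distinct flat indices so that no location is measured twice.
    flat = rng.sample(range(rows * cols), num_points)
    coords = [((i // cols) / rows, (i % cols) / cols) for i in flat]
    values = [field[i // cols][i % cols] for i in flat]
    return coords, values


# A toy 64x64 "simulation snapshot": a smooth sine pattern (hypothetical data).
grid = [[math.sin(2 * math.pi * r / 64) * math.cos(2 * math.pi * c / 64)
         for c in range(64)] for r in range(64)]
coords, values = sample_low_resolution(grid, num_points=16)
print(len(coords), len(values))
```

Fixing the seed keeps the measurement locations reproducible across runs, which is convenient when comparing models on the same sparse observations.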

The main task for which the dataset has been generated is forecasting: predicting the next state(s) of the system.

The six included equations were selected to be both sufficiently complex and sufficiently varied to represent different physical systems (first and second order, coupled equations, stationary and non-stationary).

An example (wave equation) of a simulated system is shown below:

[Figure: example simulation of the wave equation]

Equations

There are six different equations in the dataset, each with different characteristics summarized in the following table:

Equation              Components  Time Order  Spatial Order
Advection             1           1           1
Burgers'              2           1           2
Gas Dynamics          4           1           2
Kuramoto-Sivashinsky  1           1           4
Reaction-Diffusion    2           1           2
Wave                  1           2           2

Setup

Automated setup

If needed, create a virtual environment and activate it. You can then install all dependencies by running

sh scripts/install_requirements.sh

from the main project directory

Manual installation

Alternatively you can manually install the dependencies from the requirements.txt file:

pip install -r requirements.txt

It is recommended to first create a virtual environment, for example:

python -m venv venv
source venv/bin/activate

Additionally, you need to install PyTorch Geometric, following the instructions on its website.

Generation

To generate the data for a specific equation run

python generate.py num_simulations=NUM_SIMULATIONS equation=EQUATION split=DATASET_SPLIT

where NUM_SIMULATIONS indicates how many times each equation is simulated, EQUATION is one of (advection, burgers, gas_dynamics, kuramoto_sivashinsky, reaction_diffusion, wave), and DATASET_SPLIT is one of (train, test, val).

The full benchmark dataset contains 7000 simulations for the training set, 1000 for the validation set and 1000 for the test set, all divided into chunks of 500 simulations.

Warning: this can take a long time.
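
The split sizes above translate into a fixed number of chunks per split. The following sketch works out that arithmetic and builds the corresponding command lines without running them. Passing the full split size as `num_simulations` in a single call is an assumption made here for illustration; the helper names are hypothetical.

```python
import shlex

CHUNK_SIZE = 500  # simulations per chunk, as stated above
SPLIT_SIZES = {"train": 7000, "val": 1000, "test": 1000}
EQUATIONS = ["advection", "burgers", "gas_dynamics",
             "kuramoto_sivashinsky", "reaction_diffusion", "wave"]


def chunks_per_split(split_sizes, chunk_size):
    """Number of full chunks that each split occupies."""
    return {split: n // chunk_size for split, n in split_sizes.items()}


def generation_commands(equations, split_sizes):
    """One generate.py command line per (equation, split) pair."""
    commands = []
    for equation in equations:
        for split, n in split_sizes.items():
            args = ["python", "generate.py", f"num_simulations={n}",
                    f"equation={equation}", f"split={split}"]
            commands.append(shlex.join(args))
    return commands


chunks = chunks_per_split(SPLIT_SIZES, CHUNK_SIZE)
commands = generation_commands(EQUATIONS, SPLIT_SIZES)
print(chunks)
for command in commands[:3]:
    print(command)
```

The commands are only printed, so the list can be inspected or handed to a job scheduler before any long-running generation starts.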

Data format

The data is stored in .tar archives in chunks of 500 simulations. Each simulation consists of one file called XXXXXXX.data containing the simulation values for the given setting (cloud/grid + number of points) as well as a file called XXXXXXX.data containing the coordinates of the points at which the measurements were recorded.
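
To inspect such an archive before loading it, the members can be listed and grouped by simulation id with Python's standard `tarfile` module. This is a minimal sketch: the in-memory demo archive and its member names are made up for the example, and how the payload of each file is encoded is not covered here.

```python
import io
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_members_by_simulation(tar):
    """Map each file stem (the simulation id) to its member names."""
    groups = defaultdict(list)
    for member in tar.getmembers():
        if member.isfile():
            groups[PurePosixPath(member.name).stem].append(member.name)
    return dict(groups)


def build_demo_archive():
    """Create a tiny in-memory tar with made-up member names for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in ["0000000.data", "0000001.data"]:
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=build_demo_archive(), mode="r") as tar:
    groups = group_members_by_simulation(tar)
print(groups)
```

For a real chunk, replace the demo archive with `tarfile.open(path_to_chunk, mode="r")`.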

Usage

To reproduce the experiments from our paper run

python main.py equation=EQUATION model=MODEL support=cloud num_points=NUM_POINTS

This will start the training for a specific setting. The parameters specify which model, task, support structure, number of points etc. should be run. The available choices of parameters are:

EQUATION = [brusselator, gas_dynamics, kuramoto_sivashinsky, wave, advection]

MODEL = [persistence, point_gnn, point_net, point_transformer, gat, gcn, feast, kernelNN, graphpde]

To run the experiments for the grid models run

python main.py equation=EQUATION model=MODEL support=grid num_points=NUM_POINTS datamodule=torch lightningmodule=gridmodule

with MODEL selected from [neuralpde, resnet, cnn]
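
Since grid and cloud models need different arguments, a small helper can pick the right ones from the model name. The sketch below only assembles the argument list from the options listed above; the function name `training_command` is hypothetical, and the value `high` for `num_points` is borrowed from the dataset class default, so it may not be what `main.py` expects.

```python
CLOUD_MODELS = {"persistence", "point_gnn", "point_net", "point_transformer",
                "gat", "gcn", "feast", "kernelNN", "graphpde"}
GRID_MODELS = {"neuralpde", "resnet", "cnn"}


def training_command(equation, model, num_points):
    """Build the main.py argument list for a cloud or grid model."""
    args = ["python", "main.py", f"equation={equation}", f"model={model}"]
    if model in GRID_MODELS:
        # Grid models additionally need the torch datamodule and grid module.
        args += ["support=grid", f"num_points={num_points}",
                 "datamodule=torch", "lightningmodule=gridmodule"]
    elif model in CLOUD_MODELS:
        args += ["support=cloud", f"num_points={num_points}"]
    else:
        raise ValueError(f"unknown model: {model}")
    return args


print(" ".join(training_command("wave", "cnn", "high")))
print(" ".join(training_command("wave", "gcn", "high")))
```

The returned list can be passed directly to `subprocess.run` if the experiments should be launched from Python.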

Additionally, to use the benchmark for your own research, use the included datasets. The repository contains two dataset classes to handle the generated data.

  1. A PyTorch dataset class, where each sample has the form $X\in\mathbb{R}^{L\times N\times D}$, where L is the lookback, N is the number of points and D is the number of target variables. See the documentation of the dataset for details. To initialize the dataset class:
DynaBenchBase(
    mode: str = 'train',
    equation: str = 'gas_dynamics',
    task: str = 'forecast',
    support: str = 'grid',
    num_points: str = 'high',
    base_path: str = 'data',
    lookback: int = 1,
    rollout: int = 1,
    test_ratio: float = 0.1,
    val_ratio: float = 0.1,
    merge_lookback: bool = True,
    *args,
    **kwargs
)

Initializes a PyTorch dataset with the selected parameters. The data is loaded lazily.

Args:

  • mode (str, optional): the selection of data to use (train/val/test). Defaults to "train".
  • equation (str, optional): the equation to use. Defaults to "gas_dynamics".
  • task (str, optional): Which task to use as targets. Defaults to "forecast".
  • support (str, optional): Structure of the points at which the measurements are recorded. Defaults to "grid".
  • num_points (str, optional): Number of points at which measurements are available. Defaults to "high".
  • base_path (str, optional): location where the data is stored. Defaults to "data".
  • lookback (int, optional): How many past states are used to make the prediction. The additional states can be concatenated along the channel dimension if merge_lookback is set to True. Defaults to 1.
  • rollout (int, optional): How many steps should be predicted in a closed loop setting. Only used for forecast task. Defaults to 1.
  • test_ratio (float, optional): What fraction of simulations to set aside for testing. Defaults to 0.1.
  • val_ratio (float, optional): What fraction of simulations to set aside for validation. Defaults to 0.1.
  • merge_lookback (bool, optional): Whether to merge the additional lookback information into the channel dimension. Defaults to True.
  2. A graph dataset, specifically used for Message Passing Neural Networks implemented with the PyTorch Geometric library. Its structure is similar to that of the base DynaBench dataset.
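
To make the roles of `lookback`, `rollout` and `merge_lookback` more concrete, here is a plain-Python sketch of how one input/target pair could be cut from a trajectory. It is not the library's implementation: `make_sample` is a hypothetical helper, and the merged layout (past states concatenated per point, giving N x (L * D)) is an assumption based on the parameter descriptions above.

```python
def make_sample(series, start, lookback, rollout, merge_lookback=True):
    """Cut one (input, target) pair out of a simulated trajectory.

    `series[t][n]` is the list of D feature values at time t and point n.
    """
    past = series[start:start + lookback]                          # L x N x D
    future = series[start + lookback:start + lookback + rollout]   # R x N x D
    if merge_lookback:
        num_points = len(past[0])
        # Concatenate the L past states for each point: N x (L * D).
        past = [[value for state in past for value in state[n]]
                for n in range(num_points)]
    return past, future


# Toy trajectory with T=5 time steps, N=3 points and D=2 features.
series = [[[float(t), float(n)] for n in range(3)] for t in range(5)]
x, y = make_sample(series, start=0, lookback=2, rollout=1)
print(len(x), len(x[0]))                  # points, merged channels
print(len(y), len(y[0]), len(y[0][0]))    # rollout steps, points, features
```

With `merge_lookback=False` the input keeps its separate time axis, which is the $L\times N\times D$ form described above.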

Benchmark Results

The following tables show the results of our experiments:

  • forecast task, 900 points (1-step MSE):
Model              Advection    Burgers     Gas Dynamics  Kuramoto-Sivashinsky  Reaction-Diffusion  Wave
CNN                5.30848e-05  0.0110988   0.00420368    0.000669837           0.00036918          0.00143387
FeaSt              0.000130351  0.0116155   0.0162        0.0117867             0.000488848         0.00523298
GAT                0.00960113   0.0439986   0.037483      0.0667057             0.00915208          0.0151498
GCN                0.026397     0.13899     0.0842611     0.436563              0.164678            0.0382004
GraphPDE           0.000137098  0.0107391   0.0194755     0.00719822            0.000142114         0.00207144
KernelNN           6.31157e-05  0.0106146   0.013354      0.00668698            0.000187019         0.00542925
NeuralPDE          8.24453e-07  0.0112373   0.00373416    0.000536958           0.000303176         0.00169871
Persistence        0.0812081    0.0367688   0.186985      0.142243              0.147124            0.113805
Point Transformer  4.41633e-05  0.0103098   0.00724899    0.00489711            0.000141248         0.00238447
PointGNN           2.82496e-05  0.00882528  0.00901649    0.00673036            0.000136059         0.00138772
ResNet             2.15721e-06  0.0148052   0.00321235    0.000490104           0.000156752         0.00145884
  • forecast task, 900 points (16-step rollout MSE):
Model              Advection    Burgers     Gas Dynamics  Kuramoto-Sivashinsky  Reaction-Diffusion  Wave
CNN                0.00161331   0.554554    0.995382      1.26011               0.0183483           0.561433
FeaSt              1.48288      0.561197    0.819594      3.74448               0.130149            1.61066
GAT                41364.1      0.833353    1.21436       5.68925               3.85506             2.38418
GCN                3.51453e+13  13.0876     7.20633       1.70612e+24           1.75955e+07         7.89253
GraphPDE           1.07953      0.729879    0.969208      2.1044                0.0800235           1.02586
KernelNN           0.897431     0.72716     0.854015      2.00334               0.0635278           1.57885
NeuralPDE          0.000270308  0.659789    0.443498      1.05564               0.0224155           0.247704
Persistence        2.39393      0.679261    1.457         1.89752               0.275678            2.61281
Point Transformer  0.617025     0.503865    0.642879      2.09746               0.0564399           1.27343
PointGNN           0.660665     1.04342     0.759257      2.82063               0.0582293           1.30743
ResNet             8.64621e-05  1.86352     0.480284      1.0697                0.00704612          0.299457
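
For readers unfamiliar with closed-loop evaluation, the sketch below shows how a rollout error can be computed: the model's prediction is fed back as its next input, and the error is measured against the true state at every step. It uses a persistence baseline on toy data. The helper names are hypothetical, and whether the reported 16-step figures are last-step or averaged errors is not stated here, so the sketch simply returns the error for each step.

```python
def mse(prediction, target):
    """Mean squared error between two equally sized lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def persistence(state):
    """Baseline that predicts no change: the next state equals the current one."""
    return list(state)


def rollout_errors(model, initial_state, targets):
    """Run the model in a closed loop and return the MSE at every step."""
    state = initial_state
    errors = []
    for target in targets:
        state = model(state)  # the prediction becomes the next input
        errors.append(mse(state, target))
    return errors


# Toy data: the first value grows by 1 per step, the second stays at 0.
initial_state = [0.0, 0.0]
targets = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
errors = rollout_errors(persistence, initial_state, targets)
print(errors)
```

Because errors compound when predictions are fed back, rollout errors are typically much larger than 1-step errors, which matches the gap between the two tables above.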

License

The content of this project itself, including the data and pretrained models, is licensed under the Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0). The underlying source code used to generate the data and train the models is licensed under the MIT license.
