catbench

CatBench: Benchmark of Graph Neural Networks for Adsorption Energy Predictions in Heterogeneous Catalysis

  • 0.1.18
  • PyPI
CatBench

CatBench: Benchmark Framework for Graph Neural Networks in Adsorption Energy Predictions

Installation

pip install catbench

Overview

CatBench is a benchmarking framework for evaluating Graph Neural Networks (GNNs) on adsorption energy predictions. It provides tools for data processing, model evaluation, and result analysis.

Usage Workflow

1. Data Processing

CatBench supports two types of data sources:

A. Direct from Catalysis-Hub

# Import the catbench package
import catbench

# Process data from Catalysis-Hub
catbench.cathub_preprocess("Catalysis-Hub_Dataset_tag")

Example:

# Process specific dataset from Catalysis-Hub
# Using AraComputational2022 as an example
catbench.cathub_preprocess("AraComputational2022")

B. User Dataset

For custom datasets, organize your data so that it includes:

  • Gas references (gas/) containing VASP output files for gas phase molecules
  • Surface structures (surface1/, surface2/, etc.) containing:
    • Clean slab calculations (slab/)
    • Adsorbate-surface systems (H/, OH/, etc.)

Note: Each directory must contain CONTCAR and OSZICAR files. Other VASP output files may also be present, but the process_output function deletes every file except CONTCAR and OSZICAR, so keep a copy of anything else you need.

data/
├── gas/
│   ├── H2gas/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   └── H2Ogas/
│       ├── CONTCAR
│       └── OSZICAR
├── surface1/
│   ├── slab/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   ├── H/
│   │   ├── CONTCAR
│   │   └── OSZICAR
│   └── OH/
│       ├── CONTCAR
│       └── OSZICAR
└── surface2/
    ├── slab/
    │   ├── CONTCAR
    │   └── OSZICAR
    ├── H/
    │   ├── CONTCAR
    │   └── OSZICAR
    └── OH/
        ├── CONTCAR
        └── OSZICAR
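
Because process_output modifies the data directory, it can be useful to check the layout first. The following is a minimal sketch, not part of CatBench: the helper name find_incomplete_dirs is hypothetical, and it assumes every calculation directory sits exactly two levels below the data root, as in the tree above.

```python
from pathlib import Path

# Files that every calculation directory must contain (from the note above)
REQUIRED_FILES = ("CONTCAR", "OSZICAR")


def find_incomplete_dirs(data_root):
    """Return {relative_dir: [missing file names]} for incomplete directories.

    Hypothetical helper: assumes calculation directories are two levels
    below the root (data/gas/H2gas, data/surface1/slab, data/surface1/H, ...).
    """
    root = Path(data_root)
    problems = {}
    for calc_dir in sorted(p for p in root.glob("*/*") if p.is_dir()):
        missing = [name for name in REQUIRED_FILES
                   if not (calc_dir / name).is_file()]
        if missing:
            problems[calc_dir.relative_to(root).as_posix()] = missing
    return problems
```

An empty dictionary from find_incomplete_dirs("data") means every calculation directory has both required files.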

Then process using:

import catbench

# Define coefficients for calculating adsorption energies
# For each adsorbate, specify coefficients based on the reaction equation:
# Example for H*: 
#   E_ads(H*) = E(H*) - E(slab) - 1/2 E(H2_gas)
# Example for OH*:
#   E_ads(OH*) = E(OH*) - E(slab) + 1/2 E(H2_gas) - E(H2O_gas)

coeff_setting = {
    "H": {
        "slab": -1,      # Coefficient for clean surface
        "adslab": 1,     # Coefficient for adsorbate-surface system
        "H2gas": -1/2,   # Coefficient for H2 gas reference
    },
    "OH": {
        "slab": -1,      # Coefficient for clean surface
        "adslab": 1,     # Coefficient for adsorbate-surface system
        "H2gas": +1/2,   # Coefficient for H2 gas reference
        "H2Ogas": -1,    # Coefficient for H2O gas reference
    },
}

# This will clean up directories and keep only CONTCAR and OSZICAR files
catbench.process_output("data", coeff_setting)
catbench.userdata_preprocess("data")
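
To make the role of the coefficients concrete, here is a small worked sketch. It assumes the coefficients are applied as a weighted sum of total energies, as the reaction equations in the comments suggest. The function adsorption_energy and all energy values are hypothetical and serve only to illustrate the arithmetic; this is not CatBench's internal code.

```python
def adsorption_energy(coeffs, energies):
    """Weighted sum of total energies: E_ads = sum(coeff_i * E_i)."""
    return sum(coeff * energies[name] for name, coeff in coeffs.items())


coeff_setting = {
    "H": {"slab": -1, "adslab": 1, "H2gas": -1/2},
    "OH": {"slab": -1, "adslab": 1, "H2gas": +1/2, "H2Ogas": -1},
}

# Hypothetical total energies in eV, chosen only to illustrate the arithmetic
energies_H = {"slab": -100.0, "adslab": -103.5, "H2gas": -6.8}
energies_OH = {"slab": -100.0, "adslab": -110.0, "H2gas": -6.8, "H2Ogas": -14.2}

# E_ads(H*)  = -103.5 - (-100.0) - 1/2 * (-6.8)
print(f"{adsorption_energy(coeff_setting['H'], energies_H):.2f}")    # -0.10
# E_ads(OH*) = -110.0 - (-100.0) + 1/2 * (-6.8) - (-14.2)
print(f"{adsorption_energy(coeff_setting['OH'], energies_OH):.2f}")  # 0.80
```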

2. Execute Benchmark

A. General Benchmark

This is the general benchmark setup. The range() value sets the number of calculators in the list, and therefore how many times the benchmark is repeated for reproducibility testing. Set it to 1 if reproducibility testing is not needed.

import catbench
from your_calculator import Calculator

# Prepare calculator list
# range(5): Run 5 times for reproducibility testing
# range(1): Single run when reproducibility testing is not needed
calculators = [Calculator() for _ in range(5)]

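# GNN_name and benchmark are required; see Configuration Options for the rest
config = {
    "GNN_name": "your_GNN_name",          # Name of your GNN
    "benchmark": "your_benchmark_name",   # Name of benchmark dataset
}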
catbench.execute_benchmark(calculators, **config)

After execution, the following files and directories will be created:

  1. A result directory is created to store all calculation outputs.
  2. Inside the result directory, subdirectories are created for each GNN.
  3. Each GNN's subdirectory contains:
    • gases/: Gas reference molecules for adsorption energy calculations
    • log/: Slab and adslab calculation logs
    • traj/: Slab and adslab trajectory files
    • {GNN_name}_gases.json: Gas molecule energies
    • {GNN_name}_outlier.json: Outlier detection status for each adsorption data point
    • {GNN_name}_result.json: Raw data (energies, calculation times, outlier detection, slab displacements, etc.)
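
The layout listed above can be checked after a run with a short script. This is a sketch under assumptions: the helper names are hypothetical, the result directory is taken to be ./result (the default calculating_path in the analysis options), and each GNN's subdirectory is assumed to be named after its GNN_name.

```python
from pathlib import Path


def expected_outputs(gnn_name, result_dir="result"):
    """Paths that the list above says a finished benchmark run produces."""
    base = Path(result_dir) / gnn_name  # assumed: subdirectory named after the GNN
    return [
        base / "gases",
        base / "log",
        base / "traj",
        base / f"{gnn_name}_gases.json",
        base / f"{gnn_name}_outlier.json",
        base / f"{gnn_name}_result.json",
    ]


def missing_outputs(gnn_name, result_dir="result"):
    """Return the expected paths that do not exist yet."""
    return [p for p in expected_outputs(gnn_name, result_dir) if not p.exists()]
```

An empty list from missing_outputs("your_GNN_name") suggests the run wrote everything the analysis step is expected to read.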

B. Single-point Calculation Benchmark

import catbench
from your_calculator import Calculator

calculator = Calculator()

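# GNN_name and benchmark are required; see Configuration Options for the rest
config = {
    "GNN_name": "your_GNN_name",          # Name of your GNN
    "benchmark": "your_benchmark_name",   # Name of benchmark dataset
}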
catbench.execute_benchmark_single(calculator, **config)

3. Analysis

import catbench

config = {}
catbench.analysis_GNNs(**config)

The analysis function processes the calculation data stored in the result directory and generates:

  1. A plot/ directory:

    • Parity plots for each GNN model
    • Combined parity plots for comparison
    • Performance visualization plots
  2. An Excel file {dataset_name}_Benchmarking_Analysis.xlsx:

    • Comprehensive performance metrics for all GNN models
    • Statistical analysis of predictions
    • Model-specific details and parameters
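
The exact metrics in the Excel file are not listed here. As an illustration of what a parity comparison measures, the sketch below computes two common error metrics, MAE and RMSE, between reference and predicted adsorption energies. The function name and the sample values are hypothetical, and CatBench may report different or additional metrics.

```python
import math


def parity_metrics(reference, predicted):
    """Return MAE and RMSE between reference and predicted energies."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"MAE": mae, "RMSE": rmse}


# Hypothetical adsorption energies in eV (reference vs. GNN prediction)
reference = [0.0, 1.0, -1.0, 2.0]
predicted = [0.5, 1.0, -1.5, 2.0]
metrics = parity_metrics(reference, predicted)
print(f"MAE = {metrics['MAE']:.3f} eV, RMSE = {metrics['RMSE']:.3f} eV")
```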

Single-point Calculation Analysis

import catbench

config = {}
catbench.analysis_GNNs_single(**config)

Outputs

1. Adsorption Energy Parity Plot (mono_version & multi_version)

Adsorption energy parity plots are generated for every GNN, either as a single combined plot or broken down by adsorbate.

2. Comprehensive Performance Table

Compare performance metrics across all GNNs.

3. Outlier Analysis

See how outliers are detected for each GNN.

4. Analysis by Adsorbate

Observe how each GNN performs on each adsorbate.

Configuration Options

execute_benchmark

| Option | Description | Default |
| --- | --- | --- |
| GNN_name | Name of your GNN | Required |
| benchmark | Name of benchmark dataset | Required |
| F_CRIT_RELAX | Force convergence criterion | 0.05 |
| N_CRIT_RELAX | Maximum number of steps | 999 |
| rate | Fix ratio for surface atoms (0: use original constraints, >0: fix atoms from bottom up to specified ratio) | 0.5 |
| disp_thrs_slab | Displacement threshold for slab | 1.0 |
| disp_thrs_ads | Displacement threshold for adsorbate | 1.5 |
| again_seed | Seed variation threshold | 0.2 |
| damping | Damping factor for optimization | 1.0 |
| gas_distance | Cell size for gas molecules | 10 |
| optimizer | Optimization algorithm | "LBFGS" |
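
CatBench applies these defaults itself, so a config only needs the required options and any overrides. Still, a typo in an option name can be worth catching before a long run. The sketch below uses a hypothetical helper, build_config, that merges overrides with the defaults from the table above and rejects missing or unknown keys. It is not part of the CatBench API.

```python
# Defaults copied from the execute_benchmark options table
EXECUTE_BENCHMARK_DEFAULTS = {
    "F_CRIT_RELAX": 0.05,
    "N_CRIT_RELAX": 999,
    "rate": 0.5,
    "disp_thrs_slab": 1.0,
    "disp_thrs_ads": 1.5,
    "again_seed": 0.2,
    "damping": 1.0,
    "gas_distance": 10,
    "optimizer": "LBFGS",
}
REQUIRED_OPTIONS = ("GNN_name", "benchmark")


def build_config(**overrides):
    """Merge user overrides with the documented defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED_OPTIONS if key not in overrides]
    if missing:
        raise ValueError(f"missing required options: {missing}")
    known = set(REQUIRED_OPTIONS) | set(EXECUTE_BENCHMARK_DEFAULTS)
    unknown = sorted(set(overrides) - known)
    if unknown:
        raise ValueError(f"unknown options: {unknown}")
    return {**EXECUTE_BENCHMARK_DEFAULTS, **overrides}


# Placeholder names; rate=0 keeps the original constraints per the table
config = build_config(GNN_name="your_GNN_name",
                      benchmark="your_benchmark_name",
                      rate=0)
```

The resulting dictionary can then be passed on as catbench.execute_benchmark(calculators, **config).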

execute_benchmark_single

| Option | Description | Default |
| --- | --- | --- |
| GNN_name | Name of your GNN | Required |
| benchmark | Name of benchmark dataset | Required |
| gas_distance | Cell size for gas molecules | 10 |

analysis_GNNs

| Option | Description | Default |
| --- | --- | --- |
| Benchmarking_name | Name for output files | Current directory name |
| calculating_path | Path to result directory | "./result" |
| GNN_list | List of GNNs to analyze | All GNNs in result directory |
| target_adsorbates | Target adsorbates to analyze | All adsorbates |
| specific_color | Color for plots | "black" |
| min | Plot y-axis minimum | Auto-calculated |
| max | Plot y-axis maximum | Auto-calculated |
| figsize | Figure size | (9, 8) |
| mark_size | Marker size | 100 |
| linewidths | Line width | 1.5 |
| dpi | Plot resolution | 300 |
| legend_off | Toggle legend | False |
| error_bar_display | Toggle error bars | False |

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

This work will be published soon.
