Slideflow is a deep learning library for digital pathology, offering a user-friendly interface for model development.
Designed for both medical researchers and AI enthusiasts, the goal of Slideflow is to provide an accessible, easy-to-use interface for developing state-of-the-art pathology models. Slideflow has been built with the future in mind, offering a scalable platform for digital biomarker development that bridges the gap between ever-evolving, sophisticated methods and the needs of a clinical researcher. For developers, Slideflow provides multiple endpoints for integration with other packages and external training paradigms, allowing you to leverage highly optimized, pathology-specific processes with the latest ML methodologies.
🚀 Features
Full documentation with example tutorials can be found at slideflow.dev.
Requirements
Optional
- Libvips >= 8.9 (alternative slide reader, adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files).
- Linear solver (for preserved-site cross-validation)
📥 Installation
Slideflow can be installed from PyPI, run as a Docker container, or run from source.
Method 1: Install via pip
pip3 install --upgrade setuptools pip wheel
pip3 install slideflow[cucim] cupy-cuda11x
The cupy package name depends on the installed CUDA version; see the CuPy installation instructions. cupy is not required if using Libvips.
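As a rough illustration of how the package name follows the CUDA version, the sketch below maps a CUDA version string to a CuPy wheel name. The helper cupy_package_for is hypothetical (it is not part of Slideflow or CuPy), and it assumes only the cupy-cuda11x and cupy-cuda12x wheel names; check the CuPy installation instructions for other CUDA versions.

```python
def cupy_package_for(cuda_version: str) -> str:
    """Return the CuPy wheel name for a CUDA version string such as '11.8'.

    Hypothetical helper for illustration only. It assumes the
    cupy-cuda11x / cupy-cuda12x naming used for CUDA 11.x and 12.x.
    """
    major = int(cuda_version.split(".")[0])
    if major not in (11, 12):
        raise ValueError(
            f"No wheel name assumed for CUDA {cuda_version}; "
            "see the CuPy installation instructions."
        )
    return f"cupy-cuda{major}x"


# Example: build the pip command for a machine running CUDA 11.8.
package = cupy_package_for("11.8")
print(f"pip3 install slideflow[cucim] {package}")
```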
Method 2: Docker image
Alternatively, pre-configured Docker images are available with OpenSlide/Libvips and the latest version of either Tensorflow or PyTorch. To install with the Tensorflow backend:
docker pull jamesdolezal/slideflow:latest-tf
docker run -it --gpus all jamesdolezal/slideflow:latest-tf
To install with the PyTorch backend:
docker pull jamesdolezal/slideflow:latest-torch
docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch
Method 3: From source
To run from source, clone this repository, create the conda development environment, and install the package in editable mode:
git clone https://github.com/slideflow/slideflow
conda env create -f slideflow/environment.yml
conda activate slideflow
pip install -e slideflow/ cupy-cuda11x
Non-Commercial Add-ons
To add tools and pretrained models available under a non-commercial license, install slideflow-gpl and slideflow-noncommercial:
pip install slideflow-gpl slideflow-noncommercial
This will provide integrated access to 6 additional pretrained foundation models (UNI, HistoSSL, GigaPath, PLIP, RetCCL, and CTransPath), the MIL architecture CLAM, the UQ algorithm BISCUIT, and the GAN framework StyleGAN3.
⚙️ Configuration
Deep learning (PyTorch vs. Tensorflow)
Slideflow supports both PyTorch and Tensorflow, defaulting to PyTorch if both are available. You can specify the backend with the environment variable SF_BACKEND. For example:
export SF_BACKEND=tensorflow
Slide reading (cuCIM vs. Libvips)
By default, Slideflow reads whole-slide images using cuCIM. Although much faster than other OpenSlide-based frameworks, it supports fewer slide scanner formats. Slideflow also includes a Libvips backend, which adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files. You can set the active slide backend with the environment variable SF_SLIDE_BACKEND:
export SF_SLIDE_BACKEND=libvips
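The same two variables can also be set from Python, which may be convenient in notebooks. The sketch below is a minimal example using only the standard library. The helper configure_slideflow is hypothetical, the values "torch" and "cucim" are assumed to be the names of the default backends, and the script assumes the variables need to be set before slideflow is imported.

```python
import os

# Assumed valid values; only "tensorflow" and "libvips" appear in this README.
VALID_BACKENDS = {"torch", "tensorflow"}
VALID_SLIDE_BACKENDS = {"cucim", "libvips"}


def configure_slideflow(backend: str = "torch", slide_backend: str = "cucim") -> None:
    """Set SF_BACKEND and SF_SLIDE_BACKEND for the current process (hypothetical helper)."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"Unknown backend: {backend!r}")
    if slide_backend not in VALID_SLIDE_BACKENDS:
        raise ValueError(f"Unknown slide backend: {slide_backend!r}")
    os.environ["SF_BACKEND"] = backend
    os.environ["SF_SLIDE_BACKEND"] = slide_backend


configure_slideflow(backend="tensorflow", slide_backend="libvips")

# Import slideflow only after the variables are set (assumption: they are
# read at import time).
# import slideflow as sf
```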
Getting started
Slideflow experiments are organized into Projects, which supervise storage of whole-slide images, extracted tiles, and patient-level annotations. The fastest way to get started is to use one of our preconfigured projects, which will automatically download slides from the Genomic Data Commons:
import slideflow as sf
P = sf.create_project(
    root='/project/destination',
    cfg=sf.project.LungAdenoSquam(),
    download=True
)
After the slides have been downloaded and verified, you can skip to Extract tiles from slides.
Alternatively, to create a new custom project, supply the location of patient-level annotations (CSV), slides, and a destination for TFRecords to be saved:
import slideflow as sf
P = sf.create_project(
    '/project/path',
    annotations="/patient/annotations.csv",
    slides="/slides/directory",
    tfrecords="/tfrecords/directory"
)
Ensure that the annotations file has a slide column for each annotation entry with the filename (without extension) of the corresponding slide.
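Before creating a project, it can help to confirm that every slide name in the annotations file matches a file in the slides directory. The following is a minimal sketch using only the standard library. The function find_unmatched_slides is hypothetical and not part of Slideflow, and it assumes slides are stored as individual files directly inside the slides directory.

```python
import csv
from pathlib import Path


def find_unmatched_slides(annotations_csv, slides_dir):
    """Return annotated slide names with no matching file in slides_dir.

    Hypothetical helper: compares the 'slide' column against filenames
    with their final extension removed.
    """
    with open(annotations_csv, newline="") as f:
        reader = csv.DictReader(f)
        if "slide" not in (reader.fieldnames or []):
            raise ValueError("Annotations file has no 'slide' column")
        annotated = [row["slide"] for row in reader if row["slide"]]

    # Path.stem drops only the final extension, e.g. 'case1.svs' -> 'case1'.
    available = {p.stem for p in Path(slides_dir).iterdir() if p.is_file()}
    return [name for name in annotated if name not in available]
```

An empty list means every annotated slide was found; any names returned point to a missing file or a mismatch between the slide column and the filename.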
Next, whole-slide images are segmented into smaller image tiles and saved in *.tfrecords format. Extract tiles from slides at a given magnification (tile width in microns) and resolution (tile width in pixels) using sf.Project.extract_tiles():
P.extract_tiles(
    tile_px=299,
    tile_um=302
)
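The relationship between tile_px and tile_um can be made concrete with a little arithmetic: dividing the tile width in microns by the tile width in pixels gives the effective microns per pixel. The sketch below works this out for the values above. The magnification estimate is only a rule of thumb and assumes a scanner where 40x corresponds to roughly 0.25 microns per pixel.

```python
def microns_per_pixel(tile_px: int, tile_um: float) -> float:
    """Effective resolution of an extracted tile, in microns per pixel."""
    return tile_um / tile_px


def approx_magnification(mpp: float) -> float:
    """Rule-of-thumb magnification, assuming 40x is about 0.25 microns per pixel."""
    return 10.0 / mpp


mpp = microns_per_pixel(tile_px=299, tile_um=302)
print(f"{mpp:.3f} microns per pixel")           # about 1.010
print(f"~{approx_magnification(mpp):.1f}x")     # about 9.9x
```

Under that assumption, 299-pixel tiles spanning 302 microns correspond to roughly 10x magnification.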
If slides are on a network drive or a spinning HDD, tile extraction can be accelerated by buffering slides to an SSD or ramdisk:
P.extract_tiles(
    ...,
    buffer="/mnt/ramdisk"
)
Training models
Once tiles are extracted, models can be trained. Start by configuring a set of hyperparameters:
params = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    batch_size=32,
    model='xception',
    learning_rate=0.0001,
    ...
)
Models can then be trained using these parameters. They can be trained to predict categorical, multi-categorical, continuous, or time-series outcomes, and the training process is highly configurable. For example, to train models in cross-validation to predict the outcome 'category1' as stored in the project annotations file:
P.train(
    'category1',
    params=params,
    save_predictions=True,
    multi_gpu=True
)
Evaluation, heatmaps, mosaic maps, and more
Slideflow includes a host of additional tools, including model evaluation and prediction, heatmaps, analysis of layer activations, mosaic maps, and more. See our full documentation for more details and tutorials.
📚 Publications
Slideflow has been used by:
- Dolezal et al, Modern Pathology, 2020
- Rosenberg et al, Journal of Clinical Oncology [abstract], 2020
- Howard et al, Nature Communications, 2021
- Dolezal et al, Nature Communications, 2022
- Storozuk et al, Modern Pathology [abstract], 2022
- Partin et al, Front Med, 2022
- Dolezal et al, Journal of Clinical Oncology [abstract], 2022
- Dolezal et al, Mediastinum [abstract], 2022
- Howard et al, npj Breast Cancer, 2023
- Dolezal et al, npj Precision Oncology, 2023
- Hieromnimon et al, [bioRxiv], 2023
- Carrillo-Perez et al, Cancer Imaging, 2023
🔓 License
This code is made available under the Apache-2.0 license.
🔗 Reference
If you find our work useful for your research, or if you use parts of this code, please consider citing as follows:
Dolezal, J.M., Kochanny, S., Dyer, E. et al. Slideflow: deep learning for digital histopathology with real-time whole-slide visualization. BMC Bioinformatics 25, 134 (2024). https://doi.org/10.1186/s12859-024-05758-x
@Article{Dolezal2024,
author={Dolezal, James M. and Kochanny, Sara and Dyer, Emma and Ramesh, Siddhi and Srisuwananukorn, Andrew and Sacco, Matteo and Howard, Frederick M. and Li, Anran and Mohan, Prajval and Pearson, Alexander T.},
title={Slideflow: deep learning for digital histopathology with real-time whole-slide visualization},
journal={BMC Bioinformatics},
year={2024},
month={Mar},
day={27},
volume={25},
number={1},
pages={134},
doi={10.1186/s12859-024-05758-x},
url={https://doi.org/10.1186/s12859-024-05758-x}
}