
Preprocessing large medical images for machine learning made easy!
HistoPrep makes it easy to prepare your histological slide images for deep
learning models. You can easily cut large slide images into smaller tiles and then
preprocess those tiles (remove tiles with shitty tissue, finger marks etc.).
Install OpenSlide on your system and then install histoprep with pip!
pip install histoprep
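A typical workflow for training deep learning models with histological images is the following:
1. Cut large slide images into smaller tiles.
2. Preprocess the tiles (remove tiles with bad tissue, finger marks etc.).
3. Train a deep learning model on the remaining tiles.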
With HistoPrep, steps 1. and 2. are as easy as accidentally drinking too much at the
research group Christmas party and proceeding to work remotely until June.
Let's start by cutting a slide from the PANDA Kaggle challenge into small tiles.
from histoprep import SlideReader
# Read slide image.
reader = SlideReader("./slides/slide_with_ink.jpeg")
# Detect tissue.
threshold, tissue_mask = reader.get_tissue_mask(level=-1)
# Extract overlapping tile coordinates with less than 50% background.
tile_coordinates = reader.get_tile_coordinates(
    tissue_mask, width=512, overlap=0.5, max_background=0.5
)
# Save tile images with image metrics for preprocessing.
tile_metadata = reader.save_regions(
    "./train_tiles/", tile_coordinates, threshold=threshold, save_metrics=True
)
slide_with_ink: 100%|██████████| 390/390 [00:01<00:00, 295.90it/s]
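To make the `width`, `overlap` and `max_background` arguments concrete, here is a minimal pure-Python sketch of one way such a tile grid could be computed. It is not HistoPrep's implementation: the function name `sketch_tile_coordinates`, the nested-list mask (1 = tissue, 0 = background), the rounding of the stride, and counting pixels outside the image as background are all assumptions made for illustration.

```python
def sketch_tile_coordinates(mask, width, overlap=0.0, max_background=1.0):
    """Return (x, y, w, h) tiles over a binary mask (1 = tissue, 0 = background).

    Hypothetical helper, not part of HistoPrep. Tiles are square, and
    neighbouring tiles share `overlap` (a fraction in [0, 1)) of their width.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    # With overlap=0.5 and width=512 the tile origin moves 256 pixels at a time.
    stride = max(1, int(width * (1 - overlap)))
    tiles = []
    for y in range(0, rows, stride):
        for x in range(0, cols, stride):
            # Count tissue pixels inside the tile. Pixels that fall outside
            # the image are simply not counted, so they act as background.
            tissue = sum(
                mask[row][col]
                for row in range(y, min(y + width, rows))
                for col in range(x, min(x + width, cols))
            )
            background = 1 - tissue / (width * width)
            # Assumption: a tile at exactly the limit is kept.
            if background <= max_background:
                tiles.append((x, y, width, width))
    return tiles
```

Calling `sketch_tile_coordinates(mask, width=4, overlap=0.5, max_background=0.5)` on a small mask whose left half is tissue keeps only the tiles that are at least half covered by tissue, which is the same idea as the `get_tile_coordinates` call above, just at toy scale.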
Let's take a look at the output and visualise the thumbnails.
jopo666@~$ tree train_tiles
train_tiles
└── slide_with_ink
    ├── metadata.parquet       # tile metadata
    ├── properties.json        # tile properties
    ├── thumbnail.jpeg         # thumbnail image
    ├── thumbnail_tiles.jpeg   # thumbnail with tiles
    ├── thumbnail_tissue.jpeg  # thumbnail of the tissue mask
    └── tiles  [390 entries exceeds filelimit, not opening dir]
That was easy, but it can be annoying to whip up a new Python script every time you want
to cut slides, so it is recommended to use the HistoPrep CLI program!
# Repeat the above code for all images in the PANDA dataset!
jopo666@~$ HistoPrep --input './train_images/*.tiff' --output ./tiles --width 512 --overlap 0.5 --max-background 0.5
As we can see from the above images, histological slide images often contain areas that we would not like to include in our training data. Removing them might seem like a daunting task, but let's try it out!
from histoprep.utils import OutlierDetector
# Let's wrap the tile metadata with a helper class.
detector = OutlierDetector(tile_metadata)
# Cluster tiles based on image metrics.
clusters = detector.cluster_kmeans(num_clusters=4, random_state=666)
# Visualise first cluster.
reader.get_annotated_thumbnail(
    image=reader.read_level(-1), coordinates=detector.coordinates[clusters == 0]
)
I said it was gonna be easy! Now we can mark the tiles in cluster 0 as outliers and
start overfitting our neural network! This was a simple example, but the same code can be
used to cluster all of the several million tiles extracted from the PANDA dataset and
discard outliers simultaneously!
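For readers curious what clustering tiles on image metrics boils down to, below is a rough sketch of plain k-means (Lloyd's algorithm) in pure Python. How `OutlierDetector.cluster_kmeans` works internally is not shown here, so treat this as an illustration of the general technique only. The names `sketch_kmeans` and `select_cluster`, and the idea of passing each tile as a tuple of metric values, are assumptions.

```python
import random


def sketch_kmeans(points, num_clusters, random_state=0, max_iter=100):
    """Plain Lloyd's k-means. Returns one cluster label per point.

    Hypothetical helper, not part of HistoPrep. `points` is a list of
    equal-length tuples, for example one tuple of image metrics per tile.
    """
    rng = random.Random(random_state)
    # Start from randomly chosen points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, num_clusters)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance, ties go to the lowest index).
        new_labels = [
            min(
                range(num_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # Nothing changed, so the clustering has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(num_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def select_cluster(items, labels, cluster):
    """Return the items whose label equals `cluster`."""
    return [item for item, label in zip(items, labels) if label == cluster]
```

With labels in hand, `select_cluster(tile_paths, labels, 0)` would give the tiles to inspect or discard, where `tile_paths` is a hypothetical list aligned with the metric tuples. Which cluster number ends up holding the outliers depends on the random initialisation, so it is worth looking at each cluster before throwing anything away.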
If you use HistoPrep to process the images for your publication, please cite the GitHub repository.
@misc{histoprep,
  author = {Pohjonen, Joona and Ariotta, Valeria},
  title = {HistoPrep: Preprocessing large medical images for machine learning made easy!},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/jopo666/HistoPrep},
}