Igneous
$ igneous image xfer gs://other-lab/data file://./my-data --queue ./xfer-queue --shape 2048,2048,64
$ igneous image downsample file://./my-data --mip 0 --queue ./ds-queue
$ igneous execute -x ./ds-queue
$ igneous mesh forge s3://my-data/seg --mip 2 --queue sqs://mesh-queue
$ igneous --parallel 4 execute sqs://mesh-queue
$ igneous skeleton forge s3://my-data/seg --mip 2 --queue sqs://mesh-queue
$ igneous skeleton merge s3://my-data/seg --queue sqs://mesh-queue
$ igneous execute sqs://mesh-queue
$ igneous --help
Igneous is a TaskQueue and CloudVolume based pipeline for producing and managing visualizable Neuroglancer Precomputed volumes. It uses CloudVolume for accessing data on AWS S3, Google Storage, or the local filesystem. It can operate in the cloud using an SQS task queuing system or run locally on a single machine or cluster (using a file based SQS emulation).
Igneous is useful for downsampling, transferring, deleting, meshing, and skeletonizing large images. There are a few more esoteric functions too. You can watch a video tutorial here.
Originally by Nacho and Will.
Pre-Built Docker Container
You can use this container for scaling big jobs horizontally or to experiment with Igneous within the container.
https://hub.docker.com/r/seunglab/igneous/
Installation
You'll need Python 3, pip, (possibly) a C++ compiler (e.g. g++ or clang), and virtualenv. It's tested under Ubuntu 16.04 and macOS Monterey.
pip install igneous-pipeline
Manual Installation
Sometimes it's useful to tweak tasks for special circumstances, and so you'll want to use a developer installation.
git clone git@github.com:seung-lab/igneous.git
cd igneous
virtualenv venv
source venv/bin/activate
pip install numpy
pip install -r requirements.txt
python setup.py develop
Igneous is intended as a self-contained pipeline system and not as a library. Such uses are possible, but not supported. If specific functionality is needed, please open an issue and we can break that out into a library as has been done with several algorithms such as tinybrain, zmesh, and kimimaro.
Sample Local Use
Below we show two ways to use Igneous on a local workstation or cluster. As an example, we generate meshes for an already-existing Precomputed segmentation volume.
In Memory Queue (Simple Execution)
This procedure is good for running small jobs as it is very simple and allows you to make use of parallelization. On the downside, it is brittle: if a job fails, you may have to restart the entire task set.
from taskqueue import LocalTaskQueue
import igneous.task_creation as tc

cloudpath = 'gs://bucket/dataset/labels'
tq = LocalTaskQueue(parallel=8) # use 8 processes

# First pass: mesh each grid task independently.
tasks = tc.create_meshing_tasks(cloudpath, mip=3, shape=(256, 256, 256))
tq.insert(tasks)
tq.execute()

# Second pass: create the manifest files Neuroglancer reads.
tasks = tc.create_mesh_manifest_tasks(cloudpath)
tq.insert(tasks)
tq.execute()

print("Done!")
Filesystem Queue (Producer-Consumer)
This procedure is more robust, as tasks can be restarted if they fail. The queue is written to the filesystem and as such can be used by any processor that can read and write files to the selected directory. Thus, there is the potential for local cluster processing. Conceptually, a single producer script populates a filesystem queue ("FileQueue") and then typically one worker per core consumes each task. The FileQueue allows for leasing a task for a set amount of time. If the task is not completed, it recycles into the available task pool. The order in which tasks are consumed is not guaranteed, but is approximately FIFO (a random task is selected from the next 100 to avoid conflicts) if all goes well.
This mode is very new, so please report any issues. You can read about the queue design here. In particular, we expect you may see problems with NFS or other filesystems that have problems with networked file locking. However, purely local use should generally be issue free. You can read more tips on using FileQueue here. You can remove a FileQueue by deleting its containing directory.
Note that the command line tool ptq ("Python Task Queue") is co-installed with Igneous and can be used to monitor queue status, e.g. ptq status $QUEUENAME.
Producer Script
from taskqueue import TaskQueue
import igneous.task_creation as tc

cloudpath = 'gs://bucket/dataset/labels'
tq = TaskQueue("fq:///path/to/queue/directory")

# First pass: create and enqueue the MeshTasks.
tasks = tc.create_meshing_tasks(cloudpath, mip=3, shape=(256, 256, 256))
tq.insert(tasks)

# Second pass: after the meshing tasks have been consumed,
# create and enqueue the MeshManifestTasks.
tasks = tc.create_mesh_manifest_tasks(cloudpath)
tq.insert(tasks)

print("Tasks created!")
Consumer Script
from taskqueue import TaskQueue
import igneous.tasks # makes igneous's task types available to the queue

tq = TaskQueue("fq:///path/to/queue/directory")
tq.poll(
  verbose=True,      # print task information
  lease_seconds=600, # unfinished tasks recycle into the pool after this many seconds
  tally=True,        # track the number of tasks completed
)
Sample Cloud Use
Igneous is intended to be used with Kubernetes (k8s). A pre-built docker container is located on DockerHub as seunglab/igneous. A sample deployment.yml (used with kubectl create -f deployment.yml) is located in the root of the repository.
As Igneous is based on CloudVolume, you'll need to create a google-secret.json or aws-secret.json to access buckets located on these services.
You'll need to create an Amazon SQS queue to store the tasks you generate. Google's TaskQueue was previously supported, but the API changed. It may be supported again in the future.
Populating the SQS Queue
There's a bit of an art to achieving high performance on SQS. You can read more about it here.
import sys
from taskqueue import TaskQueue
import igneous.task_creation as tc

cloudpath = sys.argv[1] # e.g. 'gs://bucket/dataset/layer'
tq = TaskQueue("sqs://queue-url")

tasks = tc.create_downsampling_tasks(
  cloudpath, mip=0,        # start from the highest resolution
  fill_missing=True,       # treat missing chunks as zeros instead of erroring
  preserve_chunk_size=True
)
tq.insert(tasks)
print("Done!")
Executing Tasks in the Cloud
The following instructions are for Google Container Engine, but AWS has similar tools.
export PROJECT_NAME=example
export CLUSTER_NAME=example
export NUM_NODES=5

# create the cluster
gcloud container --project $PROJECT_NAME clusters create $CLUSTER_NAME --zone "us-east1-b" --machine-type "n1-standard-16" --image-type "GCI" --disk-size "50" --scopes "https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/taskqueue","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes $NUM_NODES --network "default" --enable-cloud-logging --no-enable-cloud-monitoring
gcloud config set container/cluster $CLUSTER_NAME

# give the cluster access to your CloudVolume secrets
kubectl create secret generic secrets \
--from-file=$HOME/.cloudvolume/secrets/google-secret.json \
--from-file=$HOME/.cloudvolume/secrets/aws-secret.json \
--from-file=$HOME/.cloudvolume/secrets/boss-secret.json

# start the deployment and scale up the workers
kubectl create -f deployment.yml
gcloud container clusters resize $CLUSTER_NAME --num-nodes=20
kubectl scale deployment igneous --replicas=320

# when the work is done, spin everything down
gcloud container clusters resize $CLUSTER_NAME --num-nodes=0
kubectl delete deployment igneous
Command Line Interface (CLI)
Igneous also comes with a command line interface for performing some routine tasks. We currently support downsample, xfer, mesh, skeleton, and execute, and plan to add more Igneous functions as well. Check igneous --help to see the current menu of functions and their options.
The principle of the CLI is to specify a source layer, a destination layer (if applicable), and a TaskQueue (e.g. sqs:// or fq://). First, populate the queue with the correct task type and then execute against it.
The CLI is intended to handle typical tasks that aren't too complex. If your task gets weird, it's time to try scripting!
igneous image downsample gs://my-lab/data --mip 0 --queue ./my-queue
igneous execute ./my-queue
igneous --help
For those that have been using Igneous a long time, igneous execute can replace python igneous/task_execution.py.
Capabilities
You can find the following tasks in igneous/tasks/tasks.py and can use them via editing or importing functions from igneous/task_creation.py.
Capability | Tasks | Description
---|---|---
Downsampling | DownsampleTask | Generate image hierarchies.
Meshing | MeshTask, MeshManifestTask | Create object meshes viewable in Neuroglancer.
Skeletonize | SkeletonTask, SkeletonMergeTask | Create Neuroglancer viewable skeletons using a modified TEASAR algorithm.
Transfer | TransferTask | Copy data, supports rechunking and coordinate translation.
Deletion | DeleteTask | Delete a data layer.
Contrast Normalization | LuminanceLevelsTask, ContrastNormalizationTask | Spread out slice histograms to fill value range.
Connected Components | CCLFacesTask, CCLEquivalancesTask, more... | Compute the 6-way CCL of the whole segmentation.
Quantization | QuantizeTask | Rescale values into 8-bit to make them easier to visualize.
Remapping | WatershedRemapTask | Remap segmentations to create agglomerated labels.
Eyewire Consensus Import | HyperSquareConsensusTask | Map Eyewire consensus into Neuroglancer.
HyperSquare Ingest | HyperSquareTask | (deprecated) Convert Eyewire's HyperSquare format into Precomputed.
HyperSquareConsensus | HyperSquareConsensusTask | Apply Eyewire consensus to a watershed version in Precomputed.
Downsampling (DownsampleTask)
Requires compiled tinybrain library.
For any but the very smallest volumes, it's desirable to create smaller summary images of what may be multi-gigabyte 2D slices. The purpose of these summary images is to make it easier to visualize the dataset or to work with lower resolution data in the context of a data processing (e.g. ETL) pipeline.
Image (uint8, microscopy) datasets are typically downsampled in a recursive hierarchy using 2x2x1 average pooling. Segmentation (uint8-uint64, labels) datasets (i.e. human ground truth or machine labels) are downsampled using 2x2x1 mode pooling in a recursive hierarchy using the COUNTLESS algorithm. This means that mip 1 segmentation labels are exact mode computations, but subsequent ones may not be. Under this scheme, the space taken by downsamples will be at most 33% of the highest resolution image's storage.
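The 33% bound follows from a geometric series: each 2x2x1 downsample is one quarter the size of its predecessor, so the total overhead is at most 1/4 + 1/16 + 1/64 + ... = 1/3. A quick check:

# Sum the relative sizes of an infinite 2x2x1 downsample hierarchy.
overhead = sum(0.25 ** n for n in range(1, 64))
print(round(overhead, 4))  # 0.3333, i.e. about 33% of the base image's storage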
Whether image or segmentation type downsampling will be used is determined from the neuroglancer info file's "type" attribute.
CLI Downsample
Here we show an example where we insert the tasks to downsample 4 mip levels using 2x2x1 pooling into a queue and then process it. Then we insert the tasks to downsample from mip 4 up to mip 7 using 2x2x2 downsamples that ignore background to avoid ghosted images when multiple Z slices are combined.
PATH=gs://mydataset/layer
QUEUE=fq://./my-queue

# downsample 4 mips from mip 0 using 2x2x1 pooling
igneous image downsample $PATH --mip 0 --num-mips 4 --queue $QUEUE
igneous execute $QUEUE

# downsample 3 more mips from mip 4 using sparse 2x2x2 pooling
igneous image downsample $PATH --mip 4 --num-mips 3 --volumetric --sparse --queue $QUEUE

# sharded variant (produces far fewer files)
igneous image downsample $PATH --mip 0 --queue $QUEUE --sharded

# restrict downsampling to a range of Z slices
igneous image downsample $PATH --queue $QUEUE --zrange 0,1
igneous execute $QUEUE
Scripting Downsample
tasks = create_downsampling_tasks(
  layer_path,
  mip=0,                      # start downsampling from this mip level
  fill_missing=False,         # error on missing chunks rather than zero-filling
  axis='z',
  num_mips=5,                 # number of downsamples to produce
  chunk_size=None,            # manually set chunk size of underlying volume
  preserve_chunk_size=True,   # use the chunk size of this mip for higher mips
  sparse=False,               # use sparse (stippled) mode pooling for segmentation
  bounds=None,                # mip 0 bounding box to downsample
  encoding=None,              # e.g. 'raw', 'compressed_segmentation'
  delete_black_uploads=False, # issue a delete instead of uploading all-background chunks
  background_color=0,
  compress='gzip',            # None, 'gzip', or 'br'
  factor=(2,2,1),             # downsample factor per mip
)
tasks = create_image_shard_downsample_tasks(
  cloudpath, mip=0, fill_missing=False,
  sparse=False, chunk_size=None,
  encoding=None, memory_target=MEMORY_TARGET, # bytes per task, e.g. int(3.5e9)
  agglomerate=False, timestamp=None,
  factor=(2,2,1)
)
Variable | Description
---|---
layer_path | Location of data layer. e.g. 'gs://bucket/dataset/layer'. c.f. CloudVolume
mip | Integer. Which level of the resolution hierarchy to start downsampling from. 0 is highest res. Higher is lower res. -1 means use lowest res.
fill_missing | If a file chunk is missing, fill it with zeros instead of throwing an error.
chunk_size | Force this chunk_size in the underlying representation of the downsamples. Conflicts with preserve_chunk_size.
preserve_chunk_size | (True) Use the chunk size of this mip level for higher downsamples. (False) Use a fixed block size and generate downsamples with decreasing chunk size. Conflicts with chunk_size.
sparse | Only has an effect on segmentation type images. False: The dataset contains large continuous labeled areas (most connectomics datasets). Uses the COUNTLESS 2D algorithm. True: The dataset contains sparse labels that are disconnected. Uses the Stippled COUNTLESS 2D algorithm.
bounds | Only downsample this region. If using a restricted bounding box, make sure it's chunk aligned at all downsampled mip levels.
encoding | Force 'raw' or 'compressed_segmentation' for segmentation volumes.
delete_black_uploads | Issue a delete instead of uploading files containing all background.
background_color | Designates the background color. Only affects delete_black_uploads, not fill_missing.
compress | What compression algorithm to use: None, 'gzip', 'br' (brotli)
Data Transfer / Rechunking (TransferTask)
A common task is to take a dataset that was set up with single-slice (X by Y by 1) chunks. This is often appropriate for image alignment or other single section based processing tasks. However, this is not optimal for Neuroglancer visualization or for achieving the highest performance over TCP networking (e.g. with CloudVolume). Therefore, it can make sense to rechunk the dataset to create deeper and overall larger chunks (e.g. 64x64x64, 128x128x32, 128x128x64). In some cases, it can also be desirable to translate the coordinate system of a data layer.
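For instance, here is a minimal rechunking sketch using the scripting interface detailed below (paths and chunk size are hypothetical):

import igneous.task_creation as tc
from taskqueue import LocalTaskQueue

tq = LocalTaskQueue(parallel=8)
tasks = tc.create_transfer_tasks(
  "gs://bucket/dataset/slices",    # hypothetical source with X by Y by 1 chunks
  "gs://bucket/dataset/rechunked", # hypothetical destination layer
  chunk_size=(128, 128, 64),       # deeper chunks for visualization
)
tq.insert(tasks)
tq.execute()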
The TransferTask will automatically run the first few levels of downsampling as well, making it easier to visualize progress and reducing the amount of work a subsequent DownsampleTask will need to do.
Another use case is to transfer a neuroglancer dataset from one cloud bucket to another, but often the cloud provider's transfer service will suffice, even across providers.
CLI Transfer
Here's an example where we transfer from a source to a destination dataset. There are many options available; see igneous image xfer --help.
igneous image xfer $SRC $DEST --queue $QUEUE # unsharded transfer
igneous image xfer $SRC $DEST --queue $QUEUE --sharded # sharded transfer
igneous -p 4 execute $QUEUE # process with 4 parallel workers
We have developed some calculation aids to help you pick the right shape for the transfer task.
# estimate the memory required by a given task shape
igneous design ds-shape gs://bucket/dataset --shape 1024,1024,64 --factor 2,2,1
>>> 715.8 MB
# compute an optimized task shape for a given memory budget
igneous design ds-memory gs://bucket/dataset 3.5e9 --verbose
>>> Data Width: 8
>>> Factor: (2, 2, 1)
>>> Chunk Size: 512, 512, 16
>>> Memory Limit: 3.5 GB
>>> -----
>>> Optimized Shape: 4096,4096,16
>>> Downsamples: 3
>>> Memory Used*: 2.9 GB
>>>
>>> *memory used is for retaining the image and all downsamples.
>>> Additional costs may be incurred from processing.
Scripting Transfer
# unsharded transfer
tasks = create_transfer_tasks(
  src_layer_path, dest_layer_path,
  chunk_size=None, shape=None,
  fill_missing=False, translate=None,
  bounds=None, mip=0, preserve_chunk_size=True,
  encoding=None, skip_downsamples=False,
  delete_black_uploads=False, background_color=0,
  agglomerate=False, timestamp=None, compress='gzip',
  factor=None, sparse=False, dest_voxel_offset=None,
  memory_target=3.5e9, max_mips=5
)

# sharded transfer
tasks = create_image_shard_transfer_tasks(
  src_layer_path, dst_layer_path,
  mip=0, chunk_size=None,
  encoding=None, bounds=None, fill_missing=False,
  translate=(0, 0, 0), dest_voxel_offset=None,
  agglomerate=False, timestamp=None,
  memory_target=3.5e9,
)
Most of the options here are the same as for downsample. The major exceptions are shape and skip_downsamples. shape designates the size of a single transfer task and must be chunk aligned. The number of downsamples that will be generated can be computed as log2(shape / chunk_size). skip_downsamples will prevent downsamples from being generated.
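For example, applying that formula with a hypothetical shape and chunk size:

import math

shape = (2048, 2048, 64)     # hypothetical transfer task shape
chunk_size = (128, 128, 64)  # hypothetical chunk size
# With the default 2x2x1 factor, X and Y determine the count: log2(2048/128) = 4
num_downsamples = int(math.log2(shape[0] / chunk_size[0]))
print(num_downsamples)  # 4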
Due to the memory constraints, sharded tasks do not automatically generate downsamples.
Deletion (DeleteTask)
If you want to parallelize deletion of an image layer in a bucket beyond using e.g. gsutil -m rm, you can horizontally scale out deletion using these tasks. Note that the tasks assume that the information to be deleted is chunk aligned and named appropriately.
CLI
igneous image rm $LAYER --queue $QUEUE
igneous execute $QUEUE
Scripting
tasks = create_deletion_tasks(
  layer_path,
  mip=0,
  num_mips=5,
  shape=None,
  bounds=None
)
Meshing (MeshTask & MeshManifestTask)
Requires compiled zmesh library.
Meshing is a two stage process. First, the dataset is divided up into a regular grid of tasks that will be meshed independently of each other using the MeshTask. The resulting mesh fragments are uploaded to the destination layer's meshing directory (named something like mesh_mip_3_err_40).
There are two ways to conduct meshing. The standard "unsharded" way can generate a lot of mesh fragment files. It scales to about 100M labels before it starts incurring unreasonable costs on cloud systems. To handle larger volumes, there is the somewhat more difficult to use sharded meshing process that condenses the number of files by orders of magnitude.
Multi-resolution meshes are supported. Specify the desired number of levels of detail and up to that number will be generated per mesh (if the mesh is large enough to need it based on the chunk size). Levels of detail are generated by simplifying the base mesh with pyfqmr with successively more aggressive parameters.
Unsharded Meshing
Without additional processing, Neuroglancer has no way of knowing the names of these chunks (which will be named something like $SEGID:0:$BOUNDING_BOX, e.g. 1052:0:0-512_0-512_0-512). The $BOUNDING_BOX part of the name is arbitrary and is the convention used by igneous because it is convenient for debugging.
The manually actuated second stage runs the MeshManifestTask, which generates files named $SEGID:0 that contain a short JSON snippet like { "fragments": [ "1052:0:0-512_0-512_0-512" ] }. This file tells Neuroglancer and CloudVolume which mesh files to download when accessing a given segment ID.
If multiple levels of detail are specified, the mesh files will be organized differently as they will be using the newer container format.
Sharded Meshing
Sharded meshes are not only condensed, but also Draco encoded with an integer position attribute. The files must be initially meshed and then a set of meshes gathered into the memory of a single machine which can then synthesize the shard file. This requires more time and memory to generate than unsharded meshes, but simplifies management of the resultant data set by creating far fewer files. The shard files have names like a31.shard. A sharded dataset is indicated by the info file in the mesh directory having { "@type": "neuroglancer_multilod_draco" }.
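For example, you can check which format a mesh directory uses by reading its info file. A minimal sketch with cloudfiles (the mesh directory path is hypothetical; the layer's info file records the actual mesh directory name):

from cloudfiles import CloudFiles

cf = CloudFiles("gs://bucket/dataset/labels/mesh_mip_3_err_40")
info = cf.get_json("info")
is_sharded = bool(info) and info.get("@type") == "neuroglancer_multilod_draco"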
CLI Meshing
The CLI supports only standard Precomputed; Graphene is not currently supported. There are many more options; check out igneous mesh --help, igneous mesh forge --help, and igneous mesh merge --help.
# unsharded meshing
igneous mesh forge $PATH --mip 2 --queue $QUEUE
igneous execute $QUEUE
igneous mesh merge $PATH --magnitude 2 --queue $QUEUE --nlod 0
igneous execute $QUEUE

# sharded meshing
igneous mesh forge $PATH --mip 2 --queue $QUEUE --sharded
igneous execute $QUEUE
igneous mesh merge-sharded $PATH --queue $QUEUE --nlod 0
igneous execute $QUEUE
Scripting Meshing
tasks = create_meshing_tasks(
  layer_path,
  mip,
  shape=(448, 448, 448),       # size of one meshing task
  simplification=True,
  max_simplification_error=40,
  mesh_dir=None,
  cdn_cache=False,
  dust_threshold=None,         # skip labels smaller than this
  object_ids=None,             # only mesh these labels
  progress=False,
  fill_missing=False,
  encoding='precomputed',
  spatial_index=True,
  sharded=False,
)
tasks = create_mesh_manifest_tasks(layer_path, magnitude=3)
The parameters above are mostly self explanatory, but the magnitude parameter of create_mesh_manifest_tasks is a bit odd. A MeshManifestTask iterates through a proportion of the files defined by a filename prefix, and magnitude splits up the work by an additional factor of 10^magnitude. A high magnitude (3-5+) is appropriate for horizontal scaling workloads, while small magnitudes (1-2) are more suited for small volumes processed locally, since there is overhead introduced by splitting up the work.
Of note: meshing is a memory intensive operation. The underlying zmesh library has an optimization for meshing volumes smaller than 1024 voxels in X and Y and 512 in Z, which can be worth taking advantage of. Meshing time scales with the number of labels contained in the volume.
Skeletonization (SkeletonTask, SkeletonMergeTask)
Igneous provides the engine for performing out-of-core skeletonization of labeled images. The in-core part of the algorithm is provided by the Kimimaro library.
The strategy is to apply Kimimaro mass skeletonization to 1 voxel overlapping chunks of the segmentation and then fuse them in a second pass. Both sharded and unsharded formats are supported. For very large datasets, note that sharded runs better on a local cluster as it can make use of mmap.
We also support computing the cross sectional area at each vertex, but be aware that this will add significant time to the computation (currently many hours for a densely labeled task). This is practical for sparse labeling though. This should be improved substantially in the future.
CLI Skeletonization
The CLI for skeletonization is similar to meshing. Graphene is not supported. However, both sharded and unsharded formats are.
# unsharded skeletonization
igneous skeleton forge $PATH --mip 2 --queue $QUEUE --scale 2.5 --const 10
igneous execute $QUEUE
igneous skeleton merge $PATH --queue $QUEUE --tick-threshold 500 --max-cable-length 10000000
igneous execute $QUEUE

# sharded skeletonization
igneous skeleton forge $PATH --mip 2 --queue $QUEUE --scale 2.5 --const 10 --sharded
igneous execute $QUEUE
igneous skeleton merge-sharded $PATH --queue $QUEUE --tick-threshold 500 --max-cable-length 10000000 --minishard-bits 7 --shard-bits 3 --preshift-bits 4
igneous execute $QUEUE
Scripting Skeletonization
import igneous.task_creation as tc
from cloudvolume import Vec

# First pass: skeletonize each overlapping grid task.
tasks = tc.create_skeletonizing_tasks(
  cloudpath,
  mip,
  shape=Vec(512, 512, 512),
  sharded=False,          # produce sharded or unsharded fragments
  spatial_index=False,
  info=None,
  fill_missing=False,
  teasar_params={'scale':10, 'const': 10}, # parameters to the TEASAR algorithm
  object_ids=None,        # only skeletonize these labels
  mask_ids=None,          # skip these labels
  fix_branching=True,
  fix_borders=True,
  dust_threshold=1000,    # skip labels smaller than this
  progress=False,
  parallel=1,
  cross_sectional_area=False, # compute cross sectional area at each vertex (slow)
  cross_sectional_area_smoothing_window=5,
)

# Second pass (unsharded): fuse the skeleton fragments.
tasks = tc.create_unsharded_skeleton_merge_tasks(
  layer_path, mip,
  crop=0,
  magnitude=3,
  dust_threshold=4000,
  tick_threshold=6000,
  delete_fragments=False
)

# Second pass (sharded): fuse the fragments into shard files.
tasks = tc.create_sharded_skeleton_merge_tasks(
  layer_path,
  dust_threshold=1000,
  tick_threshold=3500,
  shard_index_bytes=2**13,
  minishard_index_bytes=2**15,
  minishard_index_encoding='gzip',
  data_encoding='gzip',
  max_cable_length=None,
  spatial_index_db=None
)
Contrast Normalization (LuminanceLevelsTask & ContrastNormalizationTask)
Sometimes a dataset's luminance values cluster into a tight band and make the image unnecessarily bright or dark and, above all, low contrast. Sometimes the data may be 16 bit, but the values cluster all at the low end, making it impossible to even see without using ImageJ / Fiji or another program that supports automatic image normalization. Furthermore, Fiji can only go so far on a teravoxel or petavoxel dataset.
The object of these tasks is to first create a representative sample of the luminance levels of a dataset per Z slice (i.e. a frequency count of gray values). This levels information is then used to perform per Z section contrast normalization. In the future, perhaps we will attempt global normalization. The algorithm currently in use reads the levels files for a given Z slice and determines how much of the ends of the distribution to lop off, perhaps 1% on each side (you should plot the levels files for your own data as this is configurable; perhaps you might choose 0.5% or 0.25%). The low value is recentered at 0, and the high value is stretched to 255 (in the case of uint8s) or 65,535 (in the case of uint16s).
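To make the per-slice stretch concrete, here is a minimal NumPy sketch of the idea; it is an illustration of the algorithm described above, not the igneous implementation:

import numpy as np

def stretch_slice(img2d: np.ndarray, clip_fraction: float = 0.01) -> np.ndarray:
    # Lop off clip_fraction from each tail of the luminance distribution,
    # recenter the low value at 0, and stretch the high value to 255.
    lo = np.percentile(img2d, 100 * clip_fraction)
    hi = np.percentile(img2d, 100 * (1 - clip_fraction))
    out = (img2d.astype(np.float32) - lo) / max(hi - lo, 1) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)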
CLI Contrast Normalization
igneous image contrast histogram $PATH --queue $QUEUE --coverage 0.01 --mip 0
igneous image contrast equalize $SRC_PATH $DEST_PATH --queue $QUEUE --clip-fraction 0.01 --mip 0
Scripting Contrast Normalization
# first pass: build per-Z-slice luminance histograms
tasks = create_luminance_levels_tasks(layer_path, coverage_factor=0.01, shape=None, offset=(0,0,0), mip=0)
# second pass: apply the per-slice normalization
tasks = create_contrast_normalization_tasks(src_path, dest_path, shape=None, mip=0, clip_fraction=0.01, fill_missing=False, translate=(0,0,0))
Connected Components Labeling (CCL) (Beta!)
Igneous supports whole image connected components labeling of a segmentation. Currently, only 6-connected components are supported. The largest image currently supported would have 2^64 voxels (about 18 exavoxels or 18+ whole mouse brains). You can apply CCL to either a labeled image or to a grayscale image that can be binarized with a threshold.
The whole image CCL algorithm requires four steps that must be executed in order. The shape specified and the optional binarization and dust thresholds must be the same for all steps or nonsensical outputs will result. By default, the values will be consistent. To apply a binarization threshold, you can apply both or one of --threshold-lte (<=) and --threshold-gte (>=). You can also apply a --dust threshold to remove unwanted objects smaller than this threshold.
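As an illustration of how the two thresholds combine when binarizing a grayscale image (assumed semantics, shown with hypothetical values):

import numpy as np

# Hypothetical thresholds; foreground is what remains after binarization.
img = np.array([[0, 120], [200, 255]], dtype=np.uint8)
gte, lte = 100, 250
foreground = (img >= gte) & (img <= lte)  # [[False, True], [True, False]]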
This capability is very new and may have some quirks, so please report any issues.
Scripting CCL
import igneous.task_creation as tc
import igneous.tasks.image.ccl

# Step 1: compute the CCL of each grid task and save the task faces
tasks = tc.create_ccl_face_tasks(
  cloudpath, mip, shape=(512,512,512),
  threshold_gte=None, threshold_lte=None,
  dust_threshold=0,
)

# Step 2: compute label equivalences between adjacent tasks from the saved faces
tasks = tc.create_ccl_equivalence_tasks(
  cloudpath, mip, shape,
  threshold_gte, threshold_lte,
  dust_threshold
)

# Step 3: compute the global relabeling (runs immediately, not via a queue)
igneous.tasks.image.ccl.create_relabeling(src, mip, shape)

# Step 4: write the relabeled image to the destination layer
tasks = tc.create_ccl_relabel_tasks(
  src_path, dest_path,
  mip, shape=(512,512,512),
  chunk_size=None, encoding=None,
  threshold_gte=None,
  threshold_lte=None,
  dust_threshold=0,
)
CLI CCL
# Step 1: compute grid-aligned CCL and save task face data
igneous image ccl faces SRC --mip 0 --queue queue
igneous execute -x queue
# Step 2: compute label equivalences between adjacent tasks
igneous image ccl links SRC --mip 0 --queue queue
igneous execute -x queue
# Step 3: compute the global relabeling (runs immediately, no queue)
igneous image ccl calc-labels SRC --mip 0
# Step 4: write the relabeled image to the destination
igneous image ccl relabel SRC DEST --mip 0 --queue queue --encoding compresso
igneous execute -x queue
# remove intermediate files
igneous image ccl clean SRC
For smaller images that could reasonably be processed on a single machine, there is a shortcut auto that will also automatically execute.
igneous -p PARALLEL image ccl auto SRC DEST --shape 512,512,512 --encoding compresso --queue queue
Computing Per-Object Voxel Counts
This will create a MapBuffer dictionary containing the global number of voxels per label at the location $CLOUDPATH/$KEY/stats/voxel_counts.mb. You can then use this file to look up the global voxel count for each label.
Scripting Voxel Counts
import igneous.task_creation as tc
from taskqueue import LocalTaskQueue

cloudpath = ...
mip = 0

# count voxels per label within each grid task
tasks = tc.create_voxel_counting_tasks(
  cloudpath, mip=mip
)
tq = LocalTaskQueue(parallel=1)
tq.insert_all(tasks)

# accumulate the per-task counts into a single file
tc.accumulate_voxel_counts(cloudpath, mip)

# look up the global count for a label
from cloudfiles import CloudFile
from mapbuffer import IntMap
cf = CloudFile("/".join([ cloudpath, "stats", "voxel_counts.im" ]))
im = IntMap(cf.get())
im[label]
CLI Voxel Counts
igneous image voxels count SRC --mip 0 --queue queue
igneous execute -x queue
igneous image voxels sum SRC --mip 0
Reordering Z-Slices
When acquiring a new microscopy stack, for a variety of reasons such as process interruptions, reimaging, etc., the montaged slices may not be compact in Z or may otherwise be out of order. Using a JSON file, you can specify (sparsely) which movements in Z are required to put the image stack in order.
{ "1": 2, "2": 3, "3": 1 }
The mapping file will be analyzed to ensure no slices are dropped before creating the task set.
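As an illustration, here is a sketch of that kind of validation, under the assumption that each key is a source slice index, each value its destination, and unlisted slices stay in place:

import json

def check_mapping(path: str, z_start: int, z_end: int):
    # Assumed semantics: mapping[source_z] = destination_z; unlisted slices stay put.
    with open(path) as f:
        mapping = {int(k): int(v) for k, v in json.load(f).items()}
    dests = [mapping.get(z, z) for z in range(z_start, z_end)]
    assert len(set(dests)) == len(dests), "two slices map to the same Z"
    assert set(dests) == set(range(z_start, z_end)), "a slice would be dropped"

check_mapping("mapping.json", 0, 4)  # hypothetical Z range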
CLI Z-Slice Reorder
igneous image reorder SRC DEST --queue queue --mip 0 --mapping-file mapping.json
Tissue ROI Detection
Sometimes, especially during the alignment of a new microscopy stack, volume bounds may greatly exceed the tissue containing regions. This results in wasted computation when processing large volumes. As of version 8.22.0, CloudVolume can use a set of pre-computed bounding boxes to avoid issuing network requests to empty regions.
Igneous can compute these regions by checking low resolution images of the dataset for the presence of tissue and recording those bounding boxes in the highest resolution scale of the info file, in the following format, where the variable names are integers and the bounds are inclusive.
"rois": [ [ xstart, ystart, zstart, xend, yend, zend ], ... ]
The below function performs all computation in one step and does not require using task queues. It can be memory intensive if the volume is large and not sufficiently downsampled. The lowest resolution downsample available will be used, and possibly downsampled further in memory before being analyzed.
tc.compute_rois(
cloudpath:str,
progress:bool = False,
suppress_faint_voxels:int = 0,
dust_threshold:int = 10,
max_axial_length:int = 512,
z_step:Optional[int] = None,
)
When the function finishes executing, it will print out the number of bounding boxes found. Depending on your data, a reasonable number of bounding boxes is between 1 and 15. Above 25 bounding boxes, CloudVolume may incur more than 1 millisecond of additional processing per cutout.
If you see hundreds of bounding boxes generated unexpectedly, try examining your image more carefully and consider suppressing faint voxels or tiny connected components.
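A minimal usage sketch (the path is hypothetical):

import igneous.task_creation as tc

# Runs in one step without a task queue and prints the number of
# bounding boxes found when it finishes.
tc.compute_rois("gs://bucket/dataset/layer", progress=True)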
CLI Tissue ROI Detection
igneous image roi $PATH
igneous image roi $PATH --z-step 100
igneous image roi $PATH --max-axial-len 256
Conclusion
It's possible something has changed or is not covered in this documentation. Please read igneous/task_creation/ and igneous/tasks/ for the most current information.
Please post an issue or PR if you think something needs to be addressed.
Related Projects
- tinybrain - Downsampling code for images and segmentations.
- kimimaro - Skeletonization of dense volumetric labels.
- zmesh - Mesh generation and simplification for dense volumetric labels.
- CloudVolume - IO for images, meshes, and skeletons.
- python-task-queue - Parallelized dependency-free cloud task management.
- DracoPy - Encode/Decode Draco compressed meshes in Python.
- MapBuffer - Zero decode random access to uint64 keyed byte streams (useful for shard construction).
Acknowledgements
Special thanks to everyone that has contributed to Igneous! I'll fill this section in more fully later, but in particular recent thanks to Jeremy Maitin-Shepard, David Ackermann, Hythem Sidky, and Sridhar Jagannathan for giving advice and sharing code for producing sharded multi-res meshes. Thanks to Chris Roat for improving the CLI's typing. Thanks to Nico Kemnitz for improving the original mesh task. Of course, thanks to lab alumni Ignacio Tartavull who started the project with me and provided its initial impetus.