Example: Segmenting the U2OS Small Dataset with Newer Plugins
This vignette demonstrates how to run cell segmentation on the publicly available U2OS small dataset using the CellposeSAM and InstanSeg segmentation plugins.
The U2OS small dataset (3953 × 3960 px, 5 stains, 7 z-levels) is the same dataset used in the Example: Segmenting a Small Dataset Saved on a Local Hard Drive vignette. Where that vignette uses the legacy built-in Cellpose family, this one shows the newer plugin-based workflows.
Note
CellposeSAM and InstanSeg are separate packages — they are not included
with pip install vpt[all]. Each plugin must be installed individually in
the same Python environment as VPT. See Installation for details.
Before Beginning: System Setup
Make sure your environment meets the system requirements before proceeding. In particular, CellposeSAM requires a CUDA-capable GPU.
Download the dataset
wget -q https://d21zg11mb7aqva.cloudfront.net/202305010900_U2OS_small_set_VMSC00000.zip
unzip -q 202305010900_U2OS_small_set_VMSC00000.zip
CellposeSAM
Install the Plugin
pip install vpt-plugin-cellposesam
Additional plugin documentation and source-install instructions are available in the vpt-plugin-cellposesam repository.
Verify that VPT recognises the plugin:
vpt --help
Segmentation Specification
Create a file named cellposesam_u2os.json with the following contents. This
specification selects three stains (DAPI, PolyT, Cellbound1), segments z-layer 3,
and applies CLAHE preprocessing to each channel:
{
"experiment_properties": {
"all_z_indexes": [0, 1, 2, 3, 4, 5, 6],
"z_positions_um": [1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5]
},
"segmentation_tasks": [
{
"task_id": 0,
"segmentation_family": "CellposeSAM",
"entity_types_detected": ["cell"],
"z_layers": [3],
"segmentation_properties": {
"model": "cellpose-sam",
"model_dimensions": "2D",
"custom_weights": null,
"version": "latest"
},
"task_input_data": [
{
"image_channel": "DAPI",
"image_preprocessing": [
{"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
]
},
{
"image_channel": "PolyT",
"image_preprocessing": [
{"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
]
},
{
"image_channel": "Cellbound1",
"image_preprocessing": [
{"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
]
}
],
"segmentation_parameters": {
"nuclear_channel": "DAPI",
"entity_fill_channel": "PolyT",
"diameter": 30,
"flow_threshold": 0.95,
"cellprob_threshold": -5.5,
"minimum_mask_size": 500
},
"polygon_parameters": {
"simplification_tol": 2,
"smoothing_radius": 10,
"minimum_final_area": 500
}
}
],
"segmentation_task_fusion": {
"entity_fusion_strategy": "harmonize",
"fused_polygon_postprocessing_parameters": {
"min_distance_between_entities": 1,
"min_final_area": 500
}
},
"output_files": [
{
"entity_types_output": ["cell"],
"files": {
"run_on_tile_dir": "result_tiles/",
"mosaic_geometry_file": "cellpose_mosaic_space.parquet",
"micron_geometry_file": "cellpose_micron_space.parquet",
"cell_metadata_file": "cellpose_cell_metadata.csv"
}
}
]
}
Run Segmentation
vpt --verbose run-segmentation \
--segmentation-algorithm cellposesam_u2os.json \
--input-images '202305010900_U2OS_small_set_VMSC00000/region_0/images/mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+).tif' \
--input-micron-to-mosaic 202305010900_U2OS_small_set_VMSC00000/region_0/images/micron_to_mosaic_pixel_transform.csv \
--output-path u2os_cellposesam_output/ \
--tile-size 2400 --tile-overlap 200
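As a sanity check on the tiling, the expected tile count can be estimated from the mosaic dimensions, tile size, and overlap. The sketch below assumes a simple grid scheme in which each successive tile advances by (tile size − overlap) pixels; VPT's actual tiler may handle edges differently.

```python
import math

def n_tiles(dim_px: int, tile: int = 2400, overlap: int = 200) -> int:
    """Tiles needed to cover one axis when each tile advances by (tile - overlap) px."""
    if dim_px <= tile:
        return 1
    return math.ceil((dim_px - overlap) / (tile - overlap))

# U2OS small dataset mosaic: 3953 x 3960 px
tiles_x = n_tiles(3953)
tiles_y = n_tiles(3960)
print(tiles_x * tiles_y)  # -> 4
```

With these parameters the 3953 × 3960 px mosaic yields a 2 × 2 grid, consistent with the 4 tiles reported in the results below.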
Expected Results
CellposeSAM should detect approximately 800 cell entities across 4 tiles. The output directory will contain:
- cellpose_micron_space.parquet — cell boundary polygons in micron coordinates
- cellpose_mosaic_space.parquet — cell boundary polygons in mosaic pixel coordinates
- result_tiles/ — per-tile intermediate results
These geometry tables store one row per entity per z-level, so count cells by unique EntityID rather than total parquet rows.
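A minimal sketch of that counting logic, using an in-memory stand-in for the parquet rows (in practice you would load cellpose_micron_space.parquet with pandas or geopandas; the ZIndex column name here is illustrative):

```python
# Stand-in for rows loaded from the geometry parquet:
# one row per entity per z-level.
rows = [
    {"EntityID": 101, "ZIndex": 3},
    {"EntityID": 101, "ZIndex": 4},   # same cell, second z-level
    {"EntityID": 102, "ZIndex": 3},
]

total_rows = len(rows)                        # 3 -- overcounts cells
n_cells = len({r["EntityID"] for r in rows})  # 2 -- correct cell count
print(total_rows, n_cells)
```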
The cell_metadata_file entry in the segmentation specification reserves the
downstream metadata filename. The metadata file itself is created later by
derive-entity-metadata, not by run-segmentation.
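For example, the metadata file could later be produced with an invocation along these lines (flag names taken from the VPT command reference; verify against vpt derive-entity-metadata --help for your installed version):

```shell
vpt --verbose derive-entity-metadata \
    --input-boundaries u2os_cellposesam_output/cellpose_micron_space.parquet \
    --output-entity-metadata u2os_cellposesam_output/cellpose_cell_metadata.csv
```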
In one representative run on a mid-range CUDA GPU, this completed in approximately 2 minutes 20 seconds wall-clock time.
InstanSeg
Install the Plugin
pip install vpt-plugin-instanseg
Additional plugin documentation and source-install instructions are available in the vpt-plugin-instanseg repository.
Verify that VPT recognises the plugin:
vpt --help
Segmentation Specification
Create a file named instanseg_u2os.json with the following contents. InstanSeg
is channel-invariant, so channel order does not matter. The pixel_size is set
to 0.108 µm to match the U2OS dataset resolution:
{
"experiment_properties": {
"all_z_indexes": [0, 1, 2, 3, 4, 5, 6],
"z_positions_um": [1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5]
},
"segmentation_tasks": [
{
"task_id": 0,
"segmentation_family": "InstanSeg",
"entity_types_detected": ["cell"],
"z_layers": [3],
"segmentation_properties": {
"model": "fluorescence_nuclei_and_cells",
"model_dimensions": "2D",
"version": "0.1.1",
"custom_weights": null
},
"task_input_data": [
{
"image_channel": "DAPI",
"image_preprocessing": [
{"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
]
},
{
"image_channel": "PolyT",
"image_preprocessing": [
{"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
]
},
{
"image_channel": "Cellbound1",
"image_preprocessing": [
{"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
]
}
],
"segmentation_parameters": {
"pixel_size": 0.108,
"normalise": true,
"target": "all_outputs",
"rescale_output": true
},
"polygon_parameters": {
"simplification_tol": 2,
"smoothing_radius": 10,
"minimum_final_area": 100
}
}
],
"segmentation_task_fusion": {
"entity_fusion_strategy": "harmonize",
"fused_polygon_postprocessing_parameters": {
"min_distance_between_entities": 1,
"min_final_area": 100
}
},
"output_files": [
{
"entity_types_output": ["cell"],
"files": {
"run_on_tile_dir": "result_tiles/",
"mosaic_geometry_file": "instanseg_mosaic_space.parquet",
"micron_geometry_file": "instanseg_micron_space.parquet",
"cell_metadata_file": "instanseg_cell_metadata.csv"
}
}
]
}
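Before running, it can be worth sanity-checking pixel_size against the dataset's micron-to-mosaic transform: that transform maps microns to mosaic pixels, so its diagonal scale terms are in pixels per micron and should be roughly the reciprocal of pixel_size. A quick stdlib check (the ~9.26 px/µm figure is derived from 0.108 µm/px, not read from this dataset's CSV):

```python
pixel_size_um = 0.108          # microns per pixel, as set in the spec above
px_per_um = 1 / pixel_size_um  # expected diagonal scale of the transform

print(round(px_per_um, 2))     # -> 9.26
# Compare against the diagonal entries of
# region_0/images/micron_to_mosaic_pixel_transform.csv
```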
Run Segmentation
vpt --verbose run-segmentation \
--segmentation-algorithm instanseg_u2os.json \
--input-images '202305010900_U2OS_small_set_VMSC00000/region_0/images/mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+).tif' \
--input-micron-to-mosaic 202305010900_U2OS_small_set_VMSC00000/region_0/images/micron_to_mosaic_pixel_transform.csv \
--output-path u2os_instanseg_output/ \
--tile-size 2400 --tile-overlap 200
Expected Results
InstanSeg should detect approximately 860 cells across 4 tiles. The output directory will contain:
- instanseg_micron_space.parquet — cell boundary polygons in micron coordinates
- instanseg_mosaic_space.parquet — cell boundary polygons in mosaic pixel coordinates
- result_tiles/ — per-tile intermediate results
These geometry tables store one row per entity per z-level, so count cells by unique EntityID rather than total parquet rows.
The cell_metadata_file entry in the segmentation specification reserves the
downstream metadata filename. The metadata file itself is created later by
derive-entity-metadata, not by run-segmentation.
On a mid-range CUDA GPU this run completed in approximately 23 seconds wall-clock time. For multi-tissue runtime comparisons, see Segmentation Benchmarks.
Next Steps
After segmentation, the output parquet files can be used in the standard VPT workflow (partition transcripts, derive entity metadata, sum signals, update VZG) exactly as shown in the Example: Segmenting a Small Dataset Saved on a Local Hard Drive vignette.
For a guide to retraining a Cellpose2 model with manual annotations, see the Example: Re-segmenting a MERSCOPE Heart Dataset with a Machine Learning Model Customized with Manual Annotations vignette.
For detailed parameter reference pages, see: