Example: Segmenting the U2OS Small Dataset with Newer Plugins

This vignette demonstrates how to run cell segmentation on the publicly available U2OS small dataset using the CellposeSAM and InstanSeg segmentation plugins.

The U2OS small dataset (3953 × 3960 px, 5 stains, 7 z-levels) is the same dataset used in the Example: Segmenting a Small Dataset Saved on a Local Hard Drive vignette. Where that vignette uses the legacy built-in Cellpose family, this one shows the newer plugin-based workflows.

Note

CellposeSAM and InstanSeg are separate packages and are not included with pip install vpt[all]. Each plugin must be installed individually in the same Python environment as VPT. See Installation for details.

Before Beginning: System Setup

Make sure your environment meets the system requirements before proceeding. In particular, CellposeSAM requires a CUDA-capable GPU.

Download the dataset

wget -q https://d21zg11mb7aqva.cloudfront.net/202305010900_U2OS_small_set_VMSC00000.zip
unzip -q 202305010900_U2OS_small_set_VMSC00000.zip

CellposeSAM

Install the Plugin

pip install vpt-plugin-cellposesam

Additional plugin documentation and source-install instructions are available in the vpt-plugin-cellposesam repository.

Verify that VPT recognises the plugin:

vpt --help
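As a complementary check, the Python standard library can confirm that the plugin is installed in the same environment as VPT. This is only a sketch: it queries pip distribution metadata by name (the names come from the pip commands in this vignette) and does not prove that VPT can load the plugin. The helper name installed_version is introduced here for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the pip commands in this vignette.
    for name in ("vpt", "vpt-plugin-cellposesam"):
        print(name, installed_version(name) or "NOT INSTALLED")
```

Run it with the same interpreter that runs vpt. The same call works for vpt-plugin-instanseg.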

Segmentation Specification

Create a file named cellposesam_u2os.json with the following contents. This specification selects three stains (DAPI, PolyT, Cellbound1), segments z-layer 3, and applies CLAHE preprocessing to each channel:

{
  "experiment_properties": {
    "all_z_indexes": [0, 1, 2, 3, 4, 5, 6],
    "z_positions_um": [1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5]
  },
  "segmentation_tasks": [
    {
      "task_id": 0,
      "segmentation_family": "CellposeSAM",
      "entity_types_detected": ["cell"],
      "z_layers": [3],
      "segmentation_properties": {
        "model": "cellpose-sam",
        "model_dimensions": "2D",
        "custom_weights": null,
        "version": "latest"
      },
      "task_input_data": [
        {
          "image_channel": "DAPI",
          "image_preprocessing": [
            {"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
          ]
        },
        {
          "image_channel": "PolyT",
          "image_preprocessing": [
            {"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
          ]
        },
        {
          "image_channel": "Cellbound1",
          "image_preprocessing": [
            {"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
          ]
        }
      ],
      "segmentation_parameters": {
        "nuclear_channel": "DAPI",
        "entity_fill_channel": "PolyT",
        "diameter": 30,
        "flow_threshold": 0.95,
        "cellprob_threshold": -5.5,
        "minimum_mask_size": 500
      },
      "polygon_parameters": {
        "simplification_tol": 2,
        "smoothing_radius": 10,
        "minimum_final_area": 500
      }
    }
  ],
  "segmentation_task_fusion": {
    "entity_fusion_strategy": "harmonize",
    "fused_polygon_postprocessing_parameters": {
      "min_distance_between_entities": 1,
      "min_final_area": 500
    }
  },
  "output_files": [
    {
      "entity_types_output": ["cell"],
      "files": {
        "run_on_tile_dir": "result_tiles/",
        "mosaic_geometry_file": "cellpose_mosaic_space.parquet",
        "micron_geometry_file": "cellpose_micron_space.parquet",
        "cell_metadata_file": "cellpose_cell_metadata.csv"
      }
    }
  ]
}
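Before launching a run, a few internal consistency checks on the specification can catch simple editing mistakes. The sketch below is not VPT's own validation. It checks only relationships visible in the JSON above: that the two z lists have equal length, that each task's z_layers appear in all_z_indexes, and that nuclear_channel and entity_fill_channel name channels listed in task_input_data. The function names are hypothetical.

```python
import json


def check_spec(spec):
    """Return a list of human-readable problems found in a segmentation spec."""
    problems = []
    props = spec["experiment_properties"]
    z_indexes = props["all_z_indexes"]
    if len(z_indexes) != len(props["z_positions_um"]):
        problems.append("all_z_indexes and z_positions_um differ in length")
    for task in spec["segmentation_tasks"]:
        tid = task["task_id"]
        # Every requested z layer should be a known z index.
        for z in task["z_layers"]:
            if z not in z_indexes:
                problems.append(f"task {tid}: z layer {z} not in all_z_indexes")
        # Channels referenced by name should be among the task inputs.
        channels = {item["image_channel"] for item in task["task_input_data"]}
        params = task.get("segmentation_parameters", {})
        for key in ("nuclear_channel", "entity_fill_channel"):
            value = params.get(key)
            if value is not None and value not in channels:
                problems.append(f"task {tid}: {key} '{value}' is not an input channel")
    return problems


def check_spec_file(path):
    """Load a JSON specification from disk and run check_spec on it."""
    with open(path) as handle:
        return check_spec(json.load(handle))
```

Calling check_spec_file("cellposesam_u2os.json") should return an empty list for the specification shown above.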

Run Segmentation

vpt --verbose run-segmentation \
  --segmentation-algorithm cellposesam_u2os.json \
  --input-images '202305010900_U2OS_small_set_VMSC00000/region_0/images/mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+).tif' \
  --input-micron-to-mosaic 202305010900_U2OS_small_set_VMSC00000/region_0/images/micron_to_mosaic_pixel_transform.csv \
  --output-path u2os_cellposesam_output/ \
  --tile-size 2400 --tile-overlap 200
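The --input-images argument in the command above is a regular expression whose named groups, stain and z, identify each mosaic image. The sketch below applies the file-name part of that pattern with Python's re module to show what the groups capture. The sample file names are illustrative, and the use of fullmatch is an assumption of this sketch rather than a statement about how VPT applies the pattern.

```python
import re

# File-name part of the pattern passed to --input-images.
PATTERN = re.compile(r"mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+).tif")


def parse_mosaic_name(filename):
    """Return (stain, z) for a matching mosaic file name, else None."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("stain"), int(match.group("z"))


# Illustrative file names, not a listing of the dataset.
for name in ("mosaic_DAPI_z3.tif", "mosaic_Cellbound1_z0.tif", "manifest.json"):
    print(name, parse_mosaic_name(name))
```

Note that the unescaped dot before tif matches any single character. That is harmless here but slightly looser than a literal dot.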

Expected Results

CellposeSAM should detect approximately 800 cell entities across 4 tiles. The output directory will contain:

cellpose_micron_space.parquet — cell boundary polygons in micron coordinates
cellpose_mosaic_space.parquet — cell boundary polygons in mosaic pixel coordinates
result_tiles/ — per-tile intermediate results

These geometry tables store one row per entity per z-level, so count cells by unique EntityID rather than total parquet rows.
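Counting by unique EntityID can be sketched as follows. The counting helper depends only on the standard library. The parquet reader assumes pandas with a parquet engine (for example pyarrow) is installed, and that the column is named EntityID as described above. Both function names are hypothetical.

```python
def count_entities(entity_ids):
    """Count distinct entities from an iterable of EntityID values."""
    return len(set(entity_ids))


def count_entities_in_parquet(path):
    """Count distinct EntityID values in a geometry parquet file.

    Assumes pandas and a parquet engine such as pyarrow are available.
    """
    import pandas as pd  # imported lazily so count_entities has no dependencies

    # Read only the ID column; the geometry column is not needed for counting.
    table = pd.read_parquet(path, columns=["EntityID"])
    return count_entities(table["EntityID"])
```

For example, count_entities_in_parquet("u2os_cellposesam_output/cellpose_micron_space.parquet") would give the number of cells rather than the number of rows.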

The cell_metadata_file entry in the segmentation specification reserves the downstream metadata filename. The metadata file itself is created later by derive-entity-metadata, not by run-segmentation.

In one representative run on a mid-range CUDA GPU, this segmentation completed in approximately 2 minutes 20 seconds of wall-clock time.
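The tile count quoted above follows from the image size and the --tile-size value. A minimal sketch of the arithmetic, assuming tile origins advance by the tile size along each axis and that the overlap only pads each tile rather than changing how many are needed:

```python
import math


def tile_count(width, height, tile_size):
    """Number of tiles if tile origins advance by tile_size along each axis."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)


# Image size and tile size from this vignette: 2 columns x 2 rows.
print(tile_count(3953, 3960, 2400))
```

Under that assumption a 3953 × 3960 px image with 2400 px tiles needs two tiles per axis, which agrees with the 4 tiles reported for this dataset.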

InstanSeg

Install the Plugin

pip install vpt-plugin-instanseg

Additional plugin documentation and source-install instructions are available in the vpt-plugin-instanseg repository.

Verify that VPT recognises the plugin:

vpt --help

Segmentation Specification

Create a file named instanseg_u2os.json with the following contents. InstanSeg is channel-invariant, so channel order does not matter. The pixel_size is set to 0.108 µm to match the U2OS dataset resolution:

{
  "experiment_properties": {
    "all_z_indexes": [0, 1, 2, 3, 4, 5, 6],
    "z_positions_um": [1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5]
  },
  "segmentation_tasks": [
    {
      "task_id": 0,
      "segmentation_family": "InstanSeg",
      "entity_types_detected": ["cell"],
      "z_layers": [3],
      "segmentation_properties": {
        "model": "fluorescence_nuclei_and_cells",
        "model_dimensions": "2D",
        "version": "0.1.1",
        "custom_weights": null
      },
      "task_input_data": [
        {
          "image_channel": "DAPI",
          "image_preprocessing": [
            {"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
          ]
        },
        {
          "image_channel": "PolyT",
          "image_preprocessing": [
            {"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
          ]
        },
        {
          "image_channel": "Cellbound1",
          "image_preprocessing": [
            {"name": "normalize", "parameters": {"type": "CLAHE", "clip_limit": 0.01, "filter_size": [100, 100]}}
          ]
        }
      ],
      "segmentation_parameters": {
        "pixel_size": 0.108,
        "normalise": true,
        "target": "all_outputs",
        "rescale_output": true
      },
      "polygon_parameters": {
        "simplification_tol": 2,
        "smoothing_radius": 10,
        "minimum_final_area": 100
      }
    }
  ],
  "segmentation_task_fusion": {
    "entity_fusion_strategy": "harmonize",
    "fused_polygon_postprocessing_parameters": {
      "min_distance_between_entities": 1,
      "min_final_area": 100
    }
  },
  "output_files": [
    {
      "entity_types_output": ["cell"],
      "files": {
        "run_on_tile_dir": "result_tiles/",
        "mosaic_geometry_file": "instanseg_mosaic_space.parquet",
        "micron_geometry_file": "instanseg_micron_space.parquet",
        "cell_metadata_file": "instanseg_cell_metadata.csv"
      }
    }
  ]
}
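The pixel_size value in the specification above can be cross-checked against the dataset's micron_to_mosaic_pixel_transform.csv. The sketch below assumes that file holds a whitespace-separated 3 × 3 affine matrix whose diagonal entries are the scale in pixels per micron. The sample matrix is made up to be consistent with 0.108 µm per pixel and is not copied from the dataset.

```python
def pixel_size_from_transform(text):
    """Return microns per pixel from a 3x3 micron-to-mosaic matrix given as text."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    scale_x, scale_y = rows[0][0], rows[1][1]  # pixels per micron along x and y
    if abs(scale_x - scale_y) > 1e-6 * abs(scale_x):
        raise ValueError("anisotropic scale; a single pixel_size does not apply")
    return 1.0 / scale_x


# Hypothetical matrix for illustration only.
sample = "9.259 0 100\n0 9.259 200\n0 0 1\n"
print(round(pixel_size_from_transform(sample), 3))
```

To check the real file, read its contents with open(...).read() and pass the text to the function.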

Run Segmentation

vpt --verbose run-segmentation \
  --segmentation-algorithm instanseg_u2os.json \
  --input-images '202305010900_U2OS_small_set_VMSC00000/region_0/images/mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+).tif' \
  --input-micron-to-mosaic 202305010900_U2OS_small_set_VMSC00000/region_0/images/micron_to_mosaic_pixel_transform.csv \
  --output-path u2os_instanseg_output/ \
  --tile-size 2400 --tile-overlap 200

Expected Results

InstanSeg should detect approximately 860 cells across 4 tiles. The output directory will contain:

instanseg_micron_space.parquet — cell boundary polygons in micron coordinates
instanseg_mosaic_space.parquet — cell boundary polygons in mosaic pixel coordinates
result_tiles/ — per-tile intermediate results

These geometry tables store one row per entity per z-level, so count cells by unique EntityID rather than total parquet rows.

The cell_metadata_file entry in the segmentation specification reserves the downstream metadata filename. The metadata file itself is created later by derive-entity-metadata, not by run-segmentation.

On a mid-range CUDA GPU this run completed in approximately 23 seconds wall-clock time. For multi-tissue runtime comparisons, see Segmentation Benchmarks.

Next Steps

After segmentation, the output parquet files can be used in the standard VPT workflow (partition transcripts, derive entity metadata, sum signals, update VZG) exactly as shown in the Example: Segmenting a Small Dataset Saved on a Local Hard Drive vignette.

For a guide to retraining a Cellpose2 model with manual annotations, see the Example: Re-segmenting a MERSCOPE Heart Dataset with a Machine Learning Model Customized with Manual Annotations vignette.

For detailed parameter reference pages, see: