Multiple Entities

Overview

In vpt, an "entity" refers to a cellular or sub-cellular structure for which we can identify a boundary polygon, such as a cell or a nucleus. vpt is expanding its capabilities to handle multiple entity types during both segmentation and the creation of entity relationships. As an example, a user may construct their own segmentation task that outputs both cells and nuclei, and this experimental version of vpt is able to establish relationships between pairs of cell and nucleus outputs. The rules for connecting entities of different types (e.g. cells and nuclei) are called "constraints," and can be customized by the user in the Segmentation Algorithm JSON file (see Segmentation Task Definition). A more detailed example of entity relationship constraints can be found in the Multiple Entities section.

The multiple entities that exist in this experimental version of vpt can assume different relationships under certain user-defined constraints. Each entity can stand alone with no relationship to any other type of other, be the "parent" to another entity, or the "child" of another entity. Taking cells and nuclei as an example, a common approach might be to enforce that every nucleus (or child) must have a cell (parent) that encompasses it. Because we are establishing relationships that may not already exist in the raw segmentation output, each constraint has a resolution strategy in the event that it is violated. The relationship constraints and resolution strategies that currently exist are as follows.

Constraints

maximum_child_count: Each parent entity is checked for children. If the number of children is greater than the input value, a problem is detected for that parent entity.
Valid resolution strategies:
remove_child

remove_parent
minimum_child_count: Each parent entity is checked for children. If the number of children is smaller than the input value, a problem is detected for that parent entity.
Valid resolution strategies:
remove_parent

create_child (only applied in segment-on-tile step)
child_must_have_parent: Each child entity is checked for a parent. If no parent is assigned, a problem is detected for that child entity.
Valid resolution strategies:
create_parent (only applied in segment-on-tile step)

remove_child
parent_must_cover_child: Each child entity is to see if the parent entity completely covers the child (shapely predicate operation). If the child is not covered, a problem is detected for the child entity.
Valid resolution strategies:
shrink_child

remove_child
child_intersect_one_parent: Each child entity is to see if it intersects a parent entity other than the one that it is assigned to. If the child intersects a second parent entity, a problem is detected for the child entity.
Valid resolution strategies:
shrink_child

remove_child

Resolution Strategies

remove_parent: Delete the parent entity from the DataFrame
remove_child: Remove one or more child entities from the DataFrame
create_child: Create a child entity that is a copy of the parent entity in the child DataFrame. Assign parent when adding entity.
create_parent: Create a parent entity that is a copy of the child entity in the parent DataFrame. Assign parent when adding entity.
shrink_child: Use the parent entity to crop the child entity so that it fits completely within the parent.

Definitions

Key	Type	Values	Meaning
`parent_type`	string	Any string
`child_type`	string	Any string
`child_coverage_threshold`	float	0.5 - 1	Fraction of child entity volume that must be covered by parent volume. Values less than 0.5 are not allowed because they may lead to ambiguous assignments.
`constraints`	list		List of constraints to apply to entities
`contraints.constraint`	string	`maximum_child_count` `minimum_child_count` `child_must_have_parent` `parent_must_cover_child` `child_intersect_one_parent`	The name of a constraint function to use to detect problems in the parent and child DataFrames
`contraints.value`	Any	Any or null	The parameter passed to the constraint function to detect conflicts
`contraints.resolution`	string	`remove_child` `remove_parent` `create_parent` `create_child` `shrink_child`	The conflict resolution method to resolve the problem detected by the constraint

Example Usage and Outputs

The multiple entity types, relationships, constraints, resolution strategies and parameters all need to be specified and configured in the Segmentation Algorithm JSON file according to the valid operations and values previously mentioned. Specifically, the entity_type_relationships object in the segmentation algorithm file needs to be defined. An example of how to complete this is shown here:

...
"entity_type_relationships": {
    "parent_type": "cell",
    "child_type": "nuclei",
    "child_coverage_threshold": 0.5,
    "constraints": [
        {"constraint": "maximum_child_count",
        "value": 1,
        "resolution": "remove_child"
        },
        {"constraint": "minimum_child_count",
        "value": 1,
        "resolution": "create_child"
        },
        {"constraint": "child_must_have_parent",
        "value": null,
        "resolution": "create_parent"
        },
        {"constraint": "parent_must_cover_child",
        "value": null,
        "resolution": "shrink_child"
        },
        { "constraint": "child_intersect_one_parent",
        "value": null,
        "resolution": "shrink_child"
        },
        {"constraint": "maximum_child_count",
        "value": 1,
        "resolution": "remove_child"
        }
    ]
}

The output SegmentationResult object has a dataframe attribute that is a geopandas GeoDataFrame containing all of the multiple entity type relationships, which in turn gets saved as a Parquet file. A loaded example of this is provided here:

Note

Because the ParentID column of the SegmentationResult dataframe can contain integers and NoneTypes, to preserve the Int64 data type if the IDs, the boundary Parquet file should be read using the read_parquet() funtion within the vpt_core.io.input_tools module. Using read_parquet() within geopandas will truncate the number of unique IDs.

Once the user has created the micron-space parquet boundary file and entity by gene csv file for each entity type, they can run update-vzg to create a new vzg file with multiple entities embedded within. The user can now explore their data with multiple entity types in mind as seen here: