---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.17.2
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# Tutorial

The `deepcell-types` model predicts cell types from multiplexed spatial proteomic images. There are three main inputs to the cell-type prediction pipeline:

1. The multiplexed image in channels-first format,
2. A whole-cell segmentation mask for the image, and
3. A mapping of the channel index to the marker expression name.

Each of these components is covered in further detail in this tutorial.

## Example datasets

This tutorial makes use of the spatial proteomic data available on the [HuBMAP data portal][hubmap-data-portal]. Users are encouraged to explore the portal for data of interest.

For convenience, a subset of the publicly available spatial proteomic data has been converted to a remote [zarr archive][zarr]. The datasets in the zarr archive retain the original HuBMAP indexing scheme (i.e. `HBM###_????_###`, where `#` indicates a number and `?` indicates an upper-case alphabetical character).

Interacting with the zarr HuBMAP data mirror requires a few additional dependencies:

```bash
pip install "zarr>2" s3fs rich
```

```{note}
The HuBMAP data mirror uses zarr format v3 and thus requires `zarr>2` to be installed.
```

### Exploring the archive

```{code-cell} ipython3
import zarr

z = zarr.open_group(
    store="s3://deepcelltypes-demo-datasets/hubmap.zarr",
    mode="r",
    storage_options={
        "anon": True,
        "client_kwargs": dict(region_name="us-east-1"),
    },
)
```

High-level structure of the data archive:

```{code-cell} ipython3
z.tree()
```

A more detailed look at the datasets:

```{code-cell} ipython3
import pandas as pd  # for nice html rendering

summary = pd.DataFrame.from_dict(
    {
        k: {
            "tissue": z[k].attrs["tissue"],
            "technology": z[k].attrs["modality"],
            "num. channels": z[k]["image"].shape[0],
            "shape": z[k]["image"].shape[1:],
        }
        for k in z.group_keys()
    },
    orient="index",
)
summary.sort_index()
```

In the interest of minimizing network bandwidth, we'll use the `HBM994_PDJN_987` dataset to demonstrate the deepcell-types inference pipeline.

```{code-cell} ipython3
k = "HBM994_PDJN_987"
```

### Dataset anatomy

As noted above, the cell-type prediction pipeline requires the multiplexed image, the channel-name mapping, and a segmentation mask for the image. The multiplexed image is stored in the `image` array for each dataset, and the channel mapping is stored under the key `"channels"` in the image metadata. Note that these two inputs are derived directly from the corresponding datasets on the HuBMAP data portal.

```{code-cell} ipython3
ds = z[k]
img = ds["image"][:]  # Load data into memory
chnames = ds["image"].attrs.get("channels")

# Sanity check: ensure that the channel-name list is the same size as the
# number of channels in the image
len(chnames) == img.shape[0]
```

Another useful bit of metadata (when available) is the pixel size of the image, in microns per pixel. While not strictly required, this can improve predictions by tamping down variability in image scaling. This information is stored in the dataset metadata.

```{code-cell} ipython3
mpp = ds["image"].attrs["mpp"]
mpp
```

## Running the cell-type prediction pipeline

```{note}
Both the `cellSAM` and `deepcell-types` models can in principle be run on CPUs, but it is strongly recommended that users make use of GPU-capable machines when running cell segmentation/cell-type prediction workflows.
```

The final input is a segmentation mask. `deepcell-types` has been intentionally designed for flexibility on this front to better integrate into existing spatial-omics workflows. However, for convenience, several pre-computed segmentation masks are stored in the data archive: one computed by [Mesmer](https://www.nature.com/articles/s41587-021-01094-0) (available at `ds["segmentations/torch_mesmer"]`) and a second by [CellSAM](https://www.biorxiv.org/content/10.1101/2023.11.17.567630v4) (available at `ds["segmentations/cellsam"]`).
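If a pre-computed mask suits your workflow, it can be loaded directly from the archive rather than re-running segmentation. A minimal sketch using the `ds` group opened above (skip this if you intend to compute the mask yourself, as in the next section):

```python
# Load the pre-computed CellSAM mask into memory; the trailing [:] materializes
# the zarr array as a numpy array
precomputed_mask = ds["segmentations/cellsam"][:]

# The mask is expected to share the spatial (H, W) dimensions of the image
assert precomputed_mask.shape == img.shape[1:]
```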
In this tutorial, we will demonstrate how to use one of these models to construct a full cell-type inference pipeline.

### Cell segmentation with `cellSAM`

In order to use `cellSAM`, it must be installed in the environment, e.g.

```bash
pip install git+https://github.com/vanvalenlab/cellSAM.git
```

```{code-cell} ipython3
import numpy as np

from cellSAM.cellsam_pipeline import cellsam_pipeline
```

For convenience, channels corresponding to nuclear markers and a whole-cell marker are stored in the dataset metadata.

```{note}
Nuclear markers are typically unambiguous. The whole-cell channel selection, on the other hand, is less well-defined. Users are encouraged to try different channels or combinations of channels for improved whole-cell segmentation results. The `membrane_channel` selection in the metadata is arbitrary and provided for convenience.
```

```{code-cell} ipython3
# Extract channels for segmentation
nuc, mem = ds.attrs["nuclear_channel"], ds.attrs["membrane_channel"]
im = np.stack(
    [img[chnames.index(nuc)], img[chnames.index(mem)]],
    axis=-1,
).squeeze()
```

CellSAM expects multiplexed data in a particular format. See the [cellsam docs][cellsam_ref] for details.

```{code-cell} ipython3
# Format for cellsam: an empty leading channel, followed by the nuclear and
# whole-cell channels
seg_img = np.zeros((*im.shape[:-1], 3), dtype=im.dtype)
seg_img[..., 1:] = im
```

Finally, run the segmentation pipeline:

```{code-cell} ipython3
:tags: [hide-output]

mask = cellsam_pipeline(
    seg_img,
    block_size=512,
    low_contrast_enhancement=False,
    use_wsi=True,
    gauge_cell_size=False,
)
```

```{code-cell} ipython3
# Sanity check: the segmentation mask should have the same H, W dimensions as
# the input image
mask.shape == img.shape[1:]
```

Let's perform a bit of post-processing to ensure that the segmentation mask (represented as a label image) has sequential labels.

```{code-cell} ipython3
import skimage

mask, _, _ = skimage.segmentation.relabel_sequential(mask)
mask = mask.astype(np.uint32)
```

### Visualizing results

```{note}
Multiplexed images and their analysis products are extremely information-dense; users are strongly encouraged to run tutorials locally to leverage `napari` for interactive visualization.
```

```{code-cell} ipython3
import napari

nim = napari.Viewer(show=False)  # Headless for CI; set show=True for interactive viz

# Compute contrast limits for each channel
cl = [(np.min(ch), np.max(ch)) for ch in img]

# Visualize the multiplexed image, one layer per channel
nim.add_image(img, channel_axis=0, name=chnames, contrast_limits=cl);

# Add the segmentation mask as a labels layer
mask_lyr = nim.add_labels(mask, name="CellSAM segmentation")
mask_lyr.contour = 3  # Relatively thick borders for static viz
```

```{code-cell} ipython3
:tags: [hide-cell]

# For static rendering - can safely be ignored if running the notebook interactively
from pathlib import Path

screenshot_path = Path("../_static/_generated")
screenshot_path.mkdir(parents=True, exist_ok=True)

nim.screenshot(
    path=screenshot_path / "napari_img_and_segmentation.png",
    canvas_only=False,
);
```
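Before moving on to cell-type prediction, it can be helpful to sanity-check the segmentation itself. Below is a minimal sketch (not part of the pipeline proper) that summarizes the label image with `skimage.measure.regionprops_table`, converting areas to physical units with the `mpp` value loaded earlier:

```python
import numpy as np
import skimage.measure

# After relabel_sequential, labels run 1..N, so the max label is the cell count
print(f"Number of segmented cells: {mask.max()}")

# Per-cell areas (in pixels) give a quick check for over-/under-segmentation
props = skimage.measure.regionprops_table(mask, properties=("label", "area"))
median_area = np.median(props["area"])

# mpp is microns-per-pixel, so mpp**2 converts pixel areas to square microns
print(f"Median cell area: {median_area:.0f} px (~{median_area * mpp**2:.0f} µm²)")
```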