# Reproducibility

In addition to the `cellSAM` library, the source repo contains code to aid in reproducing the results in the publication. This code can be found in the [`paper_evaluation`][gh-eval] directory.

[gh-eval]: https://github.com/vanvalenlab/cellSAM/tree/master/paper_evaluation

Additional resources, including pre-trained model weights and the evaluation dataset, are required for reproducibility. All necessary components are available for download - see {doc}`API-key` for details.

## Setup

In a new (empty) virtual environment, install cellSAM from the parent directory.

````{admonition} Example: creating an environment
:class: tip dropdown
Users are encouraged to use whichever environment management system they are most comfortable with (`uv`, `pixi`, `conda/mamba`, etc.). For those unsure, Python's [built-in environment management module][python-venv] is a simple, ubiquitous option. For example, to create and enter a new environment:

```bash
$ python3.XX -m venv cs-eval-env
$ source cs-eval-env/bin/activate
```

where `XX` is the Python version you wish to use (e.g. `python3.13`).

You can then verify the newly created environment is empty (though `pip` should be available):

```bash
$ pip list
Package Version
------- -------
pip     24.3.1
```
````

[python-venv]: https://docs.python.org/3/library/venv.html

For example, from the `paper_evaluation` directory:

```bash
$ pip install ..
```

### Evaluation dependencies

Once in a "clean" environment, install the requirements for the evaluation suite:

```bash
$ pip install -r requirements.txt
```

```{note}
This may downgrade some of the dependencies (e.g. `torch`, `numpy`, etc.) installed in the previous step.
```

### Evaluation models

The pretrained model weights necessary for reproducibility are available via the `get_model` function:

```python
>>> from cellSAM import get_model
>>> get_model();
```

This will automatically download and unpack the latest version of the pretrained model weights.

```{admonition} Model versions
:class: note dropdown
You may use the `version=` keyword argument of `get_model` to select a specific model version for evaluation.

- Version `1.2` was used to produce the published results in the paper.
- Version `1.2` is also the *minimum* model version designed to work with the reproducibility workflow.
```

### Evaluation dataset

Make sure you have the evaluation dataset, which can be downloaded with:

```python
>>> from cellSAM import download_training_data
>>> download_training_data()
```

This will initiate the download of a compressed data archive. The compressed data will be downloaded to `$HOME/.deepcell/datasets/cellsam-data_v{X.Y}.tar.gz`, where `X.Y` is the requested dataset version. See {doc}`API-key` for details.

Once the download is complete, unpack/inflate the dataset to a desired location.

````{admonition} Dataset Size
:class: caution dropdown
The compressed data archive is 14GB in size and inflates to 84GB when uncompressed, so you may want to unpack the data to a different location. The decompression is also computationally intensive and may benefit from parallel decompression algorithms. Here's an example incantation which stores the unpacked dataset in `/data` using 8 threads for decompression:

```bash
$ tar --use-compress-program="unpigz -p 8" -xf $HOME/.deepcell/datasets/cellsam-data_v1.2.tar.gz -C /data
```

The unpacked data will then be available at `/data/cellsam_v1.2`.
````
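Before moving on, it can be worth a quick sanity check that the archive unpacked completely. Below is a minimal sketch, assuming the dataset was unpacked to `/data` as in the example above; adjust the path if you unpacked it elsewhere.

```python
from pathlib import Path

# Assumed unpack location from the tar example above; adjust if you unpacked elsewhere.
data_dir = Path("/data/cellsam_v1.2")

if data_dir.is_dir():
    # Peek at the top-level contents of the unpacked dataset.
    print(sorted(p.name for p in data_dir.iterdir())[:10])
else:
    print(f"{data_dir} not found -- check the unpack location.")
```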
## Running the evaluation

Once all of the above steps are complete, the evaluation can be run via the `all_runs.sh` shell script. Before running, ensure that the variables at the top of the file reflect the locations of the models/dataset on your system. If you used the defaults in all the steps above (and unpacked the dataset in its download location), this will already be the case.

```bash
$ ./all_runs.sh
```

The results of each run will be saved locally in a `summary.csv` that records the dataset, model used, and `f1_mean` for that run.

### Individual evaluations

It is not necessary to run the entire evaluation suite - evaluation can be limited to specific datasets. See `all_runs.sh` for a general idea of how to do so via `eval_main.py`.
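Whether you run the full suite or individual evaluations, each run's `summary.csv` can be inspected with standard CSV tooling. Below is a minimal sketch using `pandas`; the column names (`dataset`, `model`, `f1_mean`) are assumptions based on the description above, so check them against the file produced on your system.

```python
import pandas as pd

# Load the per-run summary written by the evaluation suite.
results = pd.read_csv("summary.csv")

# Column names below are assumptions -- inspect results.columns if they differ.
print(results[["dataset", "model", "f1_mean"]])

# Average F1 across the evaluated datasets.
print("mean f1:", results["f1_mean"].mean())
```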