SsharvienKumar/SurGrID

SsharvienKumar committed on Jun 11, 2025

Commit c721c80 (verified) · 1 parent: 9503454

Update README.md

README.md +158 -3

@@ -1,3 +1,158 @@
----
-license: cc-by-4.0
----

+---
+license: cc-by-4.0
+---
+<div id="top" align="center">
+# SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion (IPCAI 2025)
+  [![arXiv](https://img.shields.io/badge/arXiv-2502.07945-b31b1b.svg)](https://arxiv.org/abs/2502.07945)
+  [![Paper](https://img.shields.io/badge/Paper-Visit-blue)](https://rdcu.be/em4E2)
+  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/SsharvienKumar/SurGrID)
+</div>
+## 💡Key Features
+- We show that scene graphs (SGs) can encode surgical scenes in a human-readable format.
+- We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned embeddings are used to condition graph-to-image diffusion for high-quality and precisely controllable surgical simulation.
+- We evaluate our generative approach on scenes from cataract surgeries using quantitative fidelity and diversity measurements, followed by an extensive user study
+involving clinical experts.
+## 🛠 Setup
+```bash
+git clone https://github.com/MECLabTUDA/SurGrID.git
+cd SurGrID
+conda create -n surgrid python=3.8.5 pip=20.3.3
+conda activate surgrid
+pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
+pip install -r requirements.txt
+```
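After installation, it can help to confirm that the pinned packages are the ones actually active in the `surgrid` environment. The following is a minimal sketch, not part of the repository: `check_environment` and `PINNED` are hypothetical names, and the only assumption is that wheels from the cu118 index report a local version suffix such as `+cu118`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the setup commands above.
PINNED = {"torch": "2.0.1", "torchvision": "0.15.2"}


def check_environment(pinned=PINNED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Drop a local suffix such as "+cu118" before comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} pinned={PINNED[name]} [{status}]")
```

A `MISMATCH` line usually means the script was run outside the activated conda environment.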
+## 🏁 Model Checkpoints and Dataset
+Download the checkpoints of all required models from the sources below and place them in [`results`](./results). We also provide the processed CADIS dataset, containing images, segmentation masks, and their scene graphs. Update the dataset paths in [`configs`](./configs).
+- `Checkpoints`: [VQGANs, GraphEncoders, Diffusion Model](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/checkpoints)
+- `Processed Dataset`: [CADIS](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/dataset)
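The two folders linked above can also be fetched programmatically. This is a rough sketch that assumes the `huggingface_hub` package is installed (it is not confirmed to be in `requirements.txt`); `patterns_for` and `download_subfolder` are hypothetical helper names. The folder names `checkpoints` and `dataset` are taken from the link URLs.

```python
from pathlib import Path

REPO_ID = "SsharvienKumar/SurGrID"  # repository named in the links above


def patterns_for(subfolder):
    """Glob patterns selecting every file under one top-level repo folder."""
    return [f"{subfolder.strip('/')}/*"]


def download_subfolder(subfolder, target_dir):
    """Download one folder of the repository into target_dir."""
    # Imported lazily so patterns_for() works without the package installed.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(subfolder),
        local_dir=target_dir,
    )
    return Path(local_path)


# Example usage (requires network access):
#   download_subfolder("checkpoints", "results")
#   download_subfolder("dataset", "data")  # "data" is an arbitrary target
```

With `local_dir="results"` the files land under `results/checkpoints/`. Whether the configs expect that extra folder level is an assumption to verify against the paths in `configs`.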
+## 💥 Sampling SurGrID
+```bash
+python script/sampler_diffusion.py --conf configs/eval/eval_combined_emb.yaml
+```
+## ⏳ Training SurGrID
+**Step 1:** Train Separate VQGANs for Images and Segmentation Masks
+```bash
+python surgrid/taming/main.py --base configs/vqgan/vqgan_image_cadis.yaml -t --gpus 0,
+python surgrid/taming/main.py --base configs/vqgan/vqgan_segmentation_cadis.yaml -t --gpus 0,
+```
+**Step 2:** Train Both Graph Encoders
+```bash
+python script/trainer_graph.py --mode masked --conf configs/graph/graph_cadis.yaml
+python script/trainer_graph.py --mode segclip --conf configs/graph/graph_cadis.yaml
+```
+**Step 3:** Train Diffusion Model
+```bash
+python script/trainer_diffusion.py --conf configs/trainer/combined_emb.yaml
+```
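The three steps above can be chained in one script. The sketch below only repeats the commands listed in Steps 1 to 3. It assumes, as the numbering suggests, that each command must finish successfully before the next one starts. `TRAINING_STEPS`, `build_commands`, and `run_pipeline` are hypothetical names, not part of the repository.

```python
import subprocess
import sys

# Commands copied from Steps 1-3 above, in the order they are listed.
TRAINING_STEPS = [
    ["surgrid/taming/main.py", "--base", "configs/vqgan/vqgan_image_cadis.yaml", "-t", "--gpus", "0,"],
    ["surgrid/taming/main.py", "--base", "configs/vqgan/vqgan_segmentation_cadis.yaml", "-t", "--gpus", "0,"],
    ["script/trainer_graph.py", "--mode", "masked", "--conf", "configs/graph/graph_cadis.yaml"],
    ["script/trainer_graph.py", "--mode", "segclip", "--conf", "configs/graph/graph_cadis.yaml"],
    ["script/trainer_diffusion.py", "--conf", "configs/trainer/combined_emb.yaml"],
]


def build_commands(python=sys.executable):
    """Prefix every step with the interpreter of the active environment."""
    return [[python, *args] for args in TRAINING_STEPS]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    commands = build_commands()
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root would start the actual training runs.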
+## 🔄 Training SurGrID on a New Dataset
+The files below need to be adapted:
+- [Configs](./configs)
+- [SurGrID Dataset](./surgrid/dataset/cadis_dataset.py)
+- [VQGAN Dataset](./surgrid/taming/taming/data/cadis.py)
+- [CADIS Specifications in Graph Encoder Pre-training](./surgrid/graph/graph_masked_segclip.py)
+## 🥼 Clinical Expert Assessment
+```bash
+python script/demo_surgrid.py --conf configs/trainer/combined_emb.yaml
+```
+Our demo GUI loads ground-truth graphs together with their ground-truth images. A graph’s nodes can be moved, deleted, or assigned a different class. We instruct our participants to load four different ground-truth graphs and perform the following actions on each in sequence. They score the samples’ realism and coherence with the graph input on a Likert scale from 1 to 7:
+- First, participants are instructed to generate a batch of four samples from the ground-truth SG without modifications.
+- Second, the participants are requested to spatially move nodes in the canvas and again judge the synthesised samples.
+- Third, participants change the class of one of the instrument nodes and judge the generated images.
+- Lastly, participants are instructed to remove one of the instruments or miscellaneous classes and judge the synthesised image a final time.
+<table>
+  <thead>
+    <tr>
+      <th rowspan="2">Clinician</th>
+      <th colspan="2">Synthesis from Ground Truth</th>
+      <th colspan="2">Spatial Modification</th>
+      <th colspan="2">Tool Modification</th>
+      <th colspan="2">Tool Removal</th>
+    </tr>
+    <tr>
+      <th>Realism</th>
+      <th>Coherence</th>
+      <th>Realism</th>
+      <th>Coherence</th>
+      <th>Realism</th>
+      <th>Coherence</th>
+      <th>Realism</th>
+      <th>Coherence</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>P1</td>
+      <td>6.5±0.5</td>
+      <td>6.5±1.0</td>
+      <td>6.3±0.9</td>
+      <td>6.3±0.9</td>
+      <td>5.3±1.2</td>
+      <td>4.5±1.9</td>
+      <td>6.3±0.9</td>
+      <td>5.5±2.3</td>
+    </tr>
+    <tr>
+      <td>P2</td>
+      <td>5.3±0.9</td>
+      <td>5.3±0.5</td>
+      <td>4.5±0.5</td>
+      <td>4.3±2.0</td>
+      <td>5.3±0.9</td>
+      <td>5.8±0.9</td>
+      <td>5.5±1.2</td>
+      <td>5.5±1.9</td>
+    </tr>
+    <tr>
+      <td>P3</td>
+      <td>6.3±0.9</td>
+      <td>6.3±0.9</td>
+      <td>6.5±1.0</td>
+      <td>5.5±0.5</td>
+      <td>6.0±0.8</td>
+      <td>6.8±0.5</td>
+      <td>6.3±0.5</td>
+      <td>6.5±0.5</td>
+    </tr>
+  </tbody>
+</table>
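Each cell in the table has the form mean±deviation over one participant's scores for one condition. The sketch below shows one way such a cell could be computed from raw Likert scores. It is an illustration under assumptions: the README does not say whether the sample or the population standard deviation was used (the sample version is used here), `summarise` is a hypothetical helper, and the scores in the usage line are made up for the example rather than taken from the study.

```python
from statistics import mean, stdev

LIKERT_MIN, LIKERT_MAX = 1, 7  # scale stated in the protocol above


def summarise(scores):
    """Format Likert scores as 'mean±std' with one decimal place."""
    if len(scores) < 2:
        raise ValueError("need at least two scores for a standard deviation")
    for score in scores:
        if not LIKERT_MIN <= score <= LIKERT_MAX:
            raise ValueError(f"score {score} is outside the 1-7 scale")
    # Assumption: sample standard deviation (divides by n - 1).
    return f"{mean(scores):.1f}±{stdev(scores):.1f}"


if __name__ == "__main__":
    # Hypothetical scores for four loaded graphs, not study data.
    print(summarise([5, 6, 6, 7]))
```

Swapping `stdev` for `statistics.pstdev` gives the population variant if that convention is preferred.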
+## 📜 Citations
+If you use SurGrID in your work, please cite the following paper:
+```bibtex
+@article{frisch2025surgrid,
+  title={SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion},
+  author={Frisch, Yannik and Sivakumar, Ssharvien Kumar and K{\"o}ksal, {\c{C}}a{\u{g}}han and B{\"o}hm, Elsa and Wagner, Felix and Gericke, Adrian and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
+  journal={arXiv preprint arXiv:2502.07945},
+  year={2025}
+}
+```
+## ⭐ Acknowledgement
+Thanks to the following projects and theoretical works that we have either used or been inspired by:
+- [VQGAN](https://github.com/CompVis/taming-transformers)
+- [Lucidrains' DDPM](https://github.com/lucidrains/denoising-diffusion-pytorch)
+- [SGDiff](https://github.com/YangLing0818/SGDiff)
+- [Endora's README](https://github.com/CUHK-AIM-Group/Endora)