Commit c721c80 (verified; parent 9503454) by SsharvienKumar: Update README.md
Files changed: README.md (+158, -3)

---
license: cc-by-4.0
---
<div id="top" align="center">

# SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion (IPCAI 2025)

[![arXiv](https://img.shields.io/badge/arXiv-2502.07945-b31b1b.svg)](https://arxiv.org/abs/2502.07945)
[![Paper](https://img.shields.io/badge/Paper-Visit-blue)](https://rdcu.be/em4E2)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/SsharvienKumar/SurGrID)

</div>

## 💡 Key Features
- We show that scene graphs (SGs) can encode surgical scenes in a human-readable format (see the sketch after this list).
- We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned embeddings are employed to condition graph-to-image diffusion for high-quality and precisely controllable surgical simulation.
- We evaluate our generative approach on scenes from cataract surgeries using quantitative fidelity and diversity measurements, followed by an extensive user study involving clinical experts.
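
As a rough illustration of the first point, a surgical scene can be written down as typed nodes plus spatial relations. This is a minimal sketch only; the node classes, coordinates, and relation names below are hypothetical examples, not the exact schema used in this repository:

```python
# Hypothetical scene-graph encoding of a cataract-surgery frame.
# Classes, coordinates, and relations are illustrative, not SurGrID's schema.
scene_graph = {
    "nodes": [
        {"id": 0, "class": "pupil", "center": (0.50, 0.50)},
        {"id": 1, "class": "iris", "center": (0.50, 0.50)},
        {"id": 2, "class": "phaco_handpiece", "center": (0.65, 0.40)},
    ],
    "edges": [
        (2, "overlaps", 0),  # (subject id, relation, object id)
        (1, "surrounds", 0),
    ],
}

# The structure reads off as plain English, which is what makes SGs human-readable:
for subj, rel, obj in scene_graph["edges"]:
    nodes = scene_graph["nodes"]
    print(nodes[subj]["class"], rel, nodes[obj]["class"])
```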


## 🛠 Setup
```bash
git clone https://github.com/MECLabTUDA/SurGrID.git
cd SurGrID
conda create -n surgrid python=3.8.5 pip=20.3.3
conda activate surgrid

pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
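
To verify that the pinned CUDA 11.8 build of PyTorch installed correctly, a quick sanity check (assuming a CUDA-capable GPU is present):

```python
# Confirms the pinned PyTorch build is active and can see the GPU.
import torch

print(torch.__version__)          # expected: 2.0.1+cu118
print(torch.cuda.is_available())  # expected: True on a CUDA-capable machine
```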

## 🏁 Model Checkpoints and Dataset
Download the checkpoints of all necessary models from the sources below and place them in [results](./results). We also provide the processed CADIS dataset, containing images, segmentation masks and their scene graphs. Update the dataset paths in [configs](./configs). Alternatively, everything can be fetched programmatically, as sketched after the list.
- `Checkpoints`: [VQGANs, GraphEncoders, Diffusion Model](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/checkpoints)
- `Processed Dataset`: [CADIS](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/dataset)
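
A minimal download sketch using `huggingface_hub`; it assumes the repo layout linked above (a `checkpoints/` and a `dataset/` folder in the `SsharvienKumar/SurGrID` model repo) and that placing files under `results/` matches your config paths:

```python
# Fetch the published checkpoints (and optionally the processed dataset)
# from the Hugging Face Hub into the local results/ folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="SsharvienKumar/SurGrID",
    allow_patterns=["checkpoints/*"],  # use ["dataset/*"] for the processed CADIS data
    local_dir="results",
)
```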


## 💥 Sampling SurGrID
```bash
python script/sampler_diffusion.py --conf configs/eval/eval_combined_emb.yaml
```


## ⏳ Training SurGrID
**Step 1:** Train Separate VQGANs for Image and Segmentation
```bash
python surgrid/taming/main.py --base configs/vqgan/vqgan_image_cadis.yaml -t --gpus 0,
python surgrid/taming/main.py --base configs/vqgan/vqgan_segmentation_cadis.yaml -t --gpus 0,
```

**Step 2:** Train Both Graph Encoders
```bash
python script/trainer_graph.py --mode masked --conf configs/graph/graph_cadis.yaml
python script/trainer_graph.py --mode segclip --conf configs/graph/graph_cadis.yaml
```

**Step 3:** Train Diffusion Model
```bash
python script/trainer_diffusion.py --conf configs/trainer/combined_emb.yaml
```


## 🔄 Training SurGrID on a New Dataset
The files below need to be adapted; a hypothetical skeleton of the dataset interface is sketched after the list:
- [Configs](./configs)
- [SurGrID Dataset](./surgrid/dataset/cadis_dataset.py)
- [VQGAN Dataset](./surgrid/taming/taming/data/cadis.py)
- [CADIS Specifications in Graph Encoder Pre-training](./surgrid/graph/graph_masked_segclip.py)
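
The sketch below outlines the shape such a dataset class might take; the authoritative interface is [cadis_dataset.py](./surgrid/dataset/cadis_dataset.py), and all field names here are assumptions. The VQGAN pipeline under `surgrid/taming` needs its own dataset class adapted analogously.

```python
# Hypothetical skeleton of the triplet dataset to adapt for a new surgery domain;
# mirror the actual fields used in surgrid/dataset/cadis_dataset.py.
from torch.utils.data import Dataset


class MyNewSurgicalDataset(Dataset):
    """Yields (image, segmentation mask, scene graph) triplets for SurGrID."""

    def __init__(self, root: str, split: str = "train"):
        self.root = root
        self.split = split
        self.samples = []  # fill with (image_path, mask_path, graph_path) tuples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, mask_path, graph_path = self.samples[idx]
        image = ...  # load + normalise, e.g. a float tensor of shape (3, H, W)
        mask = ...   # integer class map of shape (H, W)
        graph = ...  # node classes, positions, and edges of the scene graph
        return {"image": image, "mask": mask, "graph": graph}
```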


## 🥼 Clinical Expert Assessment
```bash
python script/demo_surgrid.py --conf configs/trainer/combined_emb.yaml
```
Our demo GUI allows loading ground-truth graphs along with the corresponding ground-truth image. The graph’s nodes can be moved, deleted, or have their class changed. We instruct our participants to load four different ground-truth graphs and sequentially perform the following actions on each, scoring the samples’ realism and coherence with the graph input on a Likert scale from 1 to 7:

- First, participants generate a batch of four samples from the ground-truth SG without modifications.
- Second, they spatially move nodes in the canvas and again judge the synthesised samples.
- Third, they change the class of one of the instrument nodes and judge the generated images.
- Lastly, they remove one of the instrument or miscellaneous classes and judge the synthesised image a final time.

<table>
  <thead>
    <tr>
      <th rowspan="2">Clinician</th>
      <th colspan="2">Synthesisation from GT</th>
      <th colspan="2">Spatial Modification</th>
      <th colspan="2">Tool Modification</th>
      <th colspan="2">Tool Removal</th>
    </tr>
    <tr>
      <th>Realism</th>
      <th>Coherence</th>
      <th>Realism</th>
      <th>Coherence</th>
      <th>Realism</th>
      <th>Coherence</th>
      <th>Realism</th>
      <th>Coherence</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>P1</td>
      <td>6.5±0.5</td>
      <td>6.5±1.0</td>
      <td>6.3±0.9</td>
      <td>6.3±0.9</td>
      <td>5.3±1.2</td>
      <td>4.5±1.9</td>
      <td>6.3±0.9</td>
      <td>5.5±2.3</td>
    </tr>
    <tr>
      <td>P2</td>
      <td>5.3±0.9</td>
      <td>5.3±0.5</td>
      <td>4.5±0.5</td>
      <td>4.3±2.0</td>
      <td>5.3±0.9</td>
      <td>5.8±0.9</td>
      <td>5.5±1.2</td>
      <td>5.5±1.9</td>
    </tr>
    <tr>
      <td>P3</td>
      <td>6.3±0.9</td>
      <td>6.3±0.9</td>
      <td>6.5±1.0</td>
      <td>5.5±0.5</td>
      <td>6.0±0.8</td>
      <td>6.8±0.5</td>
      <td>6.3±0.5</td>
      <td>6.5±0.5</td>
    </tr>
  </tbody>
</table>
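
Each cell above reads as mean ± spread of the Likert ratings. Assuming arithmetic mean and population standard deviation over a batch of four ratings (an assumption; the raw ratings are not published here, and the values below are hypothetical), a cell can be reproduced like this:

```python
# Hypothetical raw Likert ratings (1-7); mean/population-std is an assumption.
from statistics import mean, pstdev

ratings = [7, 6, 7, 6]  # e.g., one rating per sample in a batch of four
print(f"{mean(ratings):.1f}±{pstdev(ratings):.1f}")  # -> 6.5±0.5
```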


## 📜 Citations
If you use SurGrID in your work, please cite the following paper:
```bibtex
@article{frisch2025surgrid,
  title={SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion},
  author={Frisch, Yannik and Sivakumar, Ssharvien Kumar and K{\"o}ksal, {\c{C}}a{\u{g}}han and B{\"o}hm, Elsa and Wagner, Felix and Gericke, Adrian and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
  journal={arXiv preprint arXiv:2502.07945},
  year={2025}
}
```


## ⭐ Acknowledgements
Thanks to the following projects and theoretical works that we have either used or drawn inspiration from:
- [VQGAN](https://github.com/CompVis/taming-transformers)
- [Lucidrains' DDPM](https://github.com/lucidrains/denoising-diffusion-pytorch)
- [SGDiff](https://github.com/YangLing0818/SGDiff)
- [Endora's README](https://github.com/CUHK-AIM-Group/Endora)