Update app.py
app.py CHANGED
@@ -18,7 +18,8 @@ article = r"""
 📝 **Citation**
 <br>
 If our work is helpful for your research or applications, please cite us via:
-```
+```
+bibtex
 @article{toker2024diffusion,
   title={Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines},
   author={Toker, Michael and Orgad, Hadas and Ventura, Mor and Arad, Dana and Belinkov, Yonatan},
@@ -26,12 +27,27 @@ If our work is helpful for your research or applications, please cite us via:
   year={2024}
 }
 ```
+🧠 **Abstract**
+<br>
+Text-to-image diffusion models (T2I) use a latent representation of a text prompt to guide the image generation process.
+However, the process by which the encoder produces the text representation is unknown.
+We propose the Diffusion Lens, a method for analyzing the text encoder of T2I models by generating images from its intermediate representations.
+Using the Diffusion Lens, we perform an extensive analysis of two recent T2I models.
+Exploring compound prompts, we find that complex scenes describing multiple objects are composed progressively and more slowly compared to simple scenes;
+Exploring knowledge retrieval, we find that representation of uncommon concepts requires further computation compared to common concepts,
+and that knowledge retrieval is gradual across layers.
+Overall, our findings provide valuable insights into the text encoder component in T2I pipelines.
+<br>
+```
 📧 **Contact**
 <br>
-If you have any questions, please feel free to open an issue or directly reach us out at <b>tok@cs.
+If you have any questions, please feel free to open an issue or directly reach out to us at <b>tok@cs.technion.ac.il
+</b>.
 """
 
 
+
+
 model_num_of_layers = {
     'Stable Diffusion 1.4': 12,
     'Stable Diffusion 2.1': 22,