The model was fine-tuned on the **MP Bandgap** dataset, a subset of the Materials Project.
### Training Procedure

- **Architecture:** GPT-2 Small with additional Property-Key-Value (PKV) encoder layers (~61.6M parameters).
- **Mechanism:** Continuous property values are projected into the attention mechanism's key-value space (Prefix Tuning), allowing the model to attend to the target properties at every generation step.
- **Optimization:** A dual optimization strategy was employed, using a lower learning rate for the pre-trained backbone and a higher learning rate for the condition encoder to prevent catastrophic forgetting.
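The prefix-tuning mechanism above can be sketched as a toy single-head attention step: a scalar bandgap target is projected into a few prefix key/value slots, which are prepended to the token keys/values so every query position can attend to the conditioning signal. The dimensions, the projection matrices, and the property value are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8   # head dimension (illustrative)
T = 5   # number of token positions
P = 2   # number of property-prefix slots

# Hypothetical condition encoder: projects the scalar bandgap target
# into P prefix key/value pairs (the "PKV" prefix).
W_k = rng.normal(size=(1, P * d))
W_v = rng.normal(size=(1, P * d))
bandgap = np.array([[1.5]])                 # target property value (eV)
prefix_k = (bandgap @ W_k).reshape(P, d)
prefix_v = (bandgap @ W_v).reshape(P, d)

# Token-side queries/keys/values as a transformer layer would produce.
q = rng.normal(size=(T, d))
k = rng.normal(size=(T, d))
v = rng.normal(size=(T, d))

# Prefix tuning: prepend the property keys/values so each of the T
# queries attends over P + T positions at every generation step.
k_full = np.concatenate([prefix_k, k], axis=0)   # (P + T, d)
v_full = np.concatenate([prefix_v, v], axis=0)
attn = softmax(q @ k_full.T / np.sqrt(d))        # (T, P + T)
out = attn @ v_full                              # (T, d)
```

Because the prefix enters through the key/value side only, the backbone's query projections are untouched; the conditioning signal is injected at each layer rather than only at the input embedding.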
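The dual-learning-rate strategy maps naturally onto optimizer parameter groups. A minimal PyTorch sketch follows, where the two linear modules merely stand in for the GPT-2 backbone and the condition encoder, and the learning-rate values are illustrative assumptions rather than the ones used in training:

```python
import torch

# Stand-ins for the pre-trained GPT-2 backbone and the freshly
# initialized condition (PKV) encoder; both names are hypothetical.
backbone = torch.nn.Linear(8, 8)
cond_encoder = torch.nn.Linear(1, 8)

# Dual optimization: small steps for the pre-trained weights (to avoid
# catastrophic forgetting), larger steps for the new condition encoder.
# The specific learning rates here are assumed for illustration.
optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": cond_encoder.parameters(), "lr": 1e-4},
])

# One toy update to show both groups stepping at their own rate.
x = torch.randn(4, 8)
cond = torch.randn(4, 1)
loss = (backbone(x) + cond_encoder(cond)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Keeping the backbone's learning rate an order of magnitude lower lets the randomly initialized encoder adapt quickly without large gradients overwriting the pre-trained language-modeling weights.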