Model Card for ConfRover-base-20M-v1.0
ConfRover base model trained with forward simulation and iid sampling
Model Details
Model Description
ConfRover is a deep generative model for protein 3D conformations and motion dynamics. It uses a diffusion probabilistic model to learn the distribution of protein 3D conformations and captures the temporal dependencies between frames with temporal causal transformers. Models are trained on molecular dynamics (MD) trajectory data and can generate protein conformation ensembles and motion trajectories conditioned on the input amino acid sequence.
This version was trained on the forward simulation and iid sampling tasks at a 1:1 ratio.
Basic info
| Model ID | ConfRover-base-20M-v1.0 |
|---|---|
| Variant | base |
| Size | 20M |
| Version | v1.0 |
| Recommended use | Forward simulation and iid sampling tasks |
| License | Apache-2.0 |
Model Sources
- Repository: https://github.com/ByteDance-Seed/ConfRover
- Paper: https://arxiv.org/abs/2505.17478
- Website: https://ByteDance-Seed.github.io/ConfRover
How to Get Started with the Model
Use the code below to get started with the model.
from confrover import ConfRover

# Load the pretrained weights and move the model to the GPU.
model = ConfRover.from_pretrained(<model_name>)
model.to("cuda")

# Replace each <...> placeholder with your own value before running.
model.generate(
    case_id=<case_name>,
    seqres=<amino_acid_sequence>,
    output_dir=</path/to/save/output>,
    task_mode=<"forward"|"iid"|"interp">,
    n_replicates=<int>,  # number of replicate trajectories (forward and interp) or total number of conformation samples (iid)
    n_frames=<int>,  # number of frames in the trajectory, including the conditioning frames
    stride_in_10ps=256,  # time interval between frames, in units of 10 ps (256 corresponds to 2.56 ns)
    conditions=...,  # conditioning-frame information for forward simulation and interp; see `ConfRover.generate` for details
)
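The relationship between `n_frames`, `stride_in_10ps`, and the simulated time span can be worked out with a few lines of arithmetic. The sketch below is only an illustration: `frame_times_ns` is a hypothetical helper and not part of the `confrover` package, and it assumes that the first frame sits at t = 0 and that frames are evenly spaced by the stride.

```python
from typing import List

PS_PER_STRIDE_UNIT = 10  # the card states that stride_in_10ps is given in units of 10 ps


def frame_times_ns(n_frames: int, stride_in_10ps: int) -> List[float]:
    """Return the time (in ns) of each frame, assuming frame 0 is at t = 0.

    Hypothetical helper for illustration; not part of the confrover API.
    """
    if n_frames < 1 or stride_in_10ps < 1:
        raise ValueError("n_frames and stride_in_10ps must be positive integers")
    step_ps = stride_in_10ps * PS_PER_STRIDE_UNIT  # spacing between frames in ps
    return [i * step_ps / 1000.0 for i in range(n_frames)]  # 1 ns = 1000 ps


# Five frames at the default stride of 256 (2.56 ns per step).
times = frame_times_ns(n_frames=5, stride_in_10ps=256)
total_span_ns = times[-1]  # (n_frames - 1) * 2.56 ns
print(times, total_span_ns)
```

Under these assumptions, a trajectory of `n_frames` frames covers `(n_frames - 1) * stride_in_10ps * 10` ps, because `n_frames` counts the conditioning frames as well.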
Technical Specifications
ConfRover consists of three components: an encoder, a temporal module, and a diffusion decoder.
- The encoder maps the input amino acid sequence (through a folding model) and coordinates of context frames to a latent representation.
- The temporal module models the temporal dependencies between frames using an interleaving of causal transformers (across the temporal dimension) and pairformers (to update structures).
- The diffusion decoder learns the probability distribution of protein conformations and generates samples conditioned on the input sequence and the conditioning representation.
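The causal constraint in the temporal module can be pictured as an attention mask over frames: each frame may attend to itself and to earlier frames, but never to later ones. The sketch below is a generic illustration of causal masking, not ConfRover's actual implementation; `causal_frame_mask` is a hypothetical name.

```python
def causal_frame_mask(n_frames):
    """Build a causal mask over frames.

    mask[t][s] is True when frame t is allowed to attend to frame s,
    i.e. when s is the same frame or an earlier one (s <= t).
    """
    return [[s <= t for s in range(n_frames)] for t in range(n_frames)]


mask = causal_frame_mask(4)
for row in mask:
    # 'x' marks an allowed frame pair, '.' marks a masked (future) frame.
    print("".join("x" if allowed else "." for allowed in row))
```

The result is a lower-triangular pattern: the first frame sees only itself, while the last frame sees every frame before it. This is what lets a trajectory be generated one frame at a time.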
Bias, Risks, and Limitations
ConfRover is trained on a limited set of MD trajectory data and may not generalize well to out-of-distribution proteins. The quality of generated conformations is also limited by the quality of the input data and by the available computational resources. Currently, ConfRover supports only protein conformation generation and models only the coordinates of heavy atoms.
Citation
@article{confrover2025,
title={Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression},
author={Shen, Yuning and Wang, Lihao and Yuan, Huizhuo and Wang, Yan and Yang, Bangji and Gu, Quanquan},
journal={arXiv preprint arXiv:2505.17478},
year={2025}
}