ESM2 Protein Model

This is the protein component of a jointly trained NT-ESM2 model pair (Nucleotide Transformer for DNA, ESM2 for proteins) intended for DNA-protein analysis.

Model Details

  • Model Type: ESM2 for protein sequences
  • Training: Jointly trained with the NT DNA model
  • Architecture: Transformer-based language model for proteins

Usage

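```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer from the Hugging Face Hub
model_name = "vsubasri/joint-nt-esm2-transcript-coding-protein"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Example: embed a single protein sequence (one-letter amino-acid codes)
protein_sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
inputs = tokenizer(protein_sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-token embeddings, shape (batch, sequence_length, hidden_size)
embeddings = outputs.last_hidden_state
```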
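The forward pass returns one hidden state per token, including the `<cls>` and `<eos>` special tokens. To get a single fixed-size vector per protein, one common approach (an assumption here, not something this card prescribes) is to average only the residue positions. The helper `mean_pool_residues` below is hypothetical; it takes a 0/1 mask of positions to keep and is demonstrated on synthetic tensors so that it runs without downloading the model.

```python
import torch


def mean_pool_residues(hidden: torch.Tensor, keep_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over positions where keep_mask == 1.

    hidden:    (batch, seq_len, hidden_size) float tensor, e.g. outputs.last_hidden_state
    keep_mask: (batch, seq_len) tensor, 1 for residues to average, 0 for special/pad tokens
    returns:   (batch, hidden_size) tensor of per-sequence embeddings
    """
    mask = keep_mask.unsqueeze(-1).to(hidden.dtype)   # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)               # sum over kept positions only
    counts = mask.sum(dim=1).clamp(min=1.0)           # avoid division by zero
    return summed / counts


# Synthetic demo: one sequence of 4 tokens (<cls>, two residues, <eos>), hidden size 2
hidden = torch.tensor([[[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
keep = torch.tensor([[0, 1, 1, 0]])
pooled = mean_pool_residues(hidden, keep)
print(pooled)  # tensor([[2., 3.]])
```

With the real model, one option is to call the tokenizer with `padding=True, return_special_tokens_mask=True`, pop `special_tokens_mask` out of `inputs` before `model(**inputs)` (the model's forward pass does not accept it), and pass `1 - special_tokens_mask` as `keep_mask`; padded positions are also flagged as special, so they are excluded too.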

Training Details

  • Jointly trained with DNA sequences for cross-modal understanding
  • Large model variant
  • Transcript-specific protein coding sequences
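If the protein inputs are translations of each transcript's coding sequence (an assumption based on the description above; the card does not state the exact preprocessing), a DNA CDS can be converted into an input for this model with the standard genetic code. `translate_cds` is a hypothetical helper, not part of this repository; Biopython's `Seq.translate()` is a well-tested alternative.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence to one-letter amino acids, stopping at the first stop codon.

    Accepts DNA or RNA (U is treated as T); trailing bases that do not form
    a full codon are ignored, and codons containing ambiguous bases map to 'X'.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGAAAACCTAA"))  # MKT
```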

Files

  • config.json: Model configuration
  • model.safetensors: Model weights
  • tokenizer_config.json: Tokenizer configuration
  • vocab.txt: Vocabulary file
  • special_tokens_map.json: Special tokens mapping
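`vocab.txt` lists one token per line. Assuming the standard ESM2 conventions (a token's id is its 0-based line index, and each sequence is wrapped as `<cls>` residues `<eos>`), the mapping can be inspected with the sketch below. `load_vocab` and `encode_protein` are hypothetical helpers for illustration only; the `AutoTokenizer` from the usage example remains the reference encoder, and its output should be preferred.

```python
from typing import Dict, Iterable, List


def load_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map each token to its 0-based line index, skipping blank lines without shifting ids."""
    return {line.strip(): idx for idx, line in enumerate(lines) if line.strip()}


def encode_protein(sequence: str, vocab: Dict[str, int]) -> List[int]:
    """ESM-style encoding sketch: <cls> + one token per residue + <eos>; unknown residues -> <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(residue, unk) for residue in sequence]
    return [vocab["<cls>"], *ids, vocab["<eos>"]]


# With the repository file (path is illustrative):
#     with open("vocab.txt") as f:
#         vocab = load_vocab(f)
demo_vocab = load_vocab(["<cls>", "<pad>", "<eos>", "<unk>", "L", "A", "G", "M", "K"])
print(encode_protein("MKA", demo_vocab))  # [0, 7, 8, 5, 2]
```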

Citation

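If you use this model, please cite the original ESM2 paper (Lin et al., "Evolutionary-scale prediction of atomic-level protein structure with a language model", Science, 2023) and the associated joint NT-ESM2 training work.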

Model Size

  • Parameters: 0.1B
  • Tensor type: F32 (weights stored as Safetensors)
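The listed parameter count and tensor type can be checked after loading. The helper below is a hypothetical sketch that works on any PyTorch module; the count obtained from `AutoModel` may differ somewhat from the Hub figure, since the Hub counts whatever tensors are stored in `model.safetensors` while `AutoModel` instantiates only the base model.

```python
from typing import List, Tuple

from torch import nn


def count_parameters(model: nn.Module) -> Tuple[int, List[str]]:
    """Return the total number of parameters and the sorted set of parameter dtypes."""
    n_params = sum(p.numel() for p in model.parameters())
    dtypes = sorted({str(p.dtype) for p in model.parameters()})
    return n_params, dtypes


# Demo on a tiny stand-in module; with the real checkpoint, pass the model
# returned by AutoModel.from_pretrained(...) and compare against 0.1B / F32.
print(count_parameters(nn.Linear(10, 5)))  # (55, ['torch.float32'])
```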