Related paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084).
This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
The full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
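The pooling and normalization steps above can also be reproduced with transformers directly; a minimal sketch, assuming the repository exposes the underlying MPNet weights (as sentence-transformers models typically do):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Assumption: the model repo also serves plain transformers weights,
# which is standard for sentence-transformers checkpoints.
tokenizer = AutoTokenizer.from_pretrained("s2593817/sft-question-embedding")
model = AutoModel.from_pretrained("s2593817/sft-question-embedding")

sentences = ["How many singers do we have?"]
encoded = tokenizer(sentences, padding=True, truncation=True,
                    max_length=384, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq, 768)

# Pooling module: mean over token embeddings, ignoring padding positions.
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Normalize module: unit-length vectors, so dot product equals cosine similarity.
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 768])
```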
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("s2593817/sft-question-embedding")

# Run inference
sentences = [
    'How many total tours were there for each ranking date?',
    'How many total pounds were purchased in the year 2018 at all London branches?',
    'What is the carrier of the most expensive phone?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
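Because the final Normalize module produces unit-length embeddings, cosine similarity and dot product coincide. For the paraphrase-mining use case mentioned above, a minimal sketch (the example sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("s2593817/sft-question-embedding")
sentences = [
    "How many singers do we have?",
    "What is the total number of singers?",
    "What is the carrier of the most expensive phone?",
]

# util.paraphrase_mining returns (score, i, j) triples sorted by score,
# where i and j index into the sentence list.
for score, i, j in util.paraphrase_mining(model, sentences):
    print(f"{score:.3f}  {sentences[i]}  <->  {sentences[j]}")
```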
The training dataset has three columns: sentence1, sentence2, and score.

|  | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | int |

Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| How many singers do we have? | How many aircrafts do we have? | 1 |
| What is the total number of singers? | What is the total number of students? | 1 |
| Show name, country, age for all singers ordered by age from the oldest to the youngest. | List all people names in the order of their date of birth from old to young. | 1 |
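Pairs like these can be assembled into a datasets.Dataset whose column names match the loss inputs; a minimal sketch using the sample rows above:

```python
from datasets import Dataset

# The column names (sentence1, sentence2, score) must match what the loss
# expects; the literal rows here are just the card's sample pairs.
train_dataset = Dataset.from_dict({
    "sentence1": [
        "How many singers do we have?",
        "What is the total number of singers?",
    ],
    "sentence2": [
        "How many aircrafts do we have?",
        "What is the total number of students?",
    ],
    "score": [1, 1],
})
```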
Loss: CoSENTLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
```
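In sentence-transformers, this corresponds to constructing the loss as follows (a sketch; the base model here is the card's starting checkpoint):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
# similarity_fct defaults to pairwise cosine similarity, matching the
# "pairwise_cos_sim" setting above.
loss = CoSENTLoss(model, scale=20.0)
```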
Non-default hyperparameters:

- per_device_train_batch_size: 160
- learning_rate: 2e-05
- num_train_epochs: 100
- warmup_ratio: 0.2
- fp16: True
- dataloader_num_workers: 16
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 160
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 100
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.2
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 16
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 1.6949 | 100 | 9.4942 |
| 2.4407 | 200 | 8.3205 |
| 3.1864 | 300 | 6.3257 |
| 3.9322 | 400 | 4.7354 |
| 4.6780 | 500 | 3.6898 |
| 5.4237 | 600 | 3.3736 |
| 6.1695 | 700 | 3.0906 |
| 7.8644 | 800 | 3.1459 |
| 8.6102 | 900 | 3.4447 |
| 9.3559 | 1000 | 3.219 |
| 10.1017 | 1100 | 2.9808 |
| 10.8475 | 1200 | 2.505 |
| 11.5932 | 1300 | 2.0372 |
| 12.3390 | 1400 | 1.8879 |
| 13.0847 | 1500 | 1.8852 |
| 14.7797 | 1600 | 2.1867 |
| 15.5254 | 1700 | 2.0583 |
| 16.2712 | 1800 | 2.0132 |
| 17.0169 | 1900 | 1.8906 |
| 17.7627 | 2000 | 1.4556 |
| 18.5085 | 2100 | 1.2575 |
| 19.2542 | 2200 | 1.258 |
| 20.9492 | 2300 | 0.9423 |
| 21.6949 | 2400 | 1.398 |
| 22.4407 | 2500 | 1.2811 |
| 23.1864 | 2600 | 1.2602 |
| 23.9322 | 2700 | 1.2178 |
| 24.6780 | 2800 | 1.0895 |
| 25.4237 | 2900 | 0.9186 |
| 26.1695 | 3000 | 0.7916 |
| 27.8644 | 3100 | 0.7777 |
| 28.6102 | 3200 | 1.0487 |
| 29.3559 | 3300 | 0.9255 |
| 30.1017 | 3400 | 0.9655 |
| 30.8475 | 3500 | 0.897 |
| 31.5932 | 3600 | 0.7444 |
| 32.3390 | 3700 | 0.6445 |
| 33.0847 | 3800 | 0.5025 |
| 34.7797 | 3900 | 0.681 |
| 35.5254 | 4000 | 0.9227 |
| 36.2712 | 4100 | 0.8631 |
| 37.0169 | 4200 | 0.8573 |
| 37.7627 | 4300 | 0.9496 |
| 38.5085 | 4400 | 0.7243 |
| 39.2542 | 4500 | 0.7024 |
| 40.9492 | 4600 | 0.4793 |
| 41.6949 | 4700 | 0.8076 |
| 42.4407 | 4800 | 0.825 |
| 43.1864 | 4900 | 0.7553 |
| 43.9322 | 5000 | 0.6861 |
| 44.6780 | 5100 | 0.6589 |
| 45.4237 | 5200 | 0.5023 |
| 46.1695 | 5300 | 0.4013 |
| 47.8644 | 5400 | 0.4524 |
| 48.6102 | 5500 | 0.5891 |
| 49.3559 | 5600 | 0.5765 |
| 50.1017 | 5700 | 0.5708 |
| 50.8475 | 5800 | 0.479 |
| 51.5932 | 5900 | 0.4671 |
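For reference, a training setup mirroring the non-default hyperparameters above could look like the following sketch, using the sentence-transformers v3 trainer API; the output directory is a placeholder and the inline dataset just reuses the sample pairs from the card:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
loss = CoSENTLoss(model, scale=20.0)

# Placeholder dataset with the card's sample rows; the real run used a
# much larger (sentence1, sentence2, score) dataset.
train_dataset = Dataset.from_dict({
    "sentence1": ["How many singers do we have?"],
    "sentence2": ["How many aircrafts do we have?"],
    "score": [1],
})

args = SentenceTransformerTrainingArguments(
    output_dir="sft-question-embedding",        # placeholder path
    per_device_train_batch_size=160,
    learning_rate=2e-5,
    num_train_epochs=100,
    warmup_ratio=0.2,
    fp16=True,
    dataloader_num_workers=16,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # "no_duplicates" above
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```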
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

@online{kexuefm-8847,
    title = {CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author = {Su Jianlin},
    year = {2022},
    month = {Jan},
    url = {https://kexue.fm/archives/8847},
}
```
Base model: sentence-transformers/all-mpnet-base-v2