Fill-Mask · Transformers · Safetensors · roberta

Commit bd1e162 (verified) by OSainz, 1 parent: 429b80f: Update README.md

Files changed (1): README.md (+33, −1)
README.md (license: apache-2.0)

Submitted to LREC 2026

## Model Description

BERnaT is a family of monolingual Basque encoder-only language models trained to better represent linguistic variation—including standard, dialectal, historical, and informal Basque—rather than focusing solely on standard textual corpora. Models were trained on corpora that combine high-quality standard Basque with varied sources such as social media and historical texts, aiming to enhance robustness and generalization across natural language understanding (NLU) tasks.

- **Languages**: Basque (Euskara)
## Getting Started

You can either use this model directly, as in the example below, or fine-tune it for your task of interest.

```python
>>> from transformers import pipeline

>>> pipe = pipeline("fill-mask", model='HiTZ/BERnaT-base')

>>> pipe("Kaixo! Ni <mask> naiz!")
[{'score': 0.022003261372447014,
  'token': 7497,
  'token_str': ' euskalduna',
  'sequence': 'Kaixo! Ni euskalduna naiz!'},
 {'score': 0.016429167240858078,
  'token': 14067,
  'token_str': ' Olentzero',
  'sequence': 'Kaixo! Ni Olentzero naiz!'},
 {'score': 0.012804778292775154,
  'token': 31087,
  'token_str': ' ahobizi',
  'sequence': 'Kaixo! Ni ahobizi naiz!'},
 {'score': 0.01173020526766777,
  'token': 331,
  'token_str': ' ez',
  'sequence': 'Kaixo! Ni ez naiz!'},
 {'score': 0.010091394186019897,
  'token': 7618,
  'token_str': ' irakaslea',
  'sequence': 'Kaixo! Ni irakaslea naiz!'}]
```
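The `score` values returned by the pipeline are softmax probabilities over the model's vocabulary at the `<mask>` position, sorted so the most likely tokens come first. A minimal, self-contained sketch of that post-processing step (the toy vocabulary and logits below are invented for illustration and are not the model's real values):

```python
import math

def top_k_predictions(mask_logits, vocab, k=5):
    """Mimic the fill-mask pipeline's post-processing: softmax over the
    vocabulary logits at the <mask> position, then return the top-k tokens."""
    m = max(mask_logits)                            # subtract max for numerical stability
    exps = [math.exp(x - m) for x in mask_logits]
    total = sum(exps)
    scores = [e / total for e in exps]              # softmax probabilities, sum to 1
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return [{"token_str": tok, "score": s} for tok, s in ranked[:k]]

# Toy vocabulary and logits standing in for the model's real output.
toy_vocab = ["euskalduna", "Olentzero", "ez", "irakaslea"]
toy_logits = [2.3, 1.9, 0.4, 1.1]
print(top_k_predictions(toy_logits, toy_vocab, k=2))  # highest-probability tokens first
```

The real pipeline does the same thing with the logits produced by the RoBERTa masked-language-modeling head, which is why the scores in the example output above sum to well under 1: they are only the top five entries of a distribution over the full vocabulary.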
## Training Data

The BERnaT family was pre-trained on a combination of: