OSainz committed · verified
Commit ded50f8 · 1 Parent(s): 012f2ad

Update README.md

Files changed (1): README.md (+31 −1)
README.md CHANGED
@@ -7,7 +7,7 @@ license: apache-2.0
 
 Submitted to LREC 2026
 
-### Model Description
+## Model Description
 
 BERnaT is a family of monolingual Basque encoder-only language models trained to better represent linguistic variation—including standard, dialectal, historical, and informal Basque—rather than focusing solely on standard textual corpora. Models were trained on corpora that combine high-quality standard Basque with varied sources such as social media and historical texts, aiming to enhance robustness and generalization across natural language understanding (NLU) tasks.
 
@@ -18,6 +18,36 @@ BERnaT is a family of monolingual Basque encoder-only language models trained to
 - **Languages**: Basque (Euskara)
 
 
+## Getting Started
+
+You can either use this model directly, as in the example below, or fine-tune it for your task of interest.
+
+```python
+>>> from transformers import pipeline
+>>> pipe = pipeline("fill-mask", model='HiTZ/BERnaT-base')
+>>> pipe("Kaixo! Ni <mask> naiz!")
+[{'score': 0.022003261372447014,
+  'token': 7497,
+  'token_str': ' euskalduna',
+  'sequence': 'Kaixo! Ni euskalduna naiz!'},
+ {'score': 0.016429167240858078,
+  'token': 14067,
+  'token_str': ' Olentzero',
+  'sequence': 'Kaixo! Ni Olentzero naiz!'},
+ {'score': 0.012804778292775154,
+  'token': 31087,
+  'token_str': ' ahobizi',
+  'sequence': 'Kaixo! Ni ahobizi naiz!'},
+ {'score': 0.01173020526766777,
+  'token': 331,
+  'token_str': ' ez',
+  'sequence': 'Kaixo! Ni ez naiz!'},
+ {'score': 0.010091394186019897,
+  'token': 7618,
+  'token_str': ' irakaslea',
+  'sequence': 'Kaixo! Ni irakaslea naiz!'}]
+```
+
 ## Training Data
 
 The BERnaT family was pre-trained on a combination of: