Fill-Mask · Transformers · Safetensors · roberta

Commit bd1e162 (verified) by OSainz, 1 parent: 429b80f: Update README.md

Files changed (1): README.md (+33, −1)
README.md (license: apache-2.0)

Submitted to LREC 2026

## Model Description

BERnaT is a family of monolingual Basque encoder-only language models trained to better represent linguistic variation—including standard, dialectal, historical, and informal Basque—rather than focusing solely on standard textual corpora. Models were trained on corpora that combine high-quality standard Basque with varied sources such as social media and historical texts, aiming to enhance robustness and generalization across natural language understanding (NLU) tasks.

- **Languages**: Basque (Euskara)
## Getting Started

You can either use this model directly, as in the example below, or fine-tune it for your task of interest.

```python
>>> from transformers import pipeline

>>> pipe = pipeline("fill-mask", model='HiTZ/BERnaT-base')

>>> pipe("Kaixo! Ni <mask> naiz!")
[{'score': 0.022003261372447014,
  'token': 7497,
  'token_str': ' euskalduna',
  'sequence': 'Kaixo! Ni euskalduna naiz!'},
 {'score': 0.016429167240858078,
  'token': 14067,
  'token_str': ' Olentzero',
  'sequence': 'Kaixo! Ni Olentzero naiz!'},
 {'score': 0.012804778292775154,
  'token': 31087,
  'token_str': ' ahobizi',
  'sequence': 'Kaixo! Ni ahobizi naiz!'},
 {'score': 0.01173020526766777,
  'token': 331,
  'token_str': ' ez',
  'sequence': 'Kaixo! Ni ez naiz!'},
 {'score': 0.010091394186019897,
  'token': 7618,
  'token_str': ' irakaslea',
  'sequence': 'Kaixo! Ni irakaslea naiz!'}]
```
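The `score` values returned by the pipeline are softmax probabilities over the model's vocabulary at the `<mask>` position, sorted so the most likely tokens come first. A minimal, self-contained sketch of that post-processing step (the toy vocabulary and logits below are invented for illustration and are not the model's real values):

```python
import math

def top_k_predictions(mask_logits, vocab, k=5):
    """Mimic the fill-mask pipeline's post-processing: softmax over the
    vocabulary logits at the <mask> position, then return the top-k tokens."""
    m = max(mask_logits)                            # subtract max for numerical stability
    exps = [math.exp(x - m) for x in mask_logits]
    total = sum(exps)
    scores = [e / total for e in exps]              # softmax probabilities, sum to 1
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return [{"token_str": tok, "score": s} for tok, s in ranked[:k]]

# Toy vocabulary and logits standing in for the model's real output.
toy_vocab = ["euskalduna", "Olentzero", "ez", "irakaslea"]
toy_logits = [2.3, 1.9, 0.4, 1.1]
print(top_k_predictions(toy_logits, toy_vocab, k=2))  # highest-probability tokens first
```

The real pipeline does the same thing with the logits produced by the RoBERTa masked-language-modeling head, which is why the scores in the example output above sum to well under 1: they are only the top five entries of a distribution over the full vocabulary.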
## Training Data

The BERnaT family was pre-trained on a combination of: