This model was created to predict moves in chess openings. The idea is to test the impact of modeling the game text in different ways and to report the results. You can access the code for training here. You can access the different model configurations and results here.
Training process:
Training with V1_small dataset:
To follow the discussion below, it is important to first check the structure of the V1_small version of the nelson2424/Chess_openings_dataset dataset.
During the training process, multiple challenges arose.
- The first problem was the low accuracy the model was achieving. To mitigate it, I tried the following:
Learning rate: The first approach was to modify the learning rate. A step further involved changing the LR scheduler from a linear configuration to a polynomial decay configuration. These changes did not have a significant effect on accuracy.
Probability of masked tokens: Decreasing the probability of masking tokens in the dataset increased accuracy, but at the expense of a weaker prediction capability. With a low masking probability, the resulting model is incapable of predicting correct moves across different openings.
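The linear and polynomial decay schedules mentioned above can be compared with a small sketch. This is not the training code for this model: the function names, the step count, the initial learning rate, and the `power` value are all assumptions chosen for illustration, and warmup is left out.

```python
def linear_decay(step, total_steps, lr_init):
    """Learning rate that falls linearly from lr_init to 0 over total_steps."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return lr_init * remaining


def polynomial_decay(step, total_steps, lr_init, lr_end=0.0, power=2.0):
    """Learning rate that falls from lr_init to lr_end along (1 - t/T) ** power."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return (lr_init - lr_end) * remaining ** power + lr_end


if __name__ == "__main__":
    total = 1000  # assumed number of training steps (hypothetical)
    lr0 = 5e-5    # assumed initial learning rate (hypothetical)
    for step in (0, 250, 500, 750, 1000):
        print(step, linear_decay(step, total, lr0), polynomial_decay(step, total, lr0))
```

With `power=1.0` and `lr_end=0.0` the polynomial schedule reduces to the linear one, so the two configurations only differ when the exponent or the final learning rate is changed.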
Focus on predicting the moves: The current model tries to model the whole text that the V1_small version of the dataset provides, which includes trying to predict parts of the board after a move or the name of the opening, as seen in the following example:
<s>King's Indian <mask>: <mask> Variation, Debrecen Defense r n b q k b n r p p p p p p p p ........ ........ .. P..... ........ P P. P <mask> P P P R N B Q K B N R m:g8f6 <mask>:<mask>b<mask><mask> b q k b. r p p p p p p p p ..... n.. ........ .. P..... ........ P P. P P P P P R N B Q K B N R m:b1c3 <mask><mask><mask> <mask><mask> b q k b. r p p p p p p p p ..... n.. ........ .. P..... .. N..... P P. P P P P P R. B Q K B N'
After realizing that, with limited computational resources, my model could not learn a function complex enough to model the problem at hand, I decided to narrow the scope of the problem.
Instead of trying to generate the whole context, the model would only learn to generate moves and the effect they have on the board, based on a rich context.
This allowed the model to have a rich representation of the game and to predict moves more accurately.
As a result, the data was modified to mask only the move predictions and their corresponding effects on the board.
The data now looks as follows:
<s>King's Indian Defense: Fianchetto Variation, Debrecen Defense r n b q k b n r p p p p p p p p ........ ........ .. P..... ........ P P. P P P P P R N B Q K B N R <mask><mask><mask><mask><mask><mask> <mask><mask><mask><mask><mask><mask> b q k b. r p p p p p p p p ..... n.. ........ .. P..... ........ P P. P P P P P R N B Q K B N R m:b1c3 <mask><mask><mask><mask><mask><mask> b q k b. r p p p p p p p p ..... n.. ........ .. P..... .. N..... P P. P P P P P R. B Q K B N'
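The idea of masking only the moves, rather than random tokens across the whole text, can be sketched as follows. This is a simplified illustration and not the preprocessing used for this model: it works on raw text instead of subword tokens, it masks only the `m:` move strings and not their effect on the board, and the names `mask_moves`, `MOVE_PATTERN`, and `mask_prob` are hypothetical.

```python
import random
import re

MASK = "<mask>"
# Moves appear in the text as "m:" followed by source and target squares,
# e.g. "m:g8f6". The optional trailing letter covers promotions (assumption).
MOVE_PATTERN = re.compile(r"m:[a-h][1-8][a-h][1-8][qrbn]?")


def mask_moves(text, mask_prob=1.0, rng=None):
    """Replace move strings with MASK and return (masked_text, hidden_moves).

    Each move is masked independently with probability mask_prob; the opening
    name and the board description are never masked.
    """
    rng = rng or random.Random()
    hidden = []

    def replace(match):
        if rng.random() < mask_prob:
            hidden.append(match.group(0))
            return MASK
        return match.group(0)

    return MOVE_PATTERN.sub(replace, text), hidden


if __name__ == "__main__":
    sample = "King's Indian Defense r n b q m:g8f6 r n b m:b1c3 r"
    masked, targets = mask_moves(sample, mask_prob=1.0)
    print(masked)
    print(targets)
```

Restricting the mask to move positions keeps the opening name and board state always visible as context, which matches the narrowed scope described above.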