Update README.md
- Frontend is the staging version of SillyTavern.
- Backend is the latest version of KoboldCPP for Windows using CUDA 12.
- Using **CuBLAS** but **not using QuantMatMul (mmq)**.
- Fixed seed for all tests: **123**
- **7-10B Models:**
  - All models are loaded in Q8_0 (GGUF).
  - **Flash Attention** and **ContextShift** enabled.
  - All models are extended to **16K context length** (auto RoPE from KCPP).
  - Response size set to 1024 tokens max.
- **11-15B Models:**
  - All models are loaded in Q4_K_M or whatever is the highest/closest quant available (GGUF).
  - **Flash Attention** and **8-bit KV cache compression** enabled.
  - All models are extended to **12K context length** (auto RoPE from KCPP).
  - Response size set to 512 tokens max.
# System Prompt and Instruct Format