BGE Code v1 GGUF
BGE-Code-v1 is an LLM-based code embedding model that supports code retrieval, text retrieval, and multilingual retrieval. This repository provides GGUF quantizations of it; refer to the original model card, BAAI/bge-code-v1, for more details.
Prerequisites
- llama.cpp installed
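If llama.cpp is not installed yet, one common route is to build it from source (a sketch; prebuilt packages such as the Homebrew `llama.cpp` formula also work):

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# the llama-server binary ends up in build/bin/
```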
Available Quantizations
- bge-code-v1-F32.gguf - 32-bit float (original precision, largest file, best quality)
- bge-code-v1-F16.gguf - 16-bit float (half precision, excellent quality)
- bge-code-v1-Q8_0.gguf - 8-bit quantization (recommended, great quality-size balance)
- bge-code-v1-Q6_K.gguf - 6-bit quantization (balanced)
- bge-code-v1-Q4_0.gguf - 4-bit quantization (smaller, faster)
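The server command below streams the file straight from the Hub, but if you prefer a local copy first, one way (assuming the `huggingface_hub` CLI is installed) is:

```bash
huggingface-cli download goldpulpy/bge-code-v1-GGUF \
  bge-code-v1-Q8_0.gguf --local-dir .
```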
Running the Server
You can specify the host and port; set `--hf-file` to whichever quantization you want:

```bash
llama-server \
  --hf-repo goldpulpy/bge-code-v1-GGUF \
  --hf-file bge-code-v1-Q8_0.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --embeddings
```
- Default host: `127.0.0.1`
- Default port: `8080`
After starting, the server is accessible at http://127.0.0.1:8080.
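As a quick smoke test, you can hit the OpenAI-compatible `/v1/embeddings` endpoint with curl (the input string is just an example):

```bash
curl http://127.0.0.1:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-code-v1", "input": "def add(a, b): return a + b"}'
```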
Python Example (OpenAI-compatible)
```python
from openai import OpenAI

# The llama.cpp server does not check the API key, so it can be empty
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="")

response = client.embeddings.create(
    model="bge-code-v1",
    input="def add(a, b): return a + b",
)

embedding_vector = response.data[0].embedding
print("Embedding length:", len(embedding_vector))
print("First 10 values:", embedding_vector[:10])
```