Instructions to use arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer with libraries, inference providers, notebooks, and local apps.

Libraries

How to use arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer")
model = AutoModelForCausalLM.from_pretrained("arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping special tokens such as </s>
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
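For reference, the sketch below builds by hand the `[INST] ... [/INST]` prompt string documented for the base Mistral-7B-Instruct models, which is roughly what `apply_chat_template` produces before tokenization. It assumes this pruned model keeps the base model's chat format; `build_mistral_prompt` is a hypothetical helper, and whitespace details may differ from the template shipped with the tokenizer, so prefer `apply_chat_template` in real code.

```python
from typing import Dict, List


def build_mistral_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper approximating the Mistral Instruct [INST] prompt format.

    Expects turns alternating user/assistant, starting and ending with a user turn.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("the conversation must end with a user turn")
    prompt = "<s>"  # beginning-of-sequence token, written out as text
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {i} should have role {expected!r}")
        if expected == "user":
            # User turns are wrapped in [INST] ... [/INST]
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            # Assistant turns are closed with the end-of-sequence token
            prompt += f"{message['content']}</s> "
    return prompt


print(build_mistral_prompt([{"role": "user", "content": "Who are you?"}]))
# prints: <s>[INST] Who are you? [/INST]
```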

vLLM

How to use arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
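The same endpoint can be called from Python without extra dependencies. This is a minimal sketch using only the standard library (`urllib.request` and `json`); the helper names `build_chat_request` and `chat` and the `max_tokens` default are illustrative assumptions, not part of vLLM. It relies only on the OpenAI-compatible request and response shape used by the curl example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage, once `vllm serve` is running on its default port 8000:
# print(chat("http://localhost:8000",
#            "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer",
#            "What is the capital of France?"))
```

Separating request construction from sending keeps the payload easy to inspect and reuse against the SGLang server, which only differs in its port.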

SGLang

How to use arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
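OpenAI-compatible servers such as SGLang and vLLM also accept `"stream": true` in the request body, in which case the reply arrives as server-sent events: `data: {...}` lines carrying text fragments in `choices[0].delta.content`, ending with `data: [DONE]`. A minimal, standard-library-only sketch of a parser for that stream follows; `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style streaming (SSE) response."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role; later ones carry text
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Usage sketch against the SGLang server above (note "stream": True in the body):
# import urllib.request
# body = {"model": "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer",
#         "messages": [{"role": "user", "content": "What is the capital of France?"}],
#         "stream": True}
# req = urllib.request.Request("http://localhost:30000/v1/chat/completions",
#                              data=json.dumps(body).encode("utf-8"),
#                              headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:  # iterating yields raw lines (bytes)
#     for piece in iter_stream_text(resp):
#         print(piece, end="", flush=True)
```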

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
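Loading the weights inside the container can take several minutes, so a curl call issued immediately after `docker run` may be refused. One option is to poll the OpenAI-compatible `GET /v1/models` endpoint until it answers. This is a sketch under assumptions: the helper name, the default timeout and polling interval, and the injectable `fetch` parameter (added so the loop can be tested without a server) are illustrative choices.

```python
import http.client
import time
import urllib.request
from typing import Callable


def wait_for_server(url: str = "http://localhost:30000/v1/models",
                    timeout_s: float = 900.0,
                    interval_s: float = 5.0,
                    fetch: Callable = urllib.request.urlopen) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with fetch(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts and HTTP errors (URLError/HTTPError are
            # OSError subclasses) all mean the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage: start the container, then
# if wait_for_server():
#     ...  # safe to send the curl request shown above
```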

Docker Model Runner
How to use arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer with Docker Model Runner:
```
docker model run hf.co/arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Quick Summary

This model is a layer-pruned adaptation of mistralai/Mistral-7B-Instruct-v0.2, built with the layer-pruning technique described in the paper "The Unreasonable Ineffectiveness of the Deeper Layers." It uses methods from the MergeKit and PruneMe repositories to remove redundant layers from the model's deeper blocks while aiming to preserve its ability to generate coherent text. The model is maintained by Arcee-ai and is a practical exercise in making large language models (LLMs) more computationally efficient by balancing performance against resource usage.
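The cited paper picks which layers to drop by measuring how little a block of n consecutive layers changes the hidden state, using the angular distance d(x_l, x_{l+n}) = arccos(cos_sim(x_l, x_{l+n})) / π and removing the block with the smallest average distance. The pure-Python sketch below illustrates that selection rule; it is not PruneMe's actual code. It assumes the hidden states have already been collected (for example, with `output_hidden_states=True` in Transformers) and, as an assumption, uses one final-token vector per layer boundary; the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def angular_distance(a: Vector, b: Vector) -> float:
    """arccos(cosine similarity) / pi: 0 for parallel vectors, 1 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / norms))  # clamp floating-point overshoot
    return math.acos(cosine) / math.pi


def best_block_start(samples: List[List[Vector]], n: int) -> int:
    """Return the layer index l whose block [l, l + n) changes the hidden state least.

    samples[s][i] is the final-token hidden state entering layer i for prompt s,
    with i = 0..L for an L-layer model (so samples[s][L] is the last layer's output).
    """
    num_states = len(samples[0])
    best_start, best_score = 0, float("inf")
    for start in range(num_states - n):
        # Average the distance between the block's input and output over all prompts
        score = sum(angular_distance(s[start], s[start + n]) for s in samples) / len(samples)
        if score < best_score:
            best_start, best_score = start, score
    return best_start


# Toy example: 2-D "hidden states" for a 6-layer model in which layers 3 and 4
# barely rotate the state, so removing that 2-layer block is the cheapest cut.
angles = [0.0, 0.5, 1.0, 1.5, 1.5, 1.5, 2.0]
toy = [[(math.cos(t), math.sin(t)) for t in angles]]
print(best_block_start(toy, n=2))  # prints 3
```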

Model Description

This model is a specialized version of mistralai/Mistral-7B-Instruct-v0.2, optimized for efficiency through selective layer pruning. Developed by Arcee-ai, it builds on the findings of "The Unreasonable Ineffectiveness of the Deeper Layers." The pruning was carried out with the MergeKit and PruneMe tools and focused on eliminating redundant layers, producing a leaner, more efficient model intended to generate high-quality text.

Model Sources

Pruning: PruneMe GitHub (unofficial)
Paper: "The Unreasonable Ineffectiveness of the Deeper Layers"
Merging Repository: MergeKit GitHub
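The card pairs PruneMe with MergeKit; one common way to apply a chosen cut with MergeKit is a `passthrough` merge that stitches together the layer ranges to keep. The sketch below generates such a config, assuming MergeKit's `slices` / `layer_range` schema with half-open ranges; the function name and the removed block (layers 22 to 29) are hypothetical, since the card does not say which layers were actually removed.

```python
import json


def passthrough_slice_config(model: str, num_layers: int, start: int, n_remove: int) -> dict:
    """Build a MergeKit-style passthrough config that drops layers [start, start + n_remove)."""
    if n_remove < 1 or not 0 <= start <= num_layers - n_remove:
        raise ValueError("the block to remove must lie inside the model")
    slices = []
    # layer_range is half-open: [first, last) keeps layers first..last-1
    for first, last in ((0, start), (start + n_remove, num_layers)):
        if last > first:  # omit empty slices when the block touches either end
            slices.append({"sources": [{"model": model, "layer_range": [first, last]}]})
    return {"slices": slices, "merge_method": "passthrough", "dtype": "bfloat16"}


# Hypothetical block: drop 8 of 32 layers starting at layer 22. The card does not
# say which block was actually removed for this model.
config = passthrough_slice_config("mistralai/Mistral-7B-Instruct-v0.2", 32, 22, 8)
print(json.dumps(config, indent=2))  # JSON output should also load as YAML
```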

Uses

This pruned model is intended for a range of NLP tasks. Despite its reduced size, the aim is to maintain, or even improve on, the base model's ability to generate coherent text. It illustrates how layer pruning can preserve a model's essential capabilities and offers a template for reducing computational resource requirements.

Downstream Use

The pruned model is a solid starting point for task-specific fine-tuning and a good candidate for continued pre-training; the paper itself follows pruning with a small amount of parameter-efficient fine-tuning (QLoRA) to "heal" the pruned model. The model is a direct application of the principles outlined in "The Unreasonable Ineffectiveness of the Deeper Layers," using the MergeKit and PruneMe repositories for the practical pruning step. It is a step toward more efficient model design, showing that computational resource requirements can be reduced substantially while aiming to keep the impact on performance small.

Safetensors
Model size: 5B params
Tensor type: BF16
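As a rough check on the size above, the parameter count can be computed from the model dimensions. This sketch assumes the sliced model keeps the base Mistral-7B-Instruct-v0.2 dimensions listed in the code (verify them against the model's `config.json`) and retains 24 of the base model's 32 decoder layers, as the model name suggests. It gives about 7.24B parameters for 32 layers and about 5.50B for 24 layers, in line with the 5B-class size listed above; at 2 bytes per BF16 parameter, that is roughly 11 GB of weights.

```python
# Assumed base Mistral-7B-Instruct-v0.2 dimensions (check config.json to confirm).
HIDDEN = 4096         # hidden_size
INTERMEDIATE = 14336  # intermediate_size
NUM_HEADS = 32        # num_attention_heads
NUM_KV_HEADS = 8      # num_key_value_heads (grouped-query attention)
VOCAB = 32000         # vocab_size


def param_count(num_layers: int) -> int:
    """Count weights in a Mistral-style decoder with `num_layers` layers (no biases)."""
    head_dim = HIDDEN // NUM_HEADS                           # 128
    kv_dim = NUM_KV_HEADS * head_dim                         # 1024
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim    # q/o plus k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                          # gate, up and down projections
    norms = 2 * HIDDEN                                       # two RMSNorm weight vectors
    embeddings = 2 * VOCAB * HIDDEN                          # input embeddings + untied lm_head
    return num_layers * (attention + mlp + norms) + embeddings + HIDDEN  # + final norm


for layers in (32, 24):
    n = param_count(layers)
    # BF16 stores each parameter in 2 bytes; GB here means 10**9 bytes
    print(f"{layers} layers: {n / 1e9:.2f}B params, ~{2 * n / 1e9:.1f} GB in BF16")
```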

Paper for arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer

The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887, published Mar 26, 2024)