✨ Finetune for Free
Notebooks are beginner-friendly. Read our guide. Add your dataset, click "Run All", and export your finetuned model to GGUF, Ollama, vLLM or Hugging Face.
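For reference, a minimal sketch of the export step (assuming `model` and `tokenizer` come from a finished finetuning run; the output directory and quantization method here are illustrative):

```python
# Hedged sketch: export a finetuned model to GGUF for llama.cpp / Ollama.
model.save_pretrained_gguf("model_gguf", tokenizer, quantization_method = "q4_k_m")
```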
⚡ Quickstart
Linux or WSL
pip install unsloth
Windows
For Windows, pip install unsloth works only if you already have PyTorch installed. For more info, read our Windows Guide.
Docker
Use our official unsloth/unsloth Docker image. Read our Docker Guide.
Blackwell
For Blackwell GPUs (RTX 50 series, RTX 6000, B200), simply run pip install unsloth. Read our Blackwell Guide for more details.
🦥 Unsloth.ai News
- 📣 Memory-efficient RL: We're introducing even better RL. Our new kernels & algorithms allow faster RL with 50% less VRAM & 10× more context. Read our blog.
- 📣 gpt-oss by OpenAI: For details on Unsloth Flex Attention, long-context training, and bug fixes, read our guide. 20B works on a 14GB GPU and 120B on 65GB VRAM. gpt-oss uploads.
- 📣 Gemma 3n by Google: Read Blog. We uploaded GGUFs, 4-bit models.
- 📣 Text-to-Speech (TTS) is now supported, including sesame/csm-1b, along with STT models such as openai/whisper-large-v3.
- 📣 Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
- 📣 Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & KL Divergence.
- 📣 EVERYTHING is now supported: all models (BERT, diffusion, Cohere, Mamba), FFT, etc. Multi-GPU coming soon. Enable FFT with full_finetuning = True and 8-bit with load_in_8bit = True.
Click for more news
🔗 Links and Resources
⭐ Key Features
- Supports full-finetuning, pretraining, 4-bit, 16-bit and 8-bit training
- Supports all transformer-style models including TTS, STT, multimodal, diffusion, BERT and more!
- All kernels written in OpenAI's Triton language. Manual backprop engine.
- 0% loss in accuracy - no approximation methods - all exact.
- No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc). Check your GPU with the snippet after this list! GTX 1070 and 1080 work, but are slow.
- Works on Linux and Windows
- If you trained a model with 🦥Unsloth, you can use this cool sticker!
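A quick capability check (a minimal sketch, assuming a CUDA build of PyTorch is already installed):

```python
# Print the GPU's CUDA compute capability; Unsloth needs >= 7.0.
import torch
major, minor = torch.cuda.get_device_capability()
print(f"{torch.cuda.get_device_name()}: CUDA capability {major}.{minor}")
assert (major, minor) >= (7, 0), "GPU too old for Unsloth (needs capability 7.0+)"
```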

💾 Install Unsloth
[!warning]
Unsloth does not support Python 3.14. Use 3.13 or lower.
You can also see our documentation for more detailed installation and updating instructions here.
Pip Installation
Install with pip (recommended) for Linux devices:
pip install unsloth
To update Unsloth:
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
See here for advanced pip install instructions.
Windows Installation
1. Install NVIDIA Video Driver: You should install the latest version of your GPU's driver. Download drivers here: NVIDIA GPU Driver.
2. Install Visual Studio C++: You will need Visual Studio with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options, as well as the Windows 10/11 SDK. For detailed instructions with options, see here.
3. Install CUDA Toolkit: Follow the instructions to install the CUDA Toolkit.
4. Install PyTorch: You will need the correct version of PyTorch that is compatible with your CUDA drivers, so make sure to select it carefully. Install PyTorch.
5. Install Unsloth:
pip install unsloth
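Optionally, verify that the CUDA build of PyTorch is active (a quick sanity check, not an official step):

```bash
python -c "import torch; print(torch.cuda.is_available())"
```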
Notes
To run Unsloth directly on Windows:
- Install Triton from this Windows fork and follow the instructions here (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12)
- In the SFTConfig, set dataset_num_proc=1 to avoid a crashing issue:
SFTConfig(
dataset_num_proc=1,
...
)
Advanced/Troubleshooting
For advanced installation instructions, or if you see weird errors during installation:
1. First try using an isolated environment, then pip install unsloth:
python -m venv unsloth
source unsloth/bin/activate
pip install unsloth
2. Install torch and triton. Go to https://pytorch.org to install them, e.g. pip install torch torchvision torchaudio triton
3. Confirm that CUDA is installed correctly. Try nvcc. If that fails, you need to install cudatoolkit or CUDA drivers.
4. Install xformers manually via:
pip install ninja
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main
Check that `xformers` succeeded with `python -m xformers.info`. See https://github.com/facebookresearch/xformers. Another option is to install `flash-attn` for Ampere GPUs and skip `xformers`.
5. For GRPO runs, you can try installing vLLM and checking that pip install vllm succeeds.
6. Double-check that your versions of Python, CUDA, cuDNN, torch, triton, and xformers are compatible with one another. The PyTorch Compatibility Matrix may be useful.
7. Finally, install bitsandbytes and check it with python -m bitsandbytes
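As a minimal diagnostic sketch for step 6 (assuming torch and triton are already installed), you can print the relevant versions in one go:

```python
# Print the versions that must be mutually compatible (see step 6).
import sys, torch, triton
print("python  :", sys.version.split()[0])
print("torch   :", torch.__version__)
print("cuda    :", torch.version.cuda)
print("cudnn   :", torch.backends.cudnn.version())
print("triton  :", triton.__version__)
try:
    import xformers
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers: not installed")
```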
Conda Installation (Optional)
⚠️ Only use Conda if you have it. If not, use pip. Select pytorch-cuda=11.8 for CUDA 11.8 or pytorch-cuda=12.1 for CUDA 12.1. We support python=3.10, 3.11 and 3.12.
conda create --name unsloth_env \
python=3.11 \
pytorch-cuda=12.1 \
pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
-y
conda activate unsloth_env
pip install unsloth
If you're looking to install Conda in a Linux environment, read here, or run the below 🔽
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
Advanced Pip Installation
⚠️ Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command differs for each torch version (2.2, 2.3, 2.4, 2.5) and CUDA version.
For other torch versions, we support torch211, torch212, torch220, torch230 and torch240; for CUDA versions, we support cu118, cu121 and cu124. For Ampere devices (A100, H100, RTX 3090) and above, use cu118-ampere, cu121-ampere or cu124-ampere.
For example, if you have torch 2.4 and CUDA 12.1, use:
pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
Another example, if you have torch 2.5 and CUDA 12.4, use:
pip install --upgrade pip
pip install "unsloth[cu124-torch250] @ git+https://github.com/unslothai/unsloth.git"
And other examples:
pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
Or, run the below in a terminal to get the optimal pip installation command:
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
Or, run the below manually in a Python REPL:
try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
import re
v = V(re.match(r"[0-9\.]{3,}", torch.__version__).group(0))
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
USE_ABI = torch._C._GLIBCXX_USE_CXX11_ABI
if cuda not in ("11.8", "12.1", "12.4", "12.6", "12.8"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v < V('2.3.0'): x = 'cu{}{}-torch220'
elif v < V('2.4.0'): x = 'cu{}{}-torch230'
elif v < V('2.5.0'): x = 'cu{}{}-torch240'
elif v < V('2.5.1'): x = 'cu{}{}-torch250'
elif v <= V('2.5.1'): x = 'cu{}{}-torch251'
elif v < V('2.7.0'): x = 'cu{}{}-torch260'
elif v < V('2.7.9'): x = 'cu{}{}-torch270'
elif v < V('2.8.0'): x = 'cu{}{}-torch271'
elif v < V('2.8.9'): x = 'cu{}{}-torch280'
else: raise RuntimeError(f"Torch = {v} too new!")
if v > V('2.6.9') and cuda not in ("11.8", "12.6", "12.8"): raise RuntimeError(f"CUDA = {cuda} not supported!")
x = x.format(cuda.replace(".", ""), "-ampere" if is_ampere else "")
print(f'pip install --upgrade pip && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git"')
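For reference, following the version ladder above, a machine with torch 2.4.0, CUDA 12.1 and an Ampere GPU would print:

```
pip install --upgrade pip && pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
```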
Docker Installation
You can use our pre-built Docker container, which includes all dependencies, to use Unsloth instantly with no setup required.
Read our guide.
This container requires installing NVIDIA's Container Toolkit.
docker run -d -e JUPYTER_PASSWORD="mypassword" \
-p 8888:8888 -p 2222:22 \
-v $(pwd)/work:/workspace/work \
--gpus all \
unsloth/unsloth
Access Jupyter Lab at http://localhost:8888 and start fine-tuning!
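If GPUs don't show up inside the container, a quick check (a sketch, not an official step) is to run nvidia-smi through the toolkit; it should print your GPU table if the Container Toolkit is configured correctly:

```bash
docker run --rm --gpus all unsloth/unsloth nvidia-smi
```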
📜 Documentation
- Go to our official Documentation for saving to GGUF, checkpointing, evaluation and more!
- We support Hugging Face's TRL, Trainer, Seq2SeqTrainer and even plain PyTorch code!
- We're in 🤗Hugging Face's official docs! Check out the SFT docs and DPO docs!
- If you want to download models from the ModelScope community, set the environment variable UNSLOTH_USE_MODELSCOPE=1 and install the modelscope library via pip install modelscope -U. unsloth_cli.py also supports UNSLOTH_USE_MODELSCOPE=1 for downloading models and datasets. Please remember to use model and dataset IDs from the ModelScope community.
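A minimal sketch of the environment-variable route (set it before launching Python or your script):

```bash
pip install -U modelscope
export UNSLOTH_USE_MODELSCOPE=1
```

The full finetuning example below loads a model, attaches LoRA adapters, and trains with TRL's SFTTrainer: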
from unsloth import FastLanguageModel, FastModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")
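# Optional: 4-bit pre-quantized models we support, for faster downloads and no OOMs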
fourbit_models = [
"unsloth/Meta-Llama-3.1-8B-bnb-4bit",
"unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
"unsloth/Meta-Llama-3.1-70B-bnb-4bit",
"unsloth/Meta-Llama-3.1-405B-bnb-4bit",
"unsloth/Mistral-Small-Instruct-2409",
"unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
"unsloth/Phi-3.5-mini-instruct",
"unsloth/Phi-3-medium-4k-instruct",
"unsloth/gemma-2-9b-bnb-4bit",
"unsloth/gemma-2-27b-bnb-4bit",
"unsloth/Llama-3.2-1B-bnb-4bit",
"unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
"unsloth/Llama-3.2-3B-bnb-4bit",
"unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
"unsloth/Llama-3.3-70B-Instruct-bnb-4bit"
]
model, tokenizer = FastModel.from_pretrained(
model_name = "unsloth/gemma-3-4B-it",
max_seq_length = 2048,
load_in_4bit = True,
load_in_8bit = False,
full_finetuning = False,
)
model = FastLanguageModel.get_peft_model(
model,
r = 16,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0,
bias = "none",
use_gradient_checkpointing = "unsloth",
random_state = 3407,
max_seq_length = max_seq_length,
use_rslora = False,
loftq_config = None,
)
trainer = SFTTrainer(
model = model,
train_dataset = dataset,
tokenizer = tokenizer,
args = SFTConfig(
max_seq_length = max_seq_length,
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 10,
max_steps = 60,
logging_steps = 1,
output_dir = "outputs",
optim = "adamw_8bit",
seed = 3407,
),
)
trainer.train()
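After training, a minimal follow-up sketch for fast inference and saving the LoRA adapters (the prompt and output directory are illustrative):

```python
# Hedged sketch: quick inference, then save the LoRA adapters only.
FastLanguageModel.for_inference(model)   # enable Unsloth's fast inference mode
inputs = tokenizer("The capital of France is", return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 32)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))

model.save_pretrained("lora_model")      # LoRA adapters, not the full weights
tokenizer.save_pretrained("lora_model")
```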
💡 Reinforcement Learning
RL including DPO, GRPO, PPO, reward modelling and Online DPO all work with Unsloth. We're in 🤗Hugging Face's official docs! Check out the GRPO docs and the DPO docs! List of RL notebooks (a minimal GRPO sketch follows the list):
- Advanced Qwen3 GRPO notebook: Link
- ORPO notebook: Link
- DPO Zephyr notebook: Link
- KTO notebook: Link
- SimPO notebook: Link
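Since GRPO only needs a reward function on top of the usual setup, here is a minimal hedged sketch (not from an official notebook; the model choice, dataset and toy brevity reward are illustrative), following TRL's reward-function interface:

```python
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Base",  # illustrative choice
    max_seq_length = 1024,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

def brevity_reward(completions, **kwargs):
    # One float per sampled completion; shorter completions score higher.
    return [-len(completion) / 100.0 for completion in completions]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = brevity_reward,
    args = GRPOConfig(max_steps = 30, output_dir = "outputs"),
    train_dataset = load_dataset("trl-lib/tldr", split = "train"),
)
trainer.train()
```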
Click for DPO code
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from unsloth import FastLanguageModel
import torch
from trl import DPOTrainer, DPOConfig
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/zephyr-sft-bnb-4bit",
max_seq_length = max_seq_length,
load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
model,
r = 64,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 64,
lora_dropout = 0,
bias = "none",
use_gradient_checkpointing = "unsloth",
random_state = 3407,
max_seq_length = max_seq_length,
)
dpo_trainer = DPOTrainer(
model = model,
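# ref_model = None: TRL recreates the frozen reference by disabling the LoRA adapters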
ref_model = None,
train_dataset = YOUR_DATASET_HERE,
tokenizer = tokenizer,
args = DPOConfig(
per_device_train_batch_size = 4,
gradient_accumulation_steps = 8,
warmup_ratio = 0.1,
num_train_epochs = 3,
logging_steps = 1,
optim = "adamw_8bit",
seed = 42,
output_dir = "outputs",
max_length = 1024,
max_prompt_length = 512,
beta = 0.1,
),
)
dpo_trainer.train()
🥇 Performance Benchmarking
We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):
| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|-----------------|------|----|------|------------|----|
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B)  | 80GB | 2x | >70% | 12x longer | 1x |
Context length benchmarks
Llama 3.1 (8B) max. context length
We tested Llama 3.1 (8B) Instruct and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|----------|---------------------------|--------------------|
| 8 GB  | 2,972   | OOM    |
| 12 GB | 21,848  | 932    |
| 16 GB | 40,724  | 2,551  |
| 24 GB | 78,475  | 5,789  |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |
Llama 3.3 (70B) max. context length
We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
|----------|---------------------------|--------------------|
| 48 GB | 12,106 | OOM   |
| 80 GB | 89,389 | 6,916 |
Citation
You can cite the Unsloth repo as follows:
@software{unsloth,
author = {Daniel Han and Michael Han and Unsloth team},
title = {Unsloth},
url = {http://github.com/unslothai/unsloth},
year = {2023}
}
Thank You to