Unsloth

Open-source Python library that makes fine-tuning open-weights LLMs fast, free, and accessible. Wraps the underlying training stack (PyTorch + Hugging Face Transformers + LoRA via PEFT) and adds aggressive optimizations for memory and speed. The wiki’s anchor entry for fine-tuning workflows.

License: Open source
Site: unsloth.ai
GitHub: unslothai/unsloth
Best fit: fine-tuning small-to-medium open-weights models (Llama 3.1, Mistral, Phi3, Gemma, etc.) on consumer GPUs or free Google Colab runtimes

What it does

Unsloth pre-patches model architectures and the training loop with custom CUDA kernels so that fine-tuning a 4-bit quantized model uses dramatically less VRAM than the equivalent vanilla Hugging Face setup, and trains 2–5× faster on the same hardware. The trade-off: Unsloth supports a curated list of model architectures (the popular open-weights families) rather than every model on Hugging Face.

The combined effect is what makes the wiki care about Unsloth specifically: a Phi3 mini fine-tune that would need a workstation GPU and hours of training in vanilla HF can run on the free Google Colab T4 runtime in ~10 minutes, end-to-end, including the LoRA adapter setup.

How Tech With Tim uses it (the wiki’s primary recipe)

From the fine-tuning walkthrough:

Dataset: 500 input/output JSON pairs (HTML → structured extraction in Tim’s demo). The format is plain [{"input": "...", "output": "..."}, ...] — Unsloth doesn’t impose a schema, the user formats the prompt template themselves.
Load the base model: FastLanguageModel.from_pretrained(model_name=...) with load_in_4bit=True. Tim uses unsloth/Phi-3-mini-4k-instruct-bnb-4bit for the demo because it trains fastest.
Add LoRA adapters: FastLanguageModel.get_peft_model(...) — adds the trainable layers on top of the frozen 4-bit base. This is the step that makes fine-tuning tractable on a T4.
Train: SFTTrainer from TRL, with the formatted dataset and standard hyperparameters. Tim’s demo trains in ~10 minutes for 500 examples.
Save to GGUF: model.save_pretrained_gguf(...) — exports the merged fine-tuned weights in GGUF format, the file format Ollama (and llama.cpp) expect. This step takes another 10–15 minutes.
Load into Ollama: download the unsloth.q4_K_M.gguf file from Colab, drop it next to a custom Modelfile, run ollama create <name> -f Modelfile, then ollama run <name>.

The whole pipeline is reproducible from the linked Colab notebook — see Tim’s video description.

Where Unsloth fits in the wiki’s local-AI threads

Fine-tuning — Unsloth is the practical entry point. The concept page documents the workflow shape; this entry is the tool that makes it tractable.
Ollama — Unsloth’s GGUF export is what lets fine-tuned models become running local models via Ollama’s Modelfile system.
llama.cpp — GGUF is llama.cpp’s native format; Unsloth’s export is going through that pipeline.
Gemma 4 VRAM Requirements — adjacent reference. Hardware sizing for running models; Unsloth is for training them. The same VRAM constraints apply.
Local-AI cost reduction — Unsloth is part of the “you don’t need to pay for fine-tuning APIs” thread alongside Ollama’s “you don’t need to pay for inference APIs” framing.

Caveats from the source

Tim’s demo intentionally uses a small dataset (500 examples) and a small model (Phi3 mini) to keep the video runtime manageable. The output is “barely working” — sometimes correct, sometimes hallucinating the JSON shape. Real fine-tuning needs more examples (thousands, not hundreds).
Fine-tuned models get worse at general tasks while getting better at the trained one. Tim is explicit about this trade-off.
The trainer step’s optimal hyperparameters are model-specific; Tim leaves them at Unsloth’s defaults and tells viewers to read the Unsloth docs for tuning.

AI For Dev

Explorer

unsloth

Unsloth

What it does

How Tech With Tim uses it (the wiki’s primary recipe)

Where Unsloth fits in the wiki’s local-AI threads

Caveats from the source

See Also

Graph View

Table of Contents

Backlinks