---
title: "Installing Ollama"
parent: "Installation"
nav_order: 5
---
Installing Ollama
Ollama has become my preferred choice for terminal-based LLM work. It’s incredibly simple to set up and use, especially if you’re comfortable with command-line tools. Here’s how I got it working and what I learned along the way.
Why I Like Ollama
Advantages:
- Simple Setup: Single binary installation with minimal configuration
- Broad Model Support: Supports Llama, Mistral, Phi, Qwen, CodeLlama, and many more
- Automatic Model Management: Downloads, updates, and manages models automatically
- Cross-Platform: Works on Windows, macOS, and Linux
- Active Development: Regular updates and new model support
- REST API: Built-in API server for integration with other tools
Best For:
- Developers who prefer command-line tools
- Users who want to quickly try multiple models
- Integration with development workflows
- Automated deployments and scripting
Installation by Platform
Windows Installation
Method 1: Direct Download (Recommended)
1. Download Ollama for Windows
   - Visit ollama.com
   - Click “Download for Windows”
   - Download the .exe installer
2. Run the Installer
   # Run the downloaded installer
   .\OllamaSetup.exe
3. Verify Installation
   # Open PowerShell and verify Ollama is installed
   ollama --version
Method 2: Windows Package Manager
# Install via winget
winget install Ollama.Ollama
# Or via Chocolatey
choco install ollama
Method 3: Manual Installation
# Download and extract manually
Invoke-WebRequest -Uri "https://ollama.com/download/ollama-windows-amd64.zip" -OutFile "ollama.zip"
Expand-Archive -Path "ollama.zip" -DestinationPath "C:\Program Files\Ollama"
# Add to PATH (current session only; use System Properties > Environment Variables for a permanent change)
$env:PATH += ";C:\Program Files\Ollama"
macOS Installation
Method 1: Direct Download (Recommended)
1. Download Ollama for macOS
   - Visit ollama.com
   - Click “Download for macOS”
   - Download the .pkg installer
2. Install the Package
   - Double-click the downloaded .pkg file
   - Follow the installation wizard
   - Enter your admin password when prompted
3. Verify Installation
   # Open Terminal and verify
   ollama --version
Method 2: Homebrew
# Install via Homebrew
brew install ollama
# Start Ollama service
brew services start ollama
Method 3: Manual Installation
# The install.sh script targets Linux; for a manual macOS install,
# download the standalone release from Ollama's GitHub releases page
# (https://github.com/ollama/ollama/releases) and put the binary on your PATH
Linux Installation
Method 1: Official Install Script (Recommended)
# Download and run the official install script
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
Method 2: Package Managers
Ubuntu/Debian:
# Use the official install script
curl -fsSL https://ollama.com/install.sh | sh
# Or download the standalone tarball and install it manually
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
CentOS/RHEL/Fedora:
# There is no distro-specific package; use the install script (or the tarball above)
curl -fsSL https://ollama.com/install.sh | sh
Arch Linux:
# Install from AUR
yay -S ollama
# or
paru -S ollama
Method 3: Docker Installation
# Run Ollama in Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# For GPU support (NVIDIA)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
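Once the container is up, it's worth confirming the API is reachable before pulling models. A minimal sketch, assuming the default port mapping from the commands above (the server's root endpoint returns a short status string):
import requests

# Assumes the Docker command above mapped port 11434 to localhost
resp = requests.get("http://localhost:11434/", timeout=5)
print(resp.status_code, resp.text)  # Expect 200 and "Ollama is running"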
First Steps with Ollama
1. Start Ollama Service
# Start Ollama service (runs in background)
ollama serve
Note: On Windows and macOS, Ollama typically starts automatically after installation.
2. Download Your First Model
# Download and run Llama 3.1 8B (good starting model)
ollama run llama3.1
# Or try a coding-specific model
ollama run codellama
# For a smaller, faster model
ollama run phi3
3. Basic Ollama Commands
# List available models
ollama list
# Pull a model without running it
ollama pull mistral
# Show model information
ollama show llama3.1
# Remove a model
ollama rm llama3.1
# Update a model by pulling it again
ollama pull llama3.1
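If you'd rather script these checks, the same information is exposed over the local API. A minimal sketch that mirrors `ollama list` using the `/api/tags` endpoint (assumes `ollama serve` is running on the default port):
import requests

# Equivalent of `ollama list`: enumerate locally downloaded models
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
for model in tags.get("models", []):
    size_gb = model["size"] / 1e9  # reported in bytes
    print(f'{model["name"]}: {size_gb:.1f} GB')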
4. Interactive Chat
# Start interactive chat with a model
ollama run llama3.1
>>> Hello! Can you help me write Python code?
# Exit chat mode
>>> /bye
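The same conversation can be driven programmatically through the `/api/chat` endpoint. A minimal, non-streaming sketch, assuming the llama3.1 model has already been pulled and the server is running locally:
import requests

# Programmatic equivalent of the interactive chat above (non-streaming)
payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "Hello! Can you help me write Python code?"}
    ],
    "stream": False,
}
reply = requests.post("http://localhost:11434/api/chat", json=payload).json()
print(reply["message"]["content"])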
Model Recommendations
For Beginners
- llama3.1 (8B) - Excellent general-purpose model
- phi3 (3.8B) - Fast and efficient, good for basic tasks
- gemma2 (9B) - Good balance of capability and speed
For Coding
- codellama (7B/13B/34B) - Specialized for code generation
- codegemma (7B) - Google’s coding model
- deepseek-coder (6.7B/33B) - Strong coding performance
For Advanced Users
- llama3.1 (70B) - High capability, requires significant RAM
- mistral (7B) / mixtral (8x7B) - Excellent reasoning capabilities
- qwen2.5 (7B/14B/32B) - Strong multilingual support
Configuration
Environment Variables
# Set custom model directory
export OLLAMA_MODELS=/path/to/your/models
# Set custom host and port
export OLLAMA_HOST=0.0.0.0:11434
# Enable debug logging
export OLLAMA_DEBUG=1
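The server reads these variables at startup; if you bind it to a non-default host or port, clients need to use the same address. One way to keep a Python client in sync is to read the same variable yourself, a sketch shown below (reading OLLAMA_HOST client-side is just a convention here, not something the API requires):
import os
import requests

# Build the base URL from OLLAMA_HOST if set, otherwise use the default
host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
base_url = host if host.startswith("http") else f"http://{host}"

print(requests.get(f"{base_url}/", timeout=5).text)  # "Ollama is running"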
GPU Configuration
NVIDIA GPU Support:
# Ollama automatically detects NVIDIA GPUs
# Ensure NVIDIA drivers and CUDA are installed
# Check whether a loaded model is running on the GPU
ollama ps
AMD GPU Support (ROCm):
# Install ROCm drivers first
# Ollama will automatically use ROCm if available
Apple Silicon:
# Metal acceleration is automatic on Apple Silicon Macs
# No additional configuration needed
Memory Management
# Context length and GPU offload are model parameters, not CLI flags.
# Set them inside an interactive session (or in a Modelfile with PARAMETER):
ollama run llama3.1
>>> /set parameter num_ctx 4096
>>> /set parameter num_gpu 20
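The same parameters can also be passed per request through the REST API's `options` field, which is handy when you don't want to bake them into a Modelfile. A minimal sketch (the values are only examples):
import requests

# Per-request overrides for context length and GPU offload via the generate API
payload = {
    "model": "llama3.1",
    "prompt": "Summarize what a context window is in one sentence.",
    "stream": False,
    "options": {
        "num_ctx": 4096,   # context length
        "num_gpu": 20,     # layers offloaded to the GPU
    },
}
result = requests.post("http://localhost:11434/api/generate", json=payload).json()
print(result["response"])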
Integration Examples
REST API Usage
# Start Ollama API server
ollama serve
# Generate text via API
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is the sky blue?",
"stream": false
}'
Python Integration
import requests

def query_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    return response.json()["response"]

# Example usage
result = query_ollama("Write a Python function to calculate fibonacci numbers")
print(result)
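For longer generations you'll usually want streaming so tokens appear as they are produced. With "stream": true the generate endpoint returns one JSON object per line; a minimal sketch:
import json
import requests

def stream_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    data = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(url, json=data, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Each chunk carries a piece of the response; "done" marks the end
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()
                break

stream_ollama("Explain list comprehensions in two sentences.")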
VS Code Extension
# Several community extensions (for example, Continue) integrate with a local Ollama server
# Search for "Ollama" in the VS Code Extensions marketplace
Troubleshooting
Common Issues
Issue: “ollama command not found”
# Solution: Make sure the directory containing the ollama binary is on PATH
export PATH=$PATH:/usr/local/bin
# Or reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
Issue: “Connection refused” when using API
# Solution: Start Ollama service
ollama serve
# Check if service is running
ps aux | grep ollama
Issue: Model download fails
# Solution: Check internet connection and disk space
df -h # Check disk space
ping ollama.com # Check connectivity
# Try downloading again
ollama pull llama3.1
Issue: Out of memory errors
# Solution: Use a smaller model or reduce the context length
ollama run phi3 # Try smaller model
# Or lower the context length inside a session
ollama run llama3.1
>>> /set parameter num_ctx 2048
Issue: Slow performance
# Solution: Confirm the model is actually running on the GPU
ollama ps # The PROCESSOR column shows GPU vs CPU
# Ensure drivers are installed
nvidia-smi # For NVIDIA
rocm-smi # For AMD
GPU Troubleshooting
NVIDIA GPU not detected:
# Check NVIDIA drivers
nvidia-smi
# Install CUDA toolkit
# Visit: https://developer.nvidia.com/cuda-downloads
# Reinstall Ollama after CUDA installation
AMD GPU not detected:
# Install ROCm by following AMD's installation guide for your distribution
# https://rocm.docs.amd.com
# Check ROCm installation
rocm-smi
Performance Optimization
# Monitor resource usage
htop
nvidia-smi -l 1 # For NVIDIA GPU monitoring
# Trade quality for speed and determinism by tuning sampling parameters in a session
ollama run llama3.1
>>> /set parameter temperature 0.1
>>> /set parameter top_p 0.9
# Lower temperature gives more deterministic output; lower top_p gives more focused responses
Advanced Usage
Custom Models
# Create custom model from Modelfile
echo 'FROM llama3.1
PARAMETER temperature 0.1
SYSTEM "You are a helpful coding assistant."' > Modelfile
# Build custom model
ollama create mycodellama -f Modelfile
# Use custom model
ollama run mycodellama
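The custom model is addressed by name anywhere a built-in model would be, including over the API. For example, reusing the query_ollama() helper from the Python Integration section above (assumes the server is running and mycodellama has been created):
# The custom model behaves like any other model name over the API
result = query_ollama("Explain what a Python decorator does.", model="mycodellama")
print(result)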
Model Quantization
# Ollama models ship as pre-quantized variants (Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, etc.)
# Browse the available tags for a model at ollama.com/library, then pull a specific variant
ollama pull llama3.1:8b
# List the variants you have downloaded locally
ollama list | grep llama3.1
Batch Processing
# Process multiple prompts
echo "Explain machine learning" | ollama run llama3.1
echo "Write a Python function" | ollama run codellama
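For more than a couple of prompts, a short script against the API is easier to manage than shell pipes. A minimal sketch that runs a list of prompts sequentially and collects the answers (model name and prompts are just examples):
import requests

prompts = [
    "Explain machine learning in one paragraph.",
    "Write a Python function that reverses a string.",
]

# Send each prompt to the generate endpoint and collect the responses
results = []
for prompt in prompts:
    data = {"model": "llama3.1", "prompt": prompt, "stream": False}
    response = requests.post("http://localhost:11434/api/generate", json=data)
    results.append(response.json()["response"])

for prompt, answer in zip(prompts, results):
    print(f"### {prompt}\n{answer}\n")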
Next Steps
Once you have Ollama installed and running:
- Explore Models: Try different models for various tasks
- Integration: Connect Ollama to your development workflow
- API Development: Build applications using Ollama’s REST API
- Performance Tuning: Optimize settings for your hardware
Related Sections
- Installing GPT4All - Alternative GUI-based option
- Model Selection Guide - Choosing the right model
- Hardware Requirements - Memory and GPU guidance
- Best Practices - Optimization tips
Last updated: July 20, 2025