---
title: "Installing Ollama"
parent: "Installation"
nav_order: 5
---
Installing Ollama
Ollama has become my preferred choice for terminal-based LLM work. It’s incredibly simple to set up and use, especially if you’re comfortable with command-line tools. Here’s how I got it working and what I learned along the way.
Why I Like Ollama
Advantages:
- Simple Setup: Single binary installation with minimal configuration
- Broad Model Support: Supports Llama, Mistral, Phi, Qwen, CodeLlama, and many more
- Automatic Model Management: Downloads, updates, and manages models automatically
- Cross-Platform: Works on Windows, macOS, and Linux
- Active Development: Regular updates and new model support
- REST API: Built-in API server for integration with other tools
Best For:
- Developers who prefer command-line tools
- Users who want to quickly try multiple models
- Integration with development workflows
- Automated deployments and scripting
Installation by Platform
Windows Installation
Method 1: Direct Download (Recommended)
1. Download Ollama for Windows
   - Visit ollama.com
   - Click “Download for Windows”
   - Download the .exe installer
2. Run the Installer
   # Run the downloaded installer
   .\OllamaSetup.exe
3. Verify Installation
   # Open PowerShell and verify Ollama is installed
   ollama --version
Method 2: Windows Package Manager
# Install via winget
winget install Ollama.Ollama
# Or via Chocolatey
choco install ollama
Method 3: Manual Installation
# Download and extract manually
Invoke-WebRequest -Uri "https://ollama.com/download/ollama-windows-amd64.zip" -OutFile "ollama.zip"
Expand-Archive -Path "ollama.zip" -DestinationPath "C:\Program Files\Ollama"
# Add to PATH (current session only; use System Properties > Environment Variables for a permanent change)
$env:PATH += ";C:\Program Files\Ollama"
macOS Installation
Method 1: Direct Download (Recommended)
1. Download Ollama for macOS
   - Visit ollama.com
   - Click “Download for macOS”
   - Download the .pkg installer
2. Install the Package
   - Double-click the downloaded .pkg file
   - Follow the installation wizard
   - Enter your admin password when prompted
3. Verify Installation
   # Open Terminal and verify
   ollama --version
Method 2: Homebrew
# Install via Homebrew
brew install ollama
# Start Ollama service
brew services start ollama
Method 3: Manual Installation
# The install.sh script targets Linux; for a manual macOS install,
# download the standalone release from Ollama's GitHub releases page
# (https://github.com/ollama/ollama/releases) and put the binary on your PATH
Linux Installation
Method 1: Official Install Script (Recommended)
# Download and run the official install script
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
Method 2: Package Managers
Ubuntu/Debian:
# Use the official install script
curl -fsSL https://ollama.com/install.sh | sh
# Or download the standalone tarball and install it manually
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
CentOS/RHEL/Fedora:
# There is no distro-specific package; use the install script (or the tarball above)
curl -fsSL https://ollama.com/install.sh | sh
Arch Linux:
# Install from AUR
yay -S ollama
# or
paru -S ollama
Method 3: Docker Installation
# Run Ollama in Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# For GPU support (NVIDIA)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
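Once the container is up, it's worth confirming the API is reachable before pulling models. A minimal sketch, assuming the default port mapping from the commands above (the server's root endpoint returns a short status string):
import requests

# Assumes the Docker command above mapped port 11434 to localhost
resp = requests.get("http://localhost:11434/", timeout=5)
print(resp.status_code, resp.text)  # Expect 200 and "Ollama is running"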
First Steps with Ollama
1. Start Ollama Service
# Start Ollama service (runs in background)
ollama serve
Note: On Windows and macOS, Ollama typically starts automatically after installation.
2. Download Your First Model
# Download and run Llama 3.1 8B (good starting model)
ollama run llama3.1
# Or try a coding-specific model
ollama run codellama
# For a smaller, faster model
ollama run phi3
3. Basic Ollama Commands
# List available models
ollama list
# Pull a model without running it
ollama pull mistral
# Show model information
ollama show llama3.1
# Remove a model
ollama rm llama3.1
# Update a model by pulling it again
ollama pull llama3.1
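If you'd rather script these checks, the same information is exposed over the local API. A minimal sketch that mirrors `ollama list` using the `/api/tags` endpoint (assumes `ollama serve` is running on the default port):
import requests

# Equivalent of `ollama list`: enumerate locally downloaded models
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
for model in tags.get("models", []):
    size_gb = model["size"] / 1e9  # reported in bytes
    print(f'{model["name"]}: {size_gb:.1f} GB')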
4. Interactive Chat
# Start interactive chat with a model
ollama run llama3.1
>>> Hello! Can you help me write Python code?
# Exit chat mode
>>> /bye
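The same conversation can be driven programmatically through the `/api/chat` endpoint. A minimal, non-streaming sketch, assuming the llama3.1 model has already been pulled and the server is running locally:
import requests

# Programmatic equivalent of the interactive chat above (non-streaming)
payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "Hello! Can you help me write Python code?"}
    ],
    "stream": False,
}
reply = requests.post("http://localhost:11434/api/chat", json=payload).json()
print(reply["message"]["content"])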
Model Recommendations
For Beginners
- llama3.1 (8B) - Excellent general-purpose model
- phi3 (3.8B) - Fast and efficient, good for basic tasks
- gemma2 (9B) - Good balance of capability and speed
For Coding
- codellama (7B/13B/34B) - Specialized for code generation
- codegemma (7B) - Google’s coding model
- deepseek-coder (6.7B/33B) - Strong coding performance
For Advanced Users
- llama3.1 (70B) - High capability, requires significant RAM
- mistral (7B) / mixtral (8x7B) - Excellent reasoning capabilities
- qwen2.5 (7B/14B/32B) - Strong multilingual support
Configuration
Environment Variables
# Set custom model directory
export OLLAMA_MODELS=/path/to/your/models
# Set custom host and port
export OLLAMA_HOST=0.0.0.0:11434
# Enable debug logging
export OLLAMA_DEBUG=1
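The server reads these variables at startup; if you bind it to a non-default host or port, clients need to use the same address. One way to keep a Python client in sync is to read the same variable yourself, a sketch shown below (reading OLLAMA_HOST client-side is just a convention here, not something the API requires):
import os
import requests

# Build the base URL from OLLAMA_HOST if set, otherwise use the default
host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
base_url = host if host.startswith("http") else f"http://{host}"

print(requests.get(f"{base_url}/", timeout=5).text)  # "Ollama is running"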
GPU Configuration
NVIDIA GPU Support:
# Ollama automatically detects NVIDIA GPUs
# Ensure NVIDIA drivers and CUDA are installed
# Check whether a loaded model is running on the GPU
ollama ps
AMD GPU Support (ROCm):
# Install ROCm drivers first
# Ollama will automatically use ROCm if available
Apple Silicon:
# Metal acceleration is automatic on Apple Silicon Macs
# No additional configuration needed
Memory Management
# Context length and GPU offload are model parameters, not CLI flags.
# Set them inside an interactive session (or in a Modelfile with PARAMETER):
ollama run llama3.1
>>> /set parameter num_ctx 4096
>>> /set parameter num_gpu 20
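The same parameters can also be passed per request through the REST API's `options` field, which is handy when you don't want to bake them into a Modelfile. A minimal sketch (the values are only examples):
import requests

# Per-request overrides for context length and GPU offload via the generate API
payload = {
    "model": "llama3.1",
    "prompt": "Summarize what a context window is in one sentence.",
    "stream": False,
    "options": {
        "num_ctx": 4096,   # context length
        "num_gpu": 20,     # layers offloaded to the GPU
    },
}
result = requests.post("http://localhost:11434/api/generate", json=payload).json()
print(result["response"])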
Integration Examples
REST API Usage
# Start Ollama API server
ollama serve
# Generate text via API
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is the sky blue?",
"stream": false
}'
Python Integration
import requests

def query_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    return response.json()["response"]

# Example usage
result = query_ollama("Write a Python function to calculate fibonacci numbers")
print(result)
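For longer generations you'll usually want streaming so tokens appear as they are produced. With "stream": true the generate endpoint returns one JSON object per line; a minimal sketch:
import json
import requests

def stream_ollama(prompt, model="llama3.1"):
    url = "http://localhost:11434/api/generate"
    data = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(url, json=data, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Each chunk carries a piece of the response; "done" marks the end
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()
                break

stream_ollama("Explain list comprehensions in two sentences.")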
VS Code Extension
# Several community extensions (for example, Continue) integrate with a local Ollama server
# Search for "Ollama" in the VS Code Extensions marketplace
Troubleshooting
Common Issues
Issue: “ollama command not found”
# Solution: Make sure the directory containing the ollama binary is on PATH
export PATH=$PATH:/usr/local/bin
# Or reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
Issue: “Connection refused” when using API
# Solution: Start Ollama service
ollama serve
# Check if service is running
ps aux | grep ollama
Issue: Model download fails
# Solution: Check internet connection and disk space
df -h # Check disk space
ping ollama.com # Check connectivity
# Try downloading again
ollama pull llama3.1
Issue: Out of memory errors
# Solution: Use a smaller model or reduce the context length
ollama run phi3 # Try smaller model
# Or lower the context length inside a session
ollama run llama3.1
>>> /set parameter num_ctx 2048
Issue: Slow performance
# Solution: Confirm the model is actually running on the GPU
ollama ps # The PROCESSOR column shows GPU vs CPU
# Ensure drivers are installed
nvidia-smi # For NVIDIA
rocm-smi # For AMD
GPU Troubleshooting
NVIDIA GPU not detected:
# Check NVIDIA drivers
nvidia-smi
# Install CUDA toolkit
# Visit: https://developer.nvidia.com/cuda-downloads
# Reinstall Ollama after CUDA installation
AMD GPU not detected:
# Install ROCm by following AMD's installation guide for your distribution
# https://rocm.docs.amd.com
# Check ROCm installation
rocm-smi
Performance Optimization
# Monitor resource usage
htop
nvidia-smi -l 1 # For NVIDIA GPU monitoring
# Trade quality for speed and determinism by tuning sampling parameters in a session
ollama run llama3.1
>>> /set parameter temperature 0.1
>>> /set parameter top_p 0.9
# Lower temperature gives more deterministic output; lower top_p gives more focused responses
Advanced Usage
Custom Models
# Create custom model from Modelfile
echo 'FROM llama3.1
PARAMETER temperature 0.1
SYSTEM "You are a helpful coding assistant."' > Modelfile
# Build custom model
ollama create mycodellama -f Modelfile
# Use custom model
ollama run mycodellama
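The custom model is addressed by name anywhere a built-in model would be, including over the API. For example, reusing the query_ollama() helper from the Python Integration section above (assumes the server is running and mycodellama has been created):
# The custom model behaves like any other model name over the API
result = query_ollama("Explain what a Python decorator does.", model="mycodellama")
print(result)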
Model Quantization
# Ollama models ship as pre-quantized variants (Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, etc.)
# Browse the available tags for a model at ollama.com/library, then pull a specific variant
ollama pull llama3.1:8b
# List the variants you have downloaded locally
ollama list | grep llama3.1
Batch Processing
# Process multiple prompts
echo "Explain machine learning" | ollama run llama3.1
echo "Write a Python function" | ollama run codellama
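For more than a couple of prompts, a short script against the API is easier to manage than shell pipes. A minimal sketch that runs a list of prompts sequentially and collects the answers (model name and prompts are just examples):
import requests

prompts = [
    "Explain machine learning in one paragraph.",
    "Write a Python function that reverses a string.",
]

# Send each prompt to the generate endpoint and collect the responses
results = []
for prompt in prompts:
    data = {"model": "llama3.1", "prompt": prompt, "stream": False}
    response = requests.post("http://localhost:11434/api/generate", json=data)
    results.append(response.json()["response"])

for prompt, answer in zip(prompts, results):
    print(f"### {prompt}\n{answer}\n")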
Next Steps
Once you have Ollama installed and running:
- Explore Models: Try different models for various tasks
- Integration: Connect Ollama to your development workflow
- API Development: Build applications using Ollama’s REST API
- Performance Tuning: Optimize settings for your hardware
Related Sections
- Installing GPT4All - Alternative GUI-based option
- Model Selection Guide - Choosing the right model
- Hardware Requirements - Memory and GPU guidance
- Best Practices - Optimization tips
Last updated: July 20, 2025