Bijan Bowen

Engineer and YouTuber focused on local LLM infrastructure and distributed inference. The wiki tracks Bijan because his vLLM multi-node walkthrough is the first source covering horizontally scaled local AI: pooling multiple GPU machines instead of relying on a single rig.

Channels

YouTube: Bijan Bowen — local AI inference, vLLM, distributed systems

Content in This Wiki

Run A Local LLM Across Multiple Computers (2024-12-04): Multi-node vLLM on a Ray cluster, combining tensor and pipeline parallelism across 2 nodes / 4 GPUs. Candid about how fragile the setup is: every node needs an identical environment and a correct network configuration, and the cluster is all-or-nothing.
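To make the 2 nodes / 4 GPUs layout concrete, here is a minimal sketch that only assembles the command lines such a setup might use; it does not launch anything. It assumes 2 GPUs per node (the entry above gives only the totals), follows the common convention of setting the tensor-parallel size to the GPUs per node and the pipeline-parallel size to the node count, and uses a hypothetical model path and head-node IP. The helper names (`ClusterLayout`, `ray_commands`, `vllm_serve_command`) are illustrative, not from the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical description of a homogeneous cluster."""
    nodes: int
    gpus_per_node: int

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node


def parallel_sizes(layout: ClusterLayout) -> tuple[int, int]:
    """Return (tensor_parallel_size, pipeline_parallel_size).

    Convention assumed here: tensor parallelism inside a node,
    pipeline parallelism across nodes.
    """
    return layout.gpus_per_node, layout.nodes


def ray_commands(head_ip: str, port: int = 6379) -> dict[str, list[str]]:
    """Commands to start the Ray head node and to join a worker to it."""
    return {
        "head": ["ray", "start", "--head", f"--port={port}"],
        "worker": ["ray", "start", f"--address={head_ip}:{port}"],
    }


def vllm_serve_command(model: str, layout: ClusterLayout) -> list[str]:
    """Build a `vllm serve` invocation to run on the head node."""
    tp, pp = parallel_sizes(layout)
    # The two sizes must multiply out to the number of GPUs in the cluster.
    assert tp * pp == layout.total_gpus
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--pipeline-parallel-size", str(pp),
    ]


if __name__ == "__main__":
    layout = ClusterLayout(nodes=2, gpus_per_node=2)  # assumed split of the 4 GPUs
    for role, cmd in ray_commands("10.0.0.1").items():  # hypothetical head IP
        print(role, " ".join(cmd))
    # Hypothetical model path; it would need to be identical on every node.
    print(" ".join(vllm_serve_command("/models/example-model", layout)))
```

The Ray commands would run once per machine (head first, then each worker), and the `vllm serve` command only on the head node once all workers have joined.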

Key Ideas

Horizontal scaling works, but only if the boring infrastructure pieces are perfect: identical Python environments, identical model paths, and matched network speeds on every node.
Heterogeneous GPU clusters are bottlenecked by the weakest member: pooling a 4090 and a 3060 doesn’t give you 1.5× a 4090; it gives you 2× a 3060 (with 4090-shaped extra memory you can’t fully use).
A single node with the best GPU you have usually beats a multi-node cluster of cobbled-together GPUs for homelab use cases. Multi-node is for the case where your “best GPU” already isn’t enough.
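The weakest-member point can be checked with a rough back-of-the-envelope sketch. It assumes the model is sharded evenly across GPUs (as with tensor parallelism), so every card must hold an equal slice and the smallest card sets the slice size. The VRAM figures (24 GB for an RTX 4090, 12 GB for the 12 GB RTX 3060 variant) and the function names are assumptions for illustration; the estimate ignores KV cache, activations, and compute speed.

```python
def usable_vram_gb(vram_gb: list[float]) -> float:
    """VRAM usable under even sharding: every GPU is capped at the smallest card."""
    return len(vram_gb) * min(vram_gb)


def stranded_vram_gb(vram_gb: list[float]) -> float:
    """VRAM on the larger cards that even sharding cannot put to use."""
    return sum(vram_gb) - usable_vram_gb(vram_gb)


if __name__ == "__main__":
    mixed = [24.0, 12.0]  # assumed: RTX 4090 + RTX 3060 (12 GB variant)
    print("usable:", usable_vram_gb(mixed), "GB")
    print("stranded:", stranded_vram_gb(mixed), "GB")
```

Under these assumptions the mixed pair offers the same usable memory as two 3060s, which is also what the 4090 provides on its own, so the second machine adds setup fragility without adding capacity.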
