Bijan Bowen

Engineer / YouTuber focused on local LLM infrastructure and distributed inference. The wiki tracks Bijan because his vLLM multi-node walkthrough is the first source covering horizontal-scaling local AI — pooling multiple GPU machines instead of relying on a single rig.

Channels

  • YouTube: Bijan Bowen — local AI inference, vLLM, distributed systems

Content in This Wiki

  • Run A Local LLM Across Multiple Computers (2024-12-04) — Multi-node vLLM via Ray cluster: tensor + pipeline parallelism across 2 nodes / 4 GPUs. Honest about setup fragility (identical envs, network config, all-or-nothing).

Key Ideas

  • Horizontal scaling works, but only if the boring infrastructure pieces are perfect. Identical Python environments, identical model paths, matched network speeds.
  • Heterogeneous GPU clusters get bottlenecked by the weakest member — pooling a 4090 and a 3060 doesn’t give you 1.5× a 4090; it gives you 2× a 3060 (with 4090-shaped extra memory you can’t fully use).
  • Single-node-with-the-best-GPU-you-have usually beats multi-node-with-cobbled-together-GPUs for homelab use cases. Multi-node is for the case where your “best GPU” already isn’t enough.

See Also