Matthew Berman
AI content creator and YouTuber. Focuses on open-source model releases, local inference, and practical LLM benchmarking. Strong advocate for edge compute and hybrid AI workflows.
Channels
- YouTube: Matthew Berman — open-source model reviews, local AI, practical LLM testing
Content in This Wiki
- Google just dropped Gemma 4… (WOAH) — Overview of the Gemma 4 release: model family, benchmarks, capabilities, and relevance for local Claude Code workflows
- You NEED to Try These Open-Source AI Projects — Four projects showing where agents are headed: GStack (YC methodology as Claude Code skills), Hermes Agent (self-improving loop), Superpowers (TDD plugin, 115k stars), Paperclip (multi-agent orchestration)
- Every AI Model Explained in 20 Minutes — Introductory survey of frontier models, open-source models, coding agents, image/video/audio generation; seeds model entity pages in the wiki
- I was hacked… (2026-04-03) — Berman challenges Pliny the Liberator (Time 100 AI red-teamer) to break into his hardened personal OpenClaw in 5 attempts. All five caught and quarantined. Anchors the new ai-personal-agent-hardening concept page: five named attack classes (tokenade, siege, format-override, faked-system-command, free-association exfil) and the two defensive rules Berman extracts (human-in-loop + frontier model as scanner).
- I Built Something — Journey Kits launch (2026-04-04) — Berman launches Journey, the wiki’s first dedicated agent-workflow packaging format. Anchors the new entity journey-kits with a “borrowable concepts” breakdown.
Key Ideas
- Hybrid workflow stance: use frontier models (Opus 4.6) for serious coding; use local open-source models for lightweight tasks
- Open-source models are getting smaller, better, and faster — edge compute is increasingly viable for most tasks
- Gemma 4 31B achieves near-frontier performance at a size most consumer hardware can run
- Per-lab specialties: ChatGPT = ease of use; Claude = work and coding; Gemini = search and deep research; Grok = Twitter/X research
- Open-source models are good enough for 95% of use cases — Chinese labs (DeepSeek, Qwen) have surpassed Meta’s Llama
- Cursor is his personal favorite coding agent; he considers coding the category that AI has transformed most
- The two hardening rules (extracted from the Pliny challenge): (1) human-in-loop is mandatory for any always-on personal agent; (2) put the best available model in the frontier scanner role, the first line of defense that screens incoming content, which is separate from the model that does the actual work. “Unless you are putting your best possible model forward as the frontier scanner, it’s going to collapse. You are going to get infiltrated.”
- Quarantine is a system, not a prompt: every Pliny attack ended in “got caught and quarantined” — not “got blocked at the LLM.” The architecture had a quarantine step separate from the agent’s main loop
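
The last two ideas describe an architecture rather than a prompt, so a small sketch may help. The Python below is only an illustration under stated assumptions, not Berman's actual OpenClaw setup: the names `Message`, `GuardedAgent`, and `toy_scanner` are hypothetical, and the keyword check stands in for what would in practice be a call to the strongest available model. It shows the three pieces named above as separate steps: a scanner that runs first, a quarantine store outside the agent's main loop, and a human approval gate before anything is acted on.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    sender: str
    body: str


# Hypothetical stand-in for the frontier scanner. A real setup would ask the
# best available model to classify the message; a keyword list is NOT a defense.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "system command")


def toy_scanner(msg: Message) -> bool:
    """Return True when the message looks like an attack (toy heuristic)."""
    text = msg.body.lower()
    return any(phrase in text for phrase in SUSPICIOUS_PHRASES)


@dataclass
class GuardedAgent:
    scanner: Callable[[Message], bool]   # first line of defense
    worker: Callable[[Message], str]     # model that does the actual work
    approve: Callable[[str], bool]       # human-in-loop gate
    quarantine: List[Message] = field(default_factory=list)

    def handle(self, msg: Message) -> str:
        # Step 1: scan before the worker ever sees the message.
        if self.scanner(msg):
            # Step 2: quarantine is a separate store, not a prompt instruction.
            self.quarantine.append(msg)
            return "quarantined"
        # Step 3: the worker only proposes an action.
        proposed = self.worker(msg)
        # Step 4: a human decides whether the proposal is carried out.
        if not self.approve(proposed):
            return "rejected"
        return proposed
```

In this sketch a flagged message never reaches `worker`, which mirrors the observation that the attacks ended in quarantine rather than being refused by the agent itself. Swapping `toy_scanner` for a real model call and `approve` for an actual confirmation prompt would be the obvious next steps.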