Matthew Berman

AI content creator and YouTuber. Focuses on open-source model releases, local inference, and practical LLM benchmarking. Strong advocate for edge compute and hybrid AI workflows.

Channels

YouTube: Matthew Berman — open-source model reviews, local AI, practical LLM testing

Content in This Wiki

Google just dropped Gemma 4… (WOAH) — Overview of the Gemma 4 release: model family, benchmarks, capabilities, and relevance for local Claude Code workflows
You NEED to Try These Open-Source AI Projects — Four projects showing where agents are headed: GStack (YC methodology as Claude Code skills), Hermes Agent (self-improving loop), Superpowers (TDD plugin, 115k stars), Paperclip (multi-agent orchestration)
Every AI Model Explained in 20 Minutes — Introductory survey of frontier models, open-source models, coding agents, image/video/audio generation; seeds model entity pages in the wiki
I was hacked… (2026-04-03) — Berman challenges Pliny the Liberator (Time 100 AI red-teamer) to break into his hardened personal OpenClaw in 5 attempts. All five caught and quarantined. Anchors the new ai-personal-agent-hardening concept page: five named attack classes (tokenade, siege, format-override, faked-system-command, free-association exfil) and the two defensive rules Berman extracts (human-in-loop + frontier model as scanner).
I Built Something — Journey Kits launch (2026-04-04) — Berman launches Journey, the wiki’s first dedicated agent-workflow packaging format. Anchors the new entity journey-kits with a “borrowable concepts” breakdown.

Key Ideas

Hybrid workflow stance: use frontier models (Opus 4.6) for serious coding; use local open-source models for lightweight tasks
Open-source models are getting smaller, better, and faster — edge compute is increasingly viable for most tasks
Gemma 4 31B achieves near-frontier performance at a size most consumer hardware can run
Per-lab specialties: ChatGPT = ease of use; Claude = work and coding; Gemini = search and deep research; Grok = Twitter/X research
Open-source models are good enough for 95% of use cases — Chinese labs (DeepSeek, Qwen) have surpassed Meta’s Llama
Cursor is his personal favorite coding agent; the entire coding agent category has been most transformed by AI
The two hardening rules (extracted from the Pliny challenge): (1) human-in-loop is mandatory for any always-on personal agent; (2) use the best possible model as your frontier scanner — the first line of defense, not the model that does the actual work. “Unless you are putting your best possible model forward as the frontier scanner, it’s going to collapse. You are going to get infiltrated.”
Quarantine is a system, not a prompt: every Pliny attack ended in “got caught and quarantined” — not “got blocked at the LLM.” The architecture had a quarantine step separate from the agent’s main loop

AI For Dev

Explorer

matthew-berman

Matthew Berman

Channels

Content in This Wiki

Key Ideas

See Also

Graph View

Table of Contents

Backlinks