All Writings
-
2026-05-13
Goodput or It Didn't Happen
GPU utilization can be 78% while 30% of requests fail SLO constraints. The goodput frontier test replaces single-number benchmarks with decision-grade surfaces that measure...
-
2026-05-11
The Idiot Index for AI Deployment
The gap between what a model is supposed to do and what it actually does in production is enormous. Forward-deployed engineering exists to close it. Here's why I left Amazon...
-
2026-05-06
What Your Workload Actually Costs
Not all inference is the same. Per-workload LCPR exposes the cross-subsidy that blended averages hide, with cost models for conversational, agentic, RAG, extraction, voice, and...
-
2026-04-29
The LCPR Calculator
Open-source calculator for loaded cost per result. Three worked examples, cache break-even analysis, and KV memory sizing.
-
2026-04-22
Trace Autopsy: Following One Day's Inference Bill
A repeatable diagnostic for going from raw trace events to loaded cost per accepted result. Twelve requests, four data sources, five cost mechanisms, and the reconciliation protocol.
-
2026-04-15
The Denominator Problem
The most common mistake in inference economics is dividing by the wrong number. LCPR (loaded cost per accepted result) reveals a 12x gap between naive token cost and actual...
-
2026-04-08
The Honest Field Guide to Production Inference
TCO frameworks, vendor evaluation, and architecture patterns for teams adopting open-model inference. Includes the LCPR calculator, migration gates, and a staged playbook from...
-
2026-03-20
KV Cache Drift and the Interpretability Blind Spot
KV cache research and interpretability research are measuring the same prefill->decode shift from different angles. This post argues for a shared consistency-model framing and...
-
2026-03-20
Continuous Batching: What It Changes in Activation Data
Continuous batching is algorithmically activation-safe under per-token normalization and masked attention, but hardware nondeterminism, prefix caching, and data-methodology...
-
2026-03-20
How Backpressure Silently Biases Activation Capture
Activation-capture pipelines can look healthy in steady state while silently dropping the most informative samples under transient pressure; this piece maps the main failure...
-
2026-03-20
Why Prefill-Trained Interpretability Breaks in Decode
Prefill-trained interpretability dictionaries are routinely deployed in decode-time regimes; this piece argues for a concrete measurement standard for prefill->decode drift...
-
2026-03-17
Persona Circuits: Progress & Findings (3-17-2026)
Current-state synthesis of the persona-circuits project: robust steering and partial concentration support, but weaker-than-expected distinctness, necessity, and sufficiency...
-
2026-03-17
Persona Circuits: Exploring GLP Application
Branch report on using Generative Latent Priors (GLP) for activation repair in persona steering: public-checkpoint transfer failed in this setting, matched checkpoints were...
-
2026-03-11
Stop Putting Allah in a Box
On trusting deeply in the waiting period and being intentional about reaching out to people.
-
2026-03-10
The Paradox of First Principles (And Why Nobody Wants to Hear That You're in Pursuit of Greatness)
First-principles thinking matters, but reality and rapid experimentation are the only reliable validators for complex systems and ambitious work.
-
2026-03-03
Why I Create: Control, Craft, and Meaning in Public
Why creating in public is less about attention and more about agency, proof of work, and meaning.
-
2026-02-24
Why Responsibility Feels Like a Reward
Responsibility is often its own reward, whether in crisis operations or self-funded interpretability research.
-
2026-02-23
Goals for Alignment, Not Attachment
Goals are useful for alignment. Attachment to timeline and route is what creates stress, rigidity, and worse decisions.
-
2026-02-19
Taste Is Reproducible. Obsession Isn't.
AI can match your aesthetic judgment, curate better than your friends, and predict trends before they happen. So what's the actual human edge?
-
2026-02-18
Advanced vLLM Deployment, Part 1: Hardware and Stack Choices
GPU selection, inference framework choice, and the upstream decisions that determine what optimizations are even possible in production vLLM deployments.
-
2026-02-15
Desire Is a Contract to Be Unhappy
Naval Ravikant says desire is a contract you make with yourself to be unhappy until you have the thing. Here's what that looks like in practice.
-
2026-02-13
Integration, Not Oscillation
Something shifted recently. Not the usual swing between secular ambition and spiritual grounding. This time it feels like both are pointing at the same thing.
-
2026-02-13
For Its Own Sake
On writing, presence, and reclaiming the moment.
-
2026-02-12
Sampling Everything at the Frontier (And When That Stops)
You don't know your research focus in advance. You try things and notice what pulls you back.
-
2026-02-12
"Models Aren't Creative" is a Skill Issue
The lack of creative output isn't a model problem. It's a skill issue. Yours, not the model's.
-
2026-02-11
Writing Culture is Agent Infrastructure
The orgs best positioned for AI agents aren't the ones with the best tooling. They're the ones with the strongest human writing cultures.
-
2026-02-10
Robotics Beyond Humanoids: Getting My Hands Dirty
Everyone's watching humanoid robots. But if you think something is revolutionary, variants probably exist. The robotics community is vast. Here's what I'm learning as a...
-
2026-02-10
Today, Like Every Other Day
Rumi on moving within, fear, and finding beauty in the doing.
-
2026-02-09
What Three Jobs Taught Me About Agent Orchestration
Org theory applies to agent swarms. I've lived three different org structures; here's how they map to multi-agent patterns.
-
2026-02-07
Latency-Bound vs Throughput-Bound: The Missing Dimension
CAP theorem gives you a lens for distributed systems. There's a missing dimension for infrastructure decisions: whether your system is latency-bound or throughput-bound. Most...
-
2026-02-06
Managing Agents: The First Time It Actually Worked
Three weeks into a new role, agent orchestration finally clicked. Not just using AI tools, but actually managing complexity with them.
-
2026-02-05
Ray in Production: What Dozens of GPUs and a Lot of 3am Pages Taught Me
Real production failures from running Ray at scale: lost training runs, enterprise network disasters, and cascade outages the documentation never warns you about.
-
2026-02-04
Automated X/Twitter Feed Curation with Bird CLI + Claude
I built an automated Twitter feed scraper that triages 100 tweets daily, analyzes community sentiment with Claude, and delivers a 5-minute digest to Slack.
-
2026-02-04
Slack vs Telegram: Catching Myself in a Pattern
I chose Telegram over Slack for my AI assistant setup, then realized I was making decisions based on old baggage instead of current evidence.
-
2026-02-03
Optimizing vLLM at Production Scale: Lessons from Conversational AI Infrastructure
Memory fragmentation, throughput cliffs, and quantization accuracy issues that only show up in production: lessons from running vLLM at scale for conversational AI.
-
2026-02-02
Multi-Mode AI Assistants with Telegram Forums
How I eliminated context bleeding in my AI assistant using openclaw's multi-agent features and Telegram forum topics.
-
2026-01-31
Lessons from Scaling Enterprise RAG: Data Residency, Multi-Tenancy, and Production Reliability
Patterns and tradeoffs from building RAG infrastructure in regulated environments.