All Writings

2026-06-08

Inference Flexibility Starts in the API

A response to Mario Souto's clay jar essay: energy orchestration only works if inference systems expose workload class, latency tolerance, and value per accepted result.

Writings
2026-05-13

Goodput or It Didn't Happen

GPU utilization can be 78% while 30% of requests fail SLO constraints. The goodput frontier test replaces single-number benchmarks with decision-grade surfaces that measure...

Writings
2026-05-11

The Idiot Index for AI Deployment

The gap between what a model is supposed to do and what it actually does in production is enormous. Forward-deployed engineering exists to close it. Here's why I left Amazon...

Essays
2026-05-06

What Your Workload Actually Costs

Not all inference is the same. Per-workload LCPR exposes the cross-subsidy that blended averages hide, with cost models for conversational, agentic, RAG, extraction, voice, and...

Writings
2026-04-29

The LCPR Calculator

Open-source calculator for loaded cost per result. Three worked examples, cache break-even analysis, and KV memory sizing.

Writings
2026-04-22

Trace Autopsy: Following One Day's Inference Bill

A repeatable diagnostic for going from raw trace events to loaded cost per accepted result. Twelve requests, four data sources, five cost mechanisms, and the reconciliation protocol.

Writings
2026-04-15

The Denominator Problem

The most common mistake in inference economics is dividing by the wrong number. LCPR (loaded cost per accepted result) reveals a 12x gap between naive token cost and actual...

Writings
2026-04-08

The Honest Field Guide to Production Inference

TCO frameworks, vendor evaluation, and architecture patterns for teams adopting open-model inference. Includes the LCPR calculator, migration gates, and a staged playbook from...

Writings
2026-03-20

KV Cache Drift and the Interpretability Blind Spot

KV cache research and interpretability research are measuring the same prefill->decode shift from different angles. This post argues for a shared consistency-model framing and...

Case Studies
2026-03-20

Continuous Batching: What It Changes in Activation Data

Continuous batching is algorithmically activation-safe under per-token normalization and masked attention, but hardware nondeterminism, prefix caching, and data-methodology...

Case Studies
2026-03-20

How Backpressure Silently Biases Activation Capture

Activation-capture pipelines can look healthy in steady state while silently dropping the most informative samples under transient pressure; this piece maps the main failure...

Case Studies
2026-03-20

Why Prefill-Trained Interpretability Breaks in Decode

Prefill-trained interpretability dictionaries are routinely deployed in decode-time regimes; this piece argues for a concrete measurement standard for prefill->decode drift...

Case Studies
2026-03-17

Persona Circuits: Progress & Findings (3-17-2026)

Current-state synthesis of the persona-circuits project: robust steering and partial concentration support, but weaker-than-expected distinctness, necessity, and sufficiency...

Research
2026-03-17

Persona Circuits: Exploring GLP Application

Branch report on using Generative Latent Priors (GLP) for activation repair in persona steering: public-checkpoint transfer failed in this setting, matched checkpoints were...

Research
2026-03-11

Stop Putting Allah in a Box

On trusting deeply in the waiting period and being intentional about reaching out to people.

Essays
2026-03-10

The Paradox of First Principles (And Why Nobody Wants to Hear That You're in Pursuit of Greatness)

First-principles thinking matters, but reality and rapid experimentation are the only reliable validators for complex systems and ambitious work.

Essays
2026-03-03

Why I Create: Control, Craft, and Meaning in Public

Why creating in public is less about attention and more about agency, proof of work, and meaning.

Essays
2026-02-24

Why Responsibility Feels Like a Reward

Responsibility is often its own reward, whether in crisis operations or self-funded interpretability research.

Thoughts
2026-02-23

Goals for Alignment, Not Attachment

Goals are useful for alignment. Attachment to timeline and route is what creates stress, rigidity, and worse decisions.

Thoughts
2026-02-19

Taste Is Reproducible. Obsession Isn't.

AI can match your aesthetic judgment, curate better than your friends, and predict trends before they happen. So what's the actual human edge?

Thoughts
2026-02-18

Advanced vLLM Deployment, Part 1: Hardware and Stack Choices

GPU selection, inference framework choice, and the upstream decisions that determine what optimizations are even possible in production vLLM deployments.

Case Studies
2026-02-15

Desire Is a Contract to Be Unhappy

Naval Ravikant says desire is a contract you make with yourself to be unhappy until you have the thing. Here's what that looks like in practice.

Thoughts
2026-02-13

Integration, Not Oscillation

Something shifted recently. Not the usual swing between secular ambition and spiritual grounding. This time it feels like both are pointing at the same thing.

Thoughts
2026-02-13

For Its Own Sake

On writing, presence, and reclaiming the moment.

Poems
2026-02-12

Sampling Everything at the Frontier (And When That Stops)

You don't know your research focus in advance. You try things and notice what pulls you back.

Thoughts
2026-02-12

"Models Aren't Creative" is a Skill Issue

The lack of creative output isn't a model problem. It's a skill issue. Yours, not the model's.

Thoughts
2026-02-11

Writing Culture is Agent Infrastructure

The orgs best positioned for AI agents aren't the ones with the best tooling. They're the ones with the strongest human writing cultures.

Thoughts
2026-02-10

Robotics Beyond Humanoids: Getting My Hands Dirty

Everyone's watching humanoid robots. But if you think something is revolutionary, variants probably exist. The robotics community is vast. Here's what I'm learning as a...

Thoughts
2026-02-10

Today, Like Every Other Day

Rumi on moving within, fear, and finding beauty in the doing.

Poems
2026-02-09

What Three Jobs Taught Me About Agent Orchestration

Org theory applies to agent swarms. I've lived three different org structures—here's how they map to multi-agent patterns.

Notes & Projects
2026-02-07

Latency-Bound vs Throughput-Bound: The Missing Dimension

CAP theorem gives you a lens for distributed systems. There's a missing dimension for infrastructure decisions: whether your system is latency-bound or throughput-bound. Most...

Thoughts
2026-02-06

Managing Agents: The First Time It Actually Worked

Three weeks into a new role, agent orchestration finally clicked. Not just using AI tools—actually managing complexity with them.

Thoughts
2026-02-05

Ray in Production: What Dozens of GPUs and a Lot of 3am Pages Taught Me

Real production failures from running Ray at scale: lost training runs, enterprise network disasters, and cascade outages the documentation never warns you about.

Case Studies
2026-02-04

Automated X/Twitter Feed Curation with Bird CLI + Claude

I built an automated Twitter feed scraper that triages 100 tweets daily, analyzes community sentiment with Claude, and delivers a 5-minute digest to Slack.

Notes & Projects
2026-02-04

Slack vs Telegram: Catching Myself in a Pattern

I chose Telegram over Slack for my AI assistant setup, then realized I was making decisions based on old baggage instead of current evidence.

Thoughts
2026-02-03

Optimizing vLLM at Production Scale: Lessons from Conversational AI Infrastructure

Memory fragmentation, throughput cliffs, and quantization accuracy issues that only show up in production—lessons from running vLLM at scale for conversational AI.

Case Studies
2026-02-02

Multi-Mode AI Assistants with Telegram Forums

How I eliminated context bleeding in my AI assistant using openclaw's multi-agent features and Telegram forum topics.

Notes & Projects
2026-01-31

Lessons from Scaling Enterprise RAG: Data Residency, Multi-Tenancy, and Production Reliability

Patterns and tradeoffs from building RAG infrastructure in regulated environments.

Case Studies