hi, i'm
Sohail Mohammad
Tech, AI infrastructure, and adjacent thinking. Notes from a working engineer.
Featured
-
2026-05-13
Goodput or It Didn't Happen
GPU utilization can be 78% while 30% of requests fail SLO constraints. The goodput frontier test replaces single-number benchmarks with decision-grade surfaces that measure...
-
2026-05-06
What Your Workload Actually Costs
Not all inference is the same. Per-workload LCPR exposes the cross-subsidy that blended averages hide, with cost models for conversational, agentic, RAG, extraction, voice, and...
-
2026-04-29
The LCPR Calculator
Open-source calculator for loaded cost per result. Three worked examples, cache break-even analysis, and KV memory sizing.
-
2026-04-22
Trace Autopsy: Following One Day's Inference Bill
A repeatable diagnostic for going from raw trace events to loaded cost per accepted result. Twelve requests, four data sources, five cost mechanisms, and the reconciliation protocol.
-
2026-04-15
The Denominator Problem
The most common mistake in inference economics is dividing by the wrong number. LCPR (loaded cost per accepted result) reveals a 12x gap between naive token cost and actual...
-
2026-04-08
The Honest Field Guide to Production Inference
TCO frameworks, vendor evaluation, and architecture patterns for teams adopting open-model inference. Includes the LCPR calculator, migration gates, and a staged playbook from...
-
2026-02-05
Ray in Production: What Dozens of GPUs and a Lot of 3am Pages Taught Me
Real production failures from running Ray at scale: lost training runs, enterprise network disasters, and cascade outages the documentation never warns you about.
Latest Research
see all →
-
2026-03-17
Persona Circuits: Progress & Findings (3-17-2026)
Current-state synthesis of the persona-circuits project: robust steering and partial concentration support, but weaker-than-expected distinctness, necessity, and sufficiency...
-
2026-03-17
Persona Circuits: Exploring GLP Application
Branch report on using Generative Latent Priors (GLP) for activation repair in persona steering: public-checkpoint transfer failed in this setting, matched checkpoints were...
Latest Writings
see all →
-
2026-05-13
Goodput or It Didn't Happen
GPU utilization can be 78% while 30% of requests fail SLO constraints. The goodput frontier test replaces single-number benchmarks with decision-grade surfaces that measure...
-
2026-05-11
The Idiot Index for AI Deployment
The gap between what a model is supposed to do and what it actually does in production is enormous. Forward-deployed engineering exists to close it. Here's why i left Amazon...
-
2026-05-06
What Your Workload Actually Costs
Not all inference is the same. Per-workload LCPR exposes the cross-subsidy that blended averages hide, with cost models for conversational, agentic, RAG, extraction, voice, and...