A curated index of publications, experiments, and negative results.
Recent Publications and Reports
Persona Circuits: Progress & Findings (3-17-2026)
Steering, concentration, and selectivity analysis
Current-state synthesis of the persona-circuits work: steering is robust and concentration is partially supported, but evidence for the stronger distinctness, necessity, and sufficiency claims is mixed to negative under current protocols.
Persona Circuits: Exploring GLP Application
GLP transfer for activation geometry repair
Branch report testing whether GLP can preserve steering semantics while repairing activation geometry. In this setting, public-checkpoint transfer failed, matched checkpoints were more stable but still nonselective, and mixed clean+edited training is the key pending test.
Latent Depth-Routing Spectroscopy in Standard Transformers
Oracle evidence for input-dependent effective depth mixtures in frozen Gemma-2-2b
Sohail Mohammad · Preprint, 2026
Oracle-alpha recovers input-dependent effective depth mixtures from frozen Gemma-2-2b without modifying model weights. On 1024 stratified prompts, oracle routing improves next-token loss by 1.63 nats (positive on all 1024 prompts), and held-out predicted routing also generalizes positively (R^2 = 0.24). Softmax-constrained routing outperforms unconstrained and top-k alternatives via a competition mechanism. Extension lanes (tool-breakage, safety, OIH) provide bounded evidence with explicit caveats.
Distill page · Paper (PDF) · Code (GitHub)
Depth-Dynamics Signatures of Conversational Collapse
Finite-Time Lyapunov Analysis of Transformer Forward Passes
Sohail Mohammad · Preprint, 2026
This preprint asks whether early warning signals of conversational collapse can be detected from internal depth dynamics.
Condition-Dependent Collapse Dynamics in Multi-Turn LLM Self-Play
Baseline collapse dynamics with transparent reliability limits
Sohail Mohammad · Preprint, 2026
This baseline maps which interaction setups remain stable and which collapse in multi-turn model conversations.
Path B disclosure: The preregistered detector-reliability gate was not met; no detector-validation claim is made.
Inverse Scaling in Activation Steering
Architecture and Scale Dependence of Refusal Manipulation
Sohail Mohammad · Preprint, 2026
This preprint evaluates when activation steering remains reliable across scale and architecture changes.
Pilots
Pilot study: Distributional bias shifts across preference-tuning stages
Dataset-scoped pre-registered pilot with bounded empirical claims
Sohail Mohammad · Draft, 2026
This pilot examines how behavior shifts across base, SFT, and preference-tuning stages while controlling for measurement artifacts.
Experiments
Teaching an LLM to Trade Prediction Markets
Chain-of-Thought Reasoning Solves Action Collapse in Low-Cardinality RL
Sohail Mohammad · February 2025
This experiment shows how reasoning steps can preserve action diversity and reduce policy-collapse behavior in sequential decision settings.
Negative Results
Publishing dead ends and blocked paths is part of the research process here.
B6 Failure Case: Reliable Decisions, Blocked Internal Explanation
Decision-valid behavioral pipeline achieved; mechanism-level path blocked
Sohail Mohammad · February 2026
Behavior-level decisions were made reliable, but mechanism-level reconstruction gates did not pass. A bounded remediation path then terminated via K2 due to missing required candidate coverage.