Papers
Depth-Dynamics Signatures of Conversational Collapse
Finite-Time Lyapunov Analysis of Transformer Forward Passes
Sohail Mohammad · Preprint, 2026
This paper asks whether early warning signs of conversational breakdown can be spotted inside a model's layer dynamics. If these signals hold up, they could help diagnose unstable model behavior before it surfaces in user-facing conversations.
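The core quantity behind this kind of analysis can be sketched numerically. A minimal, illustrative example, assuming a toy stand-in for the layer-to-layer hidden-state map (a real analysis would use actual transformer activations): perturb the input slightly and measure the average log growth rate of that perturbation across depth, with depth playing the role of time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer's layer-to-layer hidden-state map.
# Purely illustrative: 12 "layers" of random linear maps with tanh.
weights = [rng.normal(0, 0.4, (16, 16)) for _ in range(12)]

def forward(h):
    """Apply the toy layer stack, returning the state after each layer."""
    states = []
    for W in weights:
        h = np.tanh(W @ h)
        states.append(h)
    return states

def finite_time_lyapunov(h0, eps=1e-6):
    """Estimate a finite-time Lyapunov exponent across depth: the average
    log growth rate of a small input perturbation over the layer stack."""
    base = forward(h0)
    pert = forward(h0 + eps * rng.normal(size=h0.shape))
    final_ratio = np.linalg.norm(base[-1] - pert[-1]) / eps
    return np.log(final_ratio) / len(weights)

lam = finite_time_lyapunov(rng.normal(size=16))
print(f"depth-FTLE estimate: {lam:.3f}")  # positive values suggest expansion
```

A positive exponent means nearby hidden states diverge with depth; the paper's question is whether such depth-dynamics signatures shift before visible conversational collapse.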
Condition-Dependent Collapse Dynamics in Multi-Turn LLM Self-Play
Baseline collapse dynamics with transparent reliability limits
Sohail Mohammad · Preprint, 2026
This baseline study asks a simple question: when LLMs talk over many turns, which setups stay coherent and which ones collapse into repetition? The goal is to map the failure landscape clearly so future research can build better conversation stability tests and safeguards.
Path B disclosure: Detector reliability prereg gate was not met; no detector-validation claim is made.
Inverse Scaling in Activation Steering
Architecture and Scale Dependence of Refusal Manipulation
Sohail Mohammad · Preprint, 2026
This paper tests how reliably model refusal behavior can be nudged with activation steering across different model sizes and families. The goal is to map where steering is practical versus brittle, so that safety and control methods can be applied with realistic expectations.
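For readers unfamiliar with the technique, here is a minimal sketch of what activation steering does. Everything here is illustrative: `h` stands for a residual-stream activation and `refusal_dir` for a behavior direction extracted elsewhere (e.g. a difference of means between refused and complied prompts).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activation and steering direction (illustrative only).
d_model = 64
h = rng.normal(size=d_model)
refusal_dir = rng.normal(size=d_model)
refusal_dir /= np.linalg.norm(refusal_dir)  # unit-normalize the direction

def steer(h, direction, alpha):
    """Add a scaled steering vector to the activation. Positive alpha pushes
    toward the encoded behavior; negative alpha pushes away from it."""
    return h + alpha * direction

def ablate(h, direction):
    """Directional ablation: project the direction out of the activation."""
    return h - (h @ direction) * direction

steered = steer(h, refusal_dir, alpha=4.0)
ablated = ablate(h, refusal_dir)
print(round(abs(float(ablated @ refusal_dir)), 6))  # 0.0: component removed
```

The paper's question is where interventions like these transfer reliably across scales and architectures, and where they quietly stop working.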
Experiments
Teaching an LLM to Trade Prediction Markets
Chain-of-Thought Reasoning Solves Action Collapse in Low-Cardinality RL
Sohail Mohammad · February 2025
This experiment explores why RL agents in trading settings often collapse onto a single repetitive action even when returns look fine. It shows that adding chain-of-thought reasoning steps can preserve decision diversity, which matters for robustness in real sequential decision tasks.
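Action collapse is easy to measure: the Shannon entropy of the agent's action sequence drops to zero when the policy degenerates. A minimal diagnostic sketch (action names are illustrative):

```python
import math
from collections import Counter

def action_entropy(actions):
    """Shannon entropy (bits) of an action sequence. Near-zero entropy is a
    simple collapse indicator: the agent is repeating one action."""
    counts = Counter(actions)
    n = len(actions)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

collapsed = ["hold"] * 100                        # degenerate policy
diverse = ["buy", "sell", "hold"] * 33 + ["buy"]  # mixed policy

print(abs(action_entropy(collapsed)))          # 0.0: fully collapsed
print(round(action_entropy(diverse), 3))       # near log2(3) ~ 1.585
```

Tracking this entropy alongside returns separates "profitable because it learned" from "profitable because one action happened to work".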
Pilots
Pilot study: Distributional bias shifts across preference-tuning stages
Dataset-scoped pre-registered pilot with bounded empirical claims
Sohail Mohammad · Draft, 2026
This pilot examines how model bias signals shift from base training to instruction tuning and preference tuning. The aim is to separate real behavior changes from measurement artifacts so conclusions about alignment effects are more trustworthy.
Failures
This section is for dead ends, null results, and failed hypotheses that still teach something important. Publishing failures makes the research process more honest and helps others avoid repeating the same mistakes.
B6 Failure Case: Reliable Decisions, Blocked Internal Explanation
Decision-valid behavioral pipeline achieved; mechanism-level path blocked
Sohail Mohammad · February 2026
In plain terms: we succeeded in making behavior-level decisions reliable, but failed the internal reconstruction gate needed for mechanism-level claims. A bounded remediation path (Option 2) then terminated via K2 because the required candidate coverage did not exist (A=0/4, B=0/4).