Papers

Depth-Dynamics Signatures of Conversational Collapse

Finite-Time Lyapunov Analysis of Transformer Forward Passes

Sohail Mohammad · Preprint, 2026

This paper asks whether we can detect early warning signs of conversational breakdown by examining a model's layer-wise dynamics during the forward pass. If these signals hold up, they could help diagnose unstable model behavior before it surfaces in user-facing conversations.

Paper (PDF) · Code (GitHub)

Condition-Dependent Collapse Dynamics in Multi-Turn LLM Self-Play

Baseline collapse dynamics with transparent reliability limits

Sohail Mohammad · Preprint, 2026

This baseline study asks a simple question: when LLMs converse with each other over many turns, which setups stay coherent and which collapse into repetition? The goal is to map the failure landscape clearly so future work can build better conversation-stability tests and safeguards.

Path B disclosure: Detector reliability prereg gate was not met; no detector-validation claim is made.

Paper (PDF) · Code (GitHub)

Inverse Scaling in Activation Steering

Architecture and Scale Dependence of Refusal Manipulation

Sohail Mohammad · Preprint, 2026

This paper tests how reliably refusal behavior can be nudged with activation steering across model sizes and families. The goal is to map where steering is practical versus brittle, so that safety and control methods are applied with realistic expectations.

Paper (PDF) · Code (GitHub)

Experiments

Teaching an LLM to Trade Prediction Markets

Chain-of-Thought Reasoning Solves Action Collapse in Low-Cardinality RL

Sohail Mohammad · February 2025

This experiment explores why RL trading agents often collapse onto a single repetitive action even when returns look acceptable. It shows that adding chain-of-thought reasoning steps preserves decision diversity, which matters for robustness in real sequential decision tasks.

Write-up · Code (GitHub)

Pilots

Pilot study: Distributional bias shifts across preference-tuning stages

Dataset-scoped pre-registered pilot with bounded empirical claims

Sohail Mohammad · Draft, 2026

This pilot examines how distributional bias signals shift from base training through instruction tuning and preference tuning. The aim is to separate genuine behavior changes from measurement artifacts, so that conclusions about alignment effects are more trustworthy.

Pilot (Draft) · Code (GitHub)

Failures

This section is for dead ends, null results, and failed hypotheses that still teach something important. Publishing failures makes the research process more honest and helps others avoid repeating the same mistakes.

B6 Failure Case: Reliable Decisions, Blocked Internal Explanation

Decision-valid behavioral pipeline achieved; mechanism-level path blocked

Sohail Mohammad · February 2026

In plain terms: we succeeded in making behavior-level decisions reliable, but failed the internal reconstruction gate needed for mechanism-level claims. A bounded remediation path (Option 2) then terminated via K2 when required candidate coverage did not exist (A=0/4, B=0/4).

Failure write-up