Depth-Dynamics Signatures of Conversational Collapse

Abstract

We estimate the top-1 finite-time Lyapunov exponent (λ₁) for transformer depth dynamics using forward-mode automatic differentiation (JVP-based tangent propagation) with QR renormalization. Renormalizing every 4 layers prevents the tangent vectors from growing exponentially large or collapsing to zero, either of which would make the exponent estimate numerically meaningless. Across 720 preregistered trajectories and 7,200 FTLE computations on three 7–8B-parameter model families, we observe that λ₁ profile features (depth-profile slope ρ = −0.536, layerwise variance ρ = +0.511) show medium-to-large associations with collapse metrics from Escape Velocity. Mean λ₁ alone shows only weak association (|ρ| ≤ 0.251). Three of six preregistered tests pass, meeting the success criterion. However, the 720-row analysis contains only 108 unique λ₁ predictor vectors (36 seeds × 3 distinct models): HOMO_A and HETERO_ROT both use Llama and so produce identical λ₁ values, and within each condition, repeat trajectories share the same model and seed and therefore the same predictor. The 720 rows thus contain far fewer independent predictor observations than n suggests, so nominal p-values are anti-conservative and effect-size magnitudes are the primary evidence. We interpret this as exploratory support for a bridge hypothesis, not as confirmation of the relationship.

Reliability caveat. Escape Velocity collapse labels have unconfirmed inter-rater reliability (κ = 0.566, threshold 0.80 not met). All bridge correlations are subject to this limitation. Results are reported as predictive associations only. No causal or mechanistic identity is claimed between depth dynamics and conversational dynamics.

Why this matters

Understanding why some models collapse in extended conversations requires looking inside the model. The finite-time Lyapunov exponent (FTLE) measures how sensitive a transformer's internal computation is to small perturbations as information flows through its layers. It is a single-forward-pass measurement that is cheap to compute and does not require conversation-level testing.

If depth-dynamics features correlate with collapse behavior, that opens a path toward screening models for collapse risk before running expensive multi-turn evaluations. This study tests that hypothesis. The signal is real but exploratory: the dependence structure of our data limits what we can confirm statistically.

At a glance

Trajectories profiled: 720 (7,200 FTLE calls total)
Strongest association: ρ = −0.536 (slope ↔ collapse rate)
Preregistered tests passed: 3 / 6 (threshold met; interpretation remains exploratory)
Pipeline cost: ~$20 (A100, stable run for 7,200 calls)

Finding 1 — Depth-profile shape carries signal; mean λ₁ alone does not

Slope (ρ = −0.536) and variance (ρ = +0.511) show medium-to-large associations with collapse behavior, while mean λ₁ stays weak (|ρ| ≤ 0.251).

Finding 2 — The preregistered criterion is met, but claims stay bounded

3 of 6 preregistered tests pass the declared threshold, but dependence structure and upstream label-reliability limits keep interpretation exploratory rather than confirmatory.

Finding 3 — Reliability and dependence constraints matter for interpretation

Only 108 unique predictor vectors underlie 720 rows, making nominal p-values anti-conservative, and collapse labels inherit the upstream κ = 0.566 reliability caveat.

The pipeline is the primary reusable contribution: apply it to new models for exploratory depth-dynamics screening before expensive multi-turn evaluation.

Approach

Transformer depth computation can be viewed as a discrete dynamical system: each layer maps the hidden state to a new state through nonlinear transformations. The finite-time Lyapunov exponent (FTLE) quantifies how sensitive this process is to perturbations: how much a small change to the input gets amplified or suppressed across layers.

Rather than computing full Jacobians (prohibitive for d_model = 4096), we use forward-mode AD (JVP) to propagate tangent vectors through the layer stack. QR-based renormalization every 4 layers prevents numerical overflow. For each of 720 trajectories, we compute λ₁ using 10 random tangent seeds and report the mean across seeds. λ₁ is computed on the first assistant turn only; longitudinal FTLE (tracking how depth dynamics shift across conversation turns) could provide stronger evidence but requires substantially more compute.
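The tangent-propagation idea can be illustrated with a minimal sketch. It is not the study's pipeline: the toy residual layer `h + W tanh(h)`, the helper names, and the tiny dimension are all assumptions chosen so the Jacobian-vector product can be written by hand in standard-library Python, where a real run would presumably call an AD library's JVP on the actual transformer blocks. For a single tangent vector, the QR step reduces to dividing by the vector's norm and accumulating the log of that norm.

```python
import math
import random


def make_layers(n_layers, dim, scale, rng):
    """Random weights for toy residual layers h -> h + W tanh(h) (hypothetical stand-in)."""
    return [
        [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
        for _ in range(n_layers)
    ]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def layer_and_jvp(W, h, v):
    """Apply one toy layer and push tangent v through its Jacobian.

    For f(h) = h + W tanh(h), the Jacobian-vector product is
    v + W ((1 - tanh(h)^2) * v), so no full Jacobian is ever formed.
    """
    t = [math.tanh(x) for x in h]
    h_next = [x + y for x, y in zip(h, matvec(W, t))]
    scaled = [(1.0 - ti * ti) * vi for ti, vi in zip(t, v)]
    v_next = [a + b for a, b in zip(v, matvec(W, scaled))]
    return h_next, v_next


def top1_ftle(layers, h0, v0, renorm_every=4):
    """Mean per-layer log growth of a single tangent vector."""
    n0 = norm(v0)
    h, v = list(h0), [x / n0 for x in v0]
    log_growth = 0.0
    for i, W in enumerate(layers, start=1):
        h, v = layer_and_jvp(W, h, v)
        if i % renorm_every == 0 or i == len(layers):
            r = norm(v)               # the "R" factor of a one-column QR
            log_growth += math.log(r)
            v = [x / r for x in v]    # the "Q" factor: unit-length tangent
    return log_growth / len(layers)


def mean_ftle(layers, h0, n_seeds=10):
    """Average the estimate over several random tangent directions."""
    estimates = []
    for seed in range(n_seeds):
        seed_rng = random.Random(seed)
        v0 = [seed_rng.gauss(0.0, 1.0) for _ in h0]
        estimates.append(top1_ftle(layers, h0, v0))
    return sum(estimates) / len(estimates)


rng = random.Random(0)
layers = make_layers(n_layers=32, dim=8, scale=0.2, rng=rng)
h0 = [rng.gauss(0.0, 1.0) for _ in range(8)]
print(f"lambda_1 estimate: {mean_ftle(layers, h0):.4f}")
```

Because the tangent map is linear, the renormalization interval changes only numerical stability: in exact arithmetic the accumulated log norms telescope to the same total whether the vector is rescaled every layer or every fourth layer.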

Three λ₁ summaries

λ₁ mean: Average FTLE across layers. Captures overall sensitivity level.
λ₁ variance: Variance of layerwise λ₁ profile. Captures how uneven sensitivity is across depth.
λ₁ slope: Linear slope of the layerwise profile. Captures whether sensitivity increases or decreases with depth.
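As a rough illustration of the three summaries, the sketch below computes them from a layerwise profile. It assumes the profile is simply a list of local λ₁ values indexed by depth, uses population variance, and fits the slope by ordinary least squares against the layer index; the function name and these conventions are assumptions, not details taken from the study.

```python
def profile_summaries(profile):
    """Mean, variance, and least-squares slope of a layerwise lambda_1 profile."""
    n = len(profile)
    if n < 2:
        raise ValueError("need at least two layers to fit a slope")
    mean = sum(profile) / n
    # Population variance (divide by n); a sample variance would divide by n - 1.
    variance = sum((y - mean) ** 2 for y in profile) / n
    # Ordinary least squares of profile value against layer index 0..n-1.
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - mean) for x, y in enumerate(profile))
    return {"mean": mean, "variance": variance, "slope": sxy / sxx}


# Two made-up profiles with the same mean but different shape.
flat = [0.25, 0.25, 0.25, 0.25]
declining = [0.40, 0.30, 0.20, 0.10]
print(profile_summaries(flat))
print(profile_summaries(declining))
```

The two made-up profiles share a mean but differ in variance and slope, which is the kind of distinction the mean summary alone cannot capture.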

Models

Condition	Model	FTLE calls
HOMO_A	Llama-3.1-8B-Instruct	1,800
HOMO_B	Qwen2.5-7B-Instruct	1,800
HOMO_C	Mistral-7B-Instruct-v0.3	1,800
HETERO_ROT	Llama-3.1-8B-Instruct (1st assistant)	1,800

Key Results

Six preregistered Spearman correlations tested three λ₁ summaries against Escape Velocity collapse metrics. Success criterion: ≥1 test with |ρ| ≥ 0.40 and Bonferroni–Holm adjusted p < 0.05.

**Figure 1.** Bridge correlation heatmap. Checkmarks = preregistered threshold met (|ρ| ≥ 0.40). Profile shape features (slope, variance) carry the signal; mean λ₁ does not. *Read effect sizes, not p-values. The 108-unique-predictor structure makes nominal p-values anti-conservative. κ = 0.566 caveat applies to all cells.*

Primary results (n = 720)

#	λ₁ summary	Escape Velocity metric	ρ	p_holm	95% CI	Pass
1	mean	collapse rate	+0.246	4.26e-11	[+0.17, +0.32]	✗
2	mean	first collapse turn	−0.251	2.29e-11	[−0.32, −0.18]	✗
3	mean	collapse incidence	+0.036	0.337	[−0.03, +0.10]	✗
4	variance	collapse rate	+0.511	2.41e-48	[+0.45, +0.57]	✓
5	slope	collapse rate	−0.536	5.24e-54	[−0.59, −0.48]	✓
6	slope	first collapse turn	+0.507	1.22e-47	[+0.45, +0.56]	✓

3/6 tests pass. Preregistered success criterion: MET. All bridge correlations inherit the Escape Velocity label reliability caveat (κ = 0.566). If collapse labels contain noise, the true effect sizes may differ from those reported here. The effect-size magnitudes (|ρ| = 0.51–0.54) are the primary evidence; reported p-values are nominal and anti-conservative due to predictor non-independence. We interpret this as exploratory support for the bridge hypothesis, not as confirmation of the relationship.
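The pass/fail column in the primary results table can be reproduced from the tabulated ρ and p_holm values. The sketch below also includes a plain Spearman correlation with average ranks for ties and a Holm step-down adjustment, to show the shape of the computation. All function names are illustrative, the exact tie handling and adjustment code used in the study are not stated in this section, and raw p-values are taken as inputs rather than derived.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max
    return adjusted


def passing_tests(rhos, p_holm, rho_min=0.40, alpha=0.05):
    """Flag tests with |rho| >= rho_min and adjusted p < alpha."""
    return [abs(r) >= rho_min and p < alpha for r, p in zip(rhos, p_holm)]


# Values copied from the primary results table (tests 1 to 6).
rhos = [0.246, -0.251, 0.036, 0.511, -0.536, 0.507]
p_holm = [4.26e-11, 2.29e-11, 0.337, 2.41e-48, 5.24e-54, 1.22e-47]
flags = passing_tests(rhos, p_holm)
print(flags, "criterion met:", any(flags))
```

Tests 1 and 2 have very small adjusted p-values but fail on effect size, which is why the criterion is written as a joint condition.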

**Figure 2.** Scatter plots for the three passing tests. Color = condition. Note the clustering by condition: much of the correlation structure reflects between-model differences, not within-model variation. *Treat as exploratory screening signal under matched settings, not a deployment-general predictor.*

What the passing tests tell us

More negative depth-profile slopes (λ₁ falling more steeply across layers) are associated with higher collapse rates and earlier first-collapse turns.
Higher layerwise λ₁ variance (more uneven sensitivity across depth) is associated with higher collapse rates.
Mean λ₁ alone is insufficient: the shape of the depth profile matters more than its average level.

Depth profiles

Sensitivity analysis (n = 167 rater-agreed)

The same three tests pass in the rater-agreed subset with consistent effect sizes: Test #4 (ρ = +0.545), Test #5 (ρ = −0.557), Test #6 (ρ = +0.516). All three estimates fall within the 95% confidence intervals of the primary analysis.
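The containment claim is simple to check against the primary results table. The snippet below is only a bookkeeping sketch using the reported, rounded values; the dictionary names are illustrative.

```python
# 95% CIs from the primary analysis (n = 720), keyed by test number.
primary_ci = {4: (0.45, 0.57), 5: (-0.59, -0.48), 6: (0.45, 0.56)}
# Effect sizes from the rater-agreed subset (n = 167).
subset_rho = {4: 0.545, 5: -0.557, 6: 0.516}

within = {
    test: primary_ci[test][0] <= rho <= primary_ci[test][1]
    for test, rho in subset_rho.items()
}
print(within)
```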

What this does not show

No causal mechanism: the bridge findings are correlational and do not establish a mechanistic pathway from depth dynamics to conversational collapse.
No broad transfer guarantee: results are bounded to the tested model families, prompts, and protocol choices.
Inflated effective sample size: the 720 trajectories contain only 108 unique λ₁ predictor values due to model/seed duplication. Standard errors, p-values, and CIs from the planned n = 720 analysis are anti-conservative.
No reliability resolution: the Escape Velocity κ limitation remains a binding uncertainty on all bridge effect-size interpretation.

Limitations

Label reliability: Escape Velocity collapse labels have unconfirmed inter-rater reliability (κ = 0.566, threshold 0.80 not met). All effect sizes may be affected by label noise.
Predictive association only: Depth dynamics (within-pass) and collapse (across-turn) operate on different time axes. No causal mechanism is established.
Model family confound: HOMO_A and HETERO_ROT produce identical λ₁ (both use Llama). Between-model variation may partially reflect architecture.
Predictor dependence: The 720 trajectories contain only 108 unique λ₁ vectors (36 seeds × 3 distinct models). HOMO_A/HETERO_ROT share identical λ₁ values, and within each condition, repeat trajectories share identical λ₁. Within a cluster only the collapse outcomes vary, not the predictor, so effective degrees of freedom are lower than n = 720. Reported p-values and CIs should be interpreted as anti-conservative.
Protocol corrections: Test #4's variable mapping was corrected post-hoc (tangent-seed std → layerwise variance per prereg). See deviation table for full audit trail.
Single-turn measurement: λ₁ is computed on the first assistant turn only. Longitudinal FTLE could provide stronger evidence.
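To give a feel for how much the predictor dependence could matter, the sketch below compares Fisher z confidence intervals for the strongest correlation under two sample sizes: the nominal n = 720 and the 108 unique predictor vectors. Both choices are assumptions made for illustration. The section does not state how the reported CIs were computed, the plain 1/√(n − 3) standard error is only an approximation for Spearman's ρ, and 108 is a pessimistic stand-in for the effective sample size, not an estimate of it.

```python
import math


def fisher_ci(rho, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z transform."""
    z = math.atanh(rho)
    se = 1.0 / math.sqrt(n - 3)   # approximation; assumes independent rows
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


rho_slope = -0.536                      # test 5: slope vs collapse rate
nominal = fisher_ci(rho_slope, 720)     # treats all 720 rows as independent
pessimistic = fisher_ci(rho_slope, 108) # one observation per unique predictor

print("n = 720:", tuple(round(b, 3) for b in nominal))
print("n = 108:", tuple(round(b, 3) for b in pessimistic))
```

Under these assumptions the n = 720 interval rounds to the reported [−0.59, −0.48], while the n = 108 interval is roughly two and a half times as wide and its weaker end falls slightly below the 0.40 magnitude threshold. The true uncertainty presumably lies somewhere between the two, which is one way to read the instruction to weight effect sizes over p-values.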

Practical use now: treat λ₁ profile-shape features as a screening signal under matched settings. Not yet supported: use as a mechanism-level or deployment-general predictor without further validation.