DeCS | ICLR 2026

Abstract

Introduction

Visual summary of overthinking failure modes and efficiency-performance trade-offs.

Overview of Failure Modes and DeCS Gains


Method

From limitation analysis to DeCS: detector, decoupled reward, and curriculum scheduling.

1) NRP Detection

Detect the necessary reasoning prefix (NRP) and separate useful reasoning from redundancy.

2) Decoupled Reward

Protect essential tokens and consistently penalize redundant tokens after the NRP.

3) Curriculum Schedule

Adapt the easy-prompt ratio to avoid suppressing exploratory behavior during training.
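The three components above can be illustrated with a minimal sketch. It is not the DeCS implementation. It assumes the response is already split into reasoning steps, that some external judge has flagged which steps reach the correct answer, and that the NRP ends at the first such step. The function names, the 0/1 outcome reward, the fixed `penalty` value, and the rounding rule for the batch split are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def nrp_length(step_lengths: Sequence[int], reaches_answer: Sequence[bool]) -> int:
    """Token length of the prefix ending at the first step that reaches the answer.

    Assumption: if no step reaches the answer, the whole response counts as
    necessary, so nothing is treated as redundant.
    """
    total = 0
    for length, hit in zip(step_lengths, reaches_answer):
        total += length
        if hit:
            return total
    return total


def decoupled_rewards(num_tokens: int, nrp_len: int, correct: bool,
                      penalty: float = 0.1) -> List[float]:
    """Per-token rewards: tokens inside the NRP keep the outcome reward,
    tokens after it always receive the same negative penalty."""
    outcome = 1.0 if correct else 0.0
    return [outcome if t < nrp_len else -penalty for t in range(num_tokens)]


def batch_counts(batch_size: int, easy_ratio: float) -> Tuple[int, int]:
    """Split a batch into (easy, hard) prompt counts for a given easy-prompt ratio.

    How the ratio itself is adapted during training is not specified here.
    """
    if not 0.0 <= easy_ratio <= 1.0:
        raise ValueError("easy_ratio must be in [0, 1]")
    n_easy = round(batch_size * easy_ratio)
    return n_easy, batch_size - n_easy


# Three steps of 5, 3, and 4 tokens; the second step already reaches the answer.
prefix = nrp_length([5, 3, 4], [False, True, True])
rewards = decoupled_rewards(num_tokens=12, nrp_len=prefix, correct=True)
```

In this toy case the first 8 tokens keep the outcome reward and the last 4 are penalized, so the length penalty never touches the tokens needed to reach the answer.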

DeCS Training Pipeline


Experiments

Core effectiveness: module ablation and post-training reasoning behavior.

Ablation Study


Reasoning Behavior


Token vs PNRP

DeCS converts efficiency gains into higher PNRP (proportion of NRP) scores.
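As a rough illustration of the metric, the sketch below reads PNRP as the share of a response's tokens that fall inside the NRP. That reading, the function name `pnrp`, and the token counts are assumptions for this example, not figures from the paper.

```python
def pnrp(nrp_tokens: int, total_tokens: int) -> float:
    """Share of response tokens that lie inside the necessary reasoning prefix."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= nrp_tokens <= total_tokens:
        raise ValueError("nrp_tokens must be between 0 and total_tokens")
    return nrp_tokens / total_tokens


# Hypothetical response: 600 necessary tokens out of 1000 generated.
before = pnrp(600, 1000)
# Same necessary prefix, but with the redundant tail cut from 400 to 150 tokens.
after = pnrp(600, 750)
```

Under this reading, trimming redundant tokens while keeping the prefix intact raises PNRP, which is the direction of improvement the figures in this section report.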

PNRP vs Tokens (1.5B)


PNRP vs Tokens (7B)


Pass@K

DeCS preserves exploration and maintains strong scaling under multiple attempts.
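For readers unfamiliar with the metric, Pass@K is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The sketch below implements that standard estimator. Whether DeCS uses exactly this estimator is an assumption, and the sample counts are made up.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimate from n samples of which c are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 10 samples, 3 correct.
p1 = pass_at_k(10, 3, 1)
p5 = pass_at_k(10, 3, 5)
```

The estimate rises with k only if the sampled responses stay diverse, which is why Pass@K is a useful check that length compression has not collapsed exploration.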

Pass@K (1.5B)


Pass@K (7B)


Token Budget Scaling

Scaling trends under different generation budgets across AIME2024, AIME2025, and AMC23.

AIME2024 Scaling (1.5B)


AIME2024 Scaling (7B)


AIME2025 Scaling (1.5B)


AIME2025 Scaling (7B)


AMC23 Scaling (1.5B)


AMC23 Scaling (7B)


Difficulty vs Efficiency

DeCS consistently improves compression quality across different difficulty levels.

Difficulty Levels (1.5B)


Difficulty Levels (7B)


Case Studies

Qualitative examples across math, coding, and science reasoning settings.

MATH Case


LiveCodeBench Case


GPQA Case


Conclusion

DeCS shows that substantial overthinking reduction is achievable with careful token-level reward design and curriculum-aware data scheduling. The method consistently improves efficiency while preserving or improving reasoning quality.

