Chimera Difficulty Score: 72 (Expert), a synthesis of Flesch-Kincaid, Coleman-Liau, SMOG, and Dale-Chall readability metrics
Computer Science > Computation and Language
[Submitted on 1 Apr 2026]
Title: One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety
Abstract: Large Language Models (LLMs) are trained to refuse harmful requests, yet they remain vulnerable to jailbreak attacks that exploit weaknesses in conversational safety mechanisms. We introduce Incremental Completio...
This study presents a rigorous exploration of a novel adversarial technique, Incremental Completion Decomposition (ICD), which exploits incremental response generation in LLMs to bypass safety mechanisms. The methodology is sound, employing systematic evaluation across established benchmarks and providing mechanistic evidence through activation analysis. The theoretical framing—linking attack success to the suppression of refusal-related representations—aligns with prior work on neural network i...