Computer Science > Computation and Language
[Submitted on 30 Apr 2026]
Title: TokenScope: Token-Level Explainability and Interpretability for Code-Oriented Tasks in Large Language Models
Abstract: Understanding how Large Language Models (LLMs) make token-level decisions during code generation remains a major challenge for both researchers and practitioners. While recent tools provide insights into model internals or generation outcomes, they often lack decoding-time signals, fine-grained uncertainty measures, and interactive mechanisms for exploring alternative generation paths. We present TokenScope, an interactive interpretability and analysis tool for decoder-based LLMs that exposes token-level metrics, attention patterns, and structural information during generation. TokenScope supports interactive token replacement, counterfactual branching, and code-aware aggregation via abstract syntax trees. By unifying decoding-time signals with structural program analysis, TokenScope enables systematic investigation of LLM behaviour during code generation.
Submission history
From: Amirreza Esmaeili
[v1] Thu, 30 Apr 2026 11:23:33 UTC (3,418 KB)
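
Illustrative sketch (not from the paper): the abstract mentions token-level uncertainty measures and code-aware aggregation via abstract syntax trees but does not specify how TokenScope computes either. The Python sketch below shows one plausible reading, under stated assumptions: uncertainty is taken to be the Shannon entropy of each next-token distribution, and aggregation is taken to be the mean entropy of the tokens that start inside each AST statement. The names token_entropy, covers, and mean_entropy_per_statement, as well as the sample probabilities, are hypothetical and are not part of TokenScope.

```python
import ast
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def covers(node, line, col):
    """True if the position (1-based line, 0-based column) lies inside the node's span."""
    start = (node.lineno, node.col_offset)
    end = (node.end_lineno, node.end_col_offset)
    return start <= (line, col) < end


def mean_entropy_per_statement(source, tokens):
    """Average token entropy over every AST statement containing the token.

    tokens is a list of (line, col, probs) triples, where (line, col) is the
    start position of a generated token and probs is the distribution the
    model assigned at that decoding step (assumed input format).
    """
    tree = ast.parse(source)
    result = {}
    for node in ast.walk(tree):
        if not isinstance(node, ast.stmt):
            continue
        values = [token_entropy(p) for line, col, p in tokens if covers(node, line, col)]
        if values:
            # Key each statement by its node type and starting line.
            result[(type(node).__name__, node.lineno)] = sum(values) / len(values)
    return result


SOURCE = "x = 1\nif x:\n    y = x + 2\n"

# Made-up distributions, purely for illustration.
TOKENS = [
    (1, 0, [0.5, 0.5]),                # "x" on line 1: two equally likely options
    (1, 4, [1.0]),                     # "1" on line 1: fully certain
    (3, 4, [0.25, 0.25, 0.25, 0.25]),  # "y" on line 3: four equally likely options
]

scores = mean_entropy_per_statement(SOURCE, TOKENS)
for key, value in sorted(scores.items(), key=lambda item: item[0][1]):
    print(key, round(value, 4))
```

A nested token counts toward every enclosing statement, so the token on line 3 contributes to both the inner assignment and the surrounding if statement. Whether TokenScope aggregates this way, or uses a different metric or node granularity, is not stated in the abstract.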