irthomasthomas 1 hours ago [-]
I won't use or recommend models with hidden reasoning (that's all American models). It's too much of a risk and makes prompt optimization harder. Risky because it makes it possible for an attacker to prompt inject the reasoning chain to carry out a secret objective, and to hide that from the summary and output.
Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase. An attacker could then exfiltrate data from you while the reasoning summary hides it from the user.
It also makes it impossible to know if the model is doomlooping during reasoning and burning tokens for no reason, as Gemini is wont to do, which we know about because its hidden reasoning often leaks out when it doomloops.
When the models are AGI and secure from prompt injection I may stop caring; until then I want to know exactly what the model responds to my prompts, or exactly what the agent is doing on my behalf.
I've thought about the hijacking of reasoning chains as a potential vector, but never saw a proven implementation in American models since, from my understanding, all major vendors throw out the reasoning tokens between turns.
btown 40 minutes ago [-]
For Claude, at least, "throw out the reasoning tokens" is only true when a session has been idle for more than an hour, and is new since March.
The basic concept is that for a session active recently, interleaved thinking tokens are already in KV cache, so it's more efficient to keep using them than not! But when resuming an older session where KV cache has been evicted, it's more expensive to restore the thinking tokens, so they're silently dropped from prior turns. It's 2026 and stateful servers are back on the menu!
> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.
> The implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session... This surfaced as the forgetfulness, repetition, and odd tool choices people reported.
> Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.
I've experimented with rules to have Claude Code be explicit about recapping its thinking tokens, including tool choices and approaches chosen and rejected, into actual message output, but this is lossy at best. And sometimes dropping reasoning tokens can give a session "fresh eyes" in a good way.
I just really don't like the lack of control, and it's a reminder of how ephemeral the current landscape is. The Claude giveth, and the Claude taketh away.
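A minimal sketch of the idle-pruning behavior the quoted postmortem describes, contrasting the intended design (clear old thinking once, on the turn that resumes an idle session) with a guess at the shape of the bug (keep clearing on every later turn). The `Session` class, the message dict layout, and both function names are hypothetical illustrations, not Anthropic's code; only the one-hour threshold comes from the quote.

```python
from dataclasses import dataclass, field

IDLE_THRESHOLD_S = 3600  # "idle for more than an hour", per the quoted postmortem


@dataclass
class Session:
    # Hypothetical layout: each message is {"type": "text" | "thinking", "content": str}
    messages: list = field(default_factory=list)
    last_active: float = 0.0
    pruned_after_idle: bool = False  # only used by the buggy variant


def strip_thinking(messages):
    """Return the history with all thinking blocks removed."""
    return [m for m in messages if m["type"] != "thinking"]


def build_request_intended(session, now):
    """Intended design: prune old thinking once, only when resuming after idle."""
    if now - session.last_active > IDLE_THRESHOLD_S:
        session.messages = strip_thinking(session.messages)
    session.last_active = now
    return list(session.messages)


def build_request_buggy(session, now):
    """Assumed shape of the bug: once tripped, the prune repeats on every turn."""
    if now - session.last_active > IDLE_THRESHOLD_S:
        session.pruned_after_idle = True  # never reset
    if session.pruned_after_idle:
        session.messages = strip_thinking(session.messages)
    session.last_active = now
    return list(session.messages)
```

In the intended variant, thinking produced after the resume is sent back on later turns; in the buggy variant it is stripped every time, which would match the reported forgetfulness and repetition.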
chacham15 4 minutes ago [-]
I think you're confusing two different axes. There is a difference between the cache state and the context state.
Imagine a conversation with turns X, Y, and Z. When the LLM "reasons" about the next token A it does: P(A | X,Y,Z) and then P(B | X,Y,Z,A), etc. It will eventually produce a result P(D | X,Y,Z,A,B,C). Instead of continuing the context from X,Y,Z,A,B,C it continues it from X,Y,Z so you have P(N | X,Y,Z,D). This is what is meant by dropping the reasoning. This is done to save cache context for the session.
This is a different thing than preserving the K/V state of P(N | X,Y,Z,D).
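To make the two axes concrete, here is a small sketch using the same X, Y, Z, A, B, C, D labels. It assumes a prefix-based KV cache (cached state is reusable only for the longest shared prefix); the function names are made up for illustration and do not correspond to any vendor API.

```python
def next_turn_context(prior_turns, reasoning_tokens, answer, keep_reasoning=False):
    """Context state: what the model is conditioned on for the next turn."""
    context = list(prior_turns)
    if keep_reasoning:
        context.extend(reasoning_tokens)
    context.append(answer)
    return context


def reusable_prefix_len(cached_sequence, new_context):
    """Cache state: how many leading positions of a prefix cache can be reused."""
    n = 0
    for cached, new in zip(cached_sequence, new_context):
        if cached != new:
            break
        n += 1
    return n


prior = ["X", "Y", "Z"]
reasoning = ["A", "B", "C"]
answer = "D"

# What the server computed (and could have cached) while generating the turn.
generated = prior + reasoning + [answer]

dropped = next_turn_context(prior, reasoning, answer)                    # X,Y,Z,D
kept = next_turn_context(prior, reasoning, answer, keep_reasoning=True)  # X,Y,Z,A,B,C,D

print(reusable_prefix_len(generated, dropped))  # only X,Y,Z are reusable
print(reusable_prefix_len(generated, kept))     # the whole sequence is reusable
```

Under this assumption, dropping the reasoning changes the context (the next token is conditioned on X,Y,Z,D) and, separately, limits cache reuse to the X,Y,Z prefix, so D has to be recomputed in its new position.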
Roritharr 37 minutes ago [-]
Thank you! This is much more nuanced than my understanding so far!
JamesSwift 54 minutes ago [-]
> all major vendors throw out the reasoning tokens between turns
That would be surprising to me. The reasoning _is_ the model intelligence in a lot of respects, and so dropping those from the context would affect its output pretty significantly.
I assume that instead they just have a lot of guardrails in place and multiple runtime environments that individual turns ping-pong between in order to dehydrate/rehydrate the reasoning to keep it hidden from the end user.
"Stripping extended thinking: Extended thinking blocks (shown in dark gray) are generated during each turn's output phase, but are not carried forward as input tokens for subsequent turns. You do not need to strip the thinking blocks yourself. The Claude API automatically does this for you if you pass them back."
It's more nuanced in the various modes, but I haven't seen it boil down to thinking tokens surviving more than two turns.
Gemini models return a thinking signature that you, I think, must send back when invoking further, so they seem to keep them?
kapperchino 49 minutes ago [-]
This agent I made can’t execute on the shell, can only edit the files within the project. Only works with rust atm though. https://github.com/Kapperchino/agent-joe
furyofantares 2 hours ago [-]
> It isn’t the actual thinking that drove the model’s actions in a session- but a summary of the thinking logic. This is like using saving a jpeg as a .bmp and then editing the .bmp and presenting it as a .jpeg. The conversion produces data loss.
You've got that backwards, .bmp is a lossless format and .jpeg is the lossy one.
0o_MrPatrick_o0 2 hours ago [-]
My bad! 10 points for House Slytherin!
altmanaltman 2 hours ago [-]
also a typo in the last sentence: you're vs. your
glaslong 2 hours ago [-]
Weirdly pleasant, if minor, signal of human authorship
Tomte 7 minutes ago [-]
In a parallel universe LLMs have learned that (a) the training material contains many different orthographic errors and (b) that humans follow a non-obvious pattern when "deciding" which error to make, so that their generated output contains such errors, as well.
In our universe LLMs seem to have learned that those errors do not follow patterns in the aggregate and that they should not be emulated.
genxy 20 minutes ago [-]
Not for long!
altmanaltman 30 minutes ago [-]
Yeah, definitely it's a nice thing in today's context, weirdly. But also, you shouldn't really be making typos if you're writing an article and are using a basic spellcheck.
The text is clearly human-written just because it doesn't smell like AI (in this case, even if it was written by AI and produced this particular output, that's okay imo). I deal a lot with AI writing and writing in general, as I worked as an editor in another life so it's natural to me to see writing and form an objective opinion on it.
0o_MrPatrick_o0 2 hours ago [-]
I missed my coffee! Ty! Five points to Slytherin.
altmanaltman 7 minutes ago [-]
wait till my father hears about this!
StizzurpXDD 2 hours ago [-]
This is not just Anthropic. Almost all big AI companies, including OpenAI and Google, hide their model's actual reasoning. This is because revealing the raw reasoning exposes exactly how the AI processes information.
These companies spend huge amounts on R&D to develop a thinking process that is superior to their competition. Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending. They simply won't do it. It's like telling your exact location to someone who is trying to hunt you down.
_aavaa_ 1 hours ago [-]
Or like providing the world’s information in machine readable format that the AI companies can convert into model weights without getting permission or compensating the rights holders
palmotea 1 hours ago [-]
> This is because revealing the raw reasoning exposes exactly how the AI processes information. These companies spend huge amounts on R&D to develop a thinking process that is superior to their competition. Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending. They simply won't do it. It's like telling your exact location to someone who is trying to hunt you down.
I thought the reason was the "reasoning" didn't work very well with "aligned" model output, so they had to remove the alignment during reasoning and then hide it to avoid exposing "unaligned" model output.
transcriptase 10 minutes ago [-]
Not sure if anyone remembers the brief 12ish hour period when the very first “reasoning” ChatGPT model went public, but it provided credible evidence for this.
Before the massive nerf (showing summaries and suppressing certain aspects of reasoning) you would literally see reasoning text appearing on your screen like “while xyz is true, these facts may be seen as supporting hateful rhetoric or a conspiracy theory which is against my policy guidelines. i should tell the user xyz is not true or steer the conversation in a different direction. according to my instructions misleading the user is permitted in certain contexts where sensitive information is being discussed or could cause liability”
They disabled it shortly after the first screenshots appeared online, and restored it the next day in a way that hid what was actually happening.
robotresearcher 1 hours ago [-]
I suspect that you’re both right in the sense that ‘aligned’ is an important component of ‘superior’ from the vendors’ viewpoint.
vorticalbox 20 minutes ago [-]
There are actually fine tunes of qwen on opus “thinking” tokens that teach it to think like opus does.
When you export your personal data Google hides all model responses leaving just user messages. So it's even worse
duskwuff 2 hours ago [-]
More to the point - if they expose their model's "thinking" inference, competitors can train on that to replicate the results. If they postprocess that content, e.g. by summarizing it, it's no longer as useful to competitors.
StizzurpXDD 1 hours ago [-]
Exactly. Google won't like it if they spend millions to make Gemini 3.5 Pro's thinking the best in the world, only for Anthropic or OpenAI to copy it by just seeing the thinking process.
bee_rider 1 hours ago [-]
Mistral displays some “thinking” text (in their basic online chat interface) in the thinking mode, do we know if those are the real tokens?
It’s quite interesting to read. I can’t imagine using a model like this without the ability to peek inside and see if it is getting stuck.
transcriptase 6 minutes ago [-]
I wonder if they put all 80k tokens of the GDPR in its system prompt.
Sharlin 1 hours ago [-]
The cynic in me is wondering whether it's more about how revealing how the sausage is made might bring bad publicity.
kube-system 11 minutes ago [-]
It's to mitigate their competitors' ability to run distillation on their models. The only advantage frontier models have is being at the frontier.
There's nothing in the reasoning tokens that'll give bad publicity that the final output already wouldn't do.
bigfishrunning 1 hours ago [-]
Imagine if their target customers, C-suite execs looking to replace workers, knew how unlike "thinking" this process actually was! we can't have that.
Sharlin 4 minutes ago [-]
To be honest I'm not sure if many C-suite execs have a good idea of what "thinking" looks like inside in the first place, in the sense of focused mental activity aimed at solving a hard logical or technical problem.
shideneyu 1 hours ago [-]
correct. this becomes difficult for us to understand what happens behind the scenes.
metadat 2 hours ago [-]
[dead]
craigmart 2 hours ago [-]
This is something we have known for a very long time, and companies are not trying to hide that either. They do it to avoid letting competitors train their models on the CoTs
stingraycharles 2 hours ago [-]
Yes hasn’t this been around since Opus 4.6? I very much recall this change happening around January or February, and it was very explicitly to prevent distillation. Sonnet does not have this limitation.
Fun fact: if you go back to the old school from 2 years ago and provide explicit CoT prompts, you get the full thinking prompts back again!
So you disable thinking altogether, and instead make thinking part of the regular prompt by prompting it:
“Before providing your answer, think step by step. For example:
The user is asking me to…
I need to think about the blah blah. First, I should foo the bar, and then blah blah.
Answer: <put your final answer here>”
And tada.wav we have CoT as it worked in the GPT3 era back again.
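A minimal sketch of that approach in code: build the prompt with the visible step-by-step instruction, then split the completion at the final `Answer:` line. The template wording follows the comment above; the function names and the `Question:` suffix are assumptions for illustration, and no particular model API is implied.

```python
import re

# Prompt template paraphrasing the comment; "{question}" is the only placeholder.
COT_TEMPLATE = (
    "Before providing your answer, think step by step. For example:\n"
    "The user is asking me to...\n"
    "I need to think about ... First, I should ..., and then ...\n"
    "Answer: <put your final answer here>\n\n"
    "Question: {question}"
)


def build_cot_prompt(question):
    """Return a prompt that asks for visible reasoning followed by 'Answer:'."""
    return COT_TEMPLATE.format(question=question)


def split_reasoning_and_answer(completion):
    """Split a completion into (reasoning, answer) at the last 'Answer:' line.

    Returns (completion, None) if the model never produced an 'Answer:' line.
    """
    matches = list(re.finditer(r"^Answer:\s*", completion, flags=re.MULTILINE))
    if not matches:
        return completion.strip(), None
    last = matches[-1]
    return completion[: last.start()].strip(), completion[last.end():].strip()


# Example with a hand-written completion standing in for a model response.
sample = "The user is asking me to add two numbers.\nFirst, 2 + 2 is 4.\nAnswer: 4"
reasoning, answer = split_reasoning_and_answer(sample)
print(reasoning)
print(answer)
```

Because the reasoning arrives as ordinary output text, it can be logged and compared across runs, at the cost of paying for it as regular output tokens.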
dcrazy 1 hours ago [-]
I thought this was considered best practice? I actually prefer it to exposed thought channel, much like how I would prefer a human answer with supporting logic instead of an explanation of their problem-solving approach.
KellyCriterion 1 hours ago [-]
- tada.wav -
Still, one of the daily most played WAV files worldwide, I'd guess? :-D
0o_MrPatrick_o0 1 hours ago [-]
Awesome share! Thank you!
datastoat 36 minutes ago [-]
I believe that chain-of-thought reasoning blocks don't really correspond to what humans think of as reasoning. (See section 6.2.2 of the Fable/Mythos system card about "illegible reasoning", and the questions raised by the Apple paper on "The illusion of thinking".) I assumed they obscure the reasoning blocks because if users saw what's going on they'd be alarmed. Just as I'd probably be alarmed if I saw what was really going on in the heads of my colleagues ...
LPisGood 17 minutes ago [-]
The point of this post isn’t that the “reasoning” phase of LLM thinking isn’t the same as what humans consider reasoning; it’s that Anthropic is intentionally hiding Claude’s “reasoning output” to make the model harder to distill.
0o_MrPatrick_o0 5 minutes ago [-]
Reading these comments is so harrowing.
You are correct in my intentions on this post generally.
I want to highlight:
I want to measure performance of the LLMs over time- which includes assessing the quality of their outputs. I don’t perceive the reasoning output to be anything other than a measurable signal of possible drift in model performance.
Except it isn’t, because I’m only getting a low value summary of the thinking.
It’s like asking your buddy how fast he thought that last pitch was when radar guns are behind the plate.
Yeah, it’s a description related to what happened, but it’s not the thing I want to measure.
VulgarExigency 6 minutes ago [-]
I've said "what the FUCK are you THINKING" more times than I can count when reading Deepseek or GLM chains-of-thought only for them to end at the correct answer. Other times, they have useful ideas there that they leave out of their answers.
MagicMoonlight 28 minutes ago [-]
[dead]
himata4113 56 minutes ago [-]
All this effort to hide thinking and opus 4.8 after 100k-200k tokens starts to leak its own thinking. It's comedy really.
ofjcihen 31 minutes ago [-]
Oh man that’s only happened to me a few times but the result is so disorienting, especially since I’m usually jailbreaking it for security.
Pages of “I have to be careful, the user is asking that I do something related to cybersecurity that could easily be turned around and used offensively” but then happily gives me what I wanted.
linsomniac 28 minutes ago [-]
I feel like I get a lot of what this article presents as "hidden" by using this process:
- "Read `description` and create a specification, implementation guide, and checklist."
- "Ask clarifying questions. If any of those questions has a clear best recommendation, please select that yourself and record that in `autorecommendations.md`."
- "Have codex and antigravity review each of these and work to consensus."
These are the core of ~61 lines of prompting I do across 3 prompts, and I feel like the resulting artifacts describe some of the thinking. Also, some of the back-and-forth between the models feels like it gives some insight into the model "thinking".
I will say: I heavily used Fable when it was available; Opus + loops + codex and/or antigravity review is better than Fable at building things.
anuramat 2 hours ago [-]
no way, the contents of "reasoning_summary" are summarized?
fyi openai does the same; not really surprising or particularly evil
knollimar 1 hours ago [-]
Not evil but full of hubris
nja 26 minutes ago [-]
Claude Code 2.1.68 seems to have been the last version (before the "ctrl-o" debacle) which actually shows thinking inline. That + Opus 4.6 has been working great as a daily driver for me... all the new "safety" / "preventing misuse" pain points in the newer models and harnesses are so frustrating in comparison.
msp26 34 minutes ago [-]
> Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.
> preventing misuse.
Imagine not being able to read the tokens you are paying for.
TeMPOraL 4 minutes ago [-]
You're metered by token generation, not paying for tokens.
_fat_santa 2 hours ago [-]
IMHO I've never found the entire reasoning chain that particularly useful for my work. For me having a summary is honestly better from a context management perspective. I understand why they would encrypt it though, because those reasoning chains are VERY useful if you're distilling the model.
stavros 1 hours ago [-]
The summary doesn't go into the context, it's for human consumption. The CoT itself goes into the context.
nomel 47 minutes ago [-]
From my experiments with Opus and Sonnet (at least the models where you can still see CoT), only the last two CoT blocks go into context.
adi_pradhan 2 hours ago [-]
Not surprised at this. The questions for enterprises are
+ where can you depend on a black box as a service?
+ what evals and observability do you need to deploy a black box as a service confidently?
+ what's the ROI (considering a total footprint of people, token spend, infrastructure, service, ops etc.)
The LLM providers will clearly evolve to be more and more opaque as their services get more capable. The frontier models may even be provided as purely internal advisor or async only so they can monitor your CoT and final answers for cyber etc.
jimmypk 2 hours ago [-]
[flagged]
HarHarVeryFunny 2 hours ago [-]
This is nothing new - these companies don't want their model's output to be useful for distillation/training, so they just give a "summary" of its thinking steps rather than the actual sequence.
RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning given that it reinforces all the steps, including missteps, that got it to a reward. Providing a summary could be seen as a form of sane-washing, making the model look more purposeful and directed than it really is!
On further reflection, it is such a great indignity and such a colossal barrier to working with the machine that it insists on being a black box. The disingenuousness of the American models is a massive disadvantage to their use, and a massive slap in the face.
This "6 months behind" trend we've seen for open models feels like it will at some point be less important than simply the models' unwillingness to speak for themselves and to be observable.
reliablereason 2 hours ago [-]
Is the thinking even done in real tokens? I thought it was done using the pure residual stream. That is, instead of collapsing the residual stream to a token, you treat the final layer's output as a vector of size d_model and use that as input for the next position in the transformer.
If that is the case thinking is not visible to us as users due to it not being done in text.
TeMPOraL 39 minutes ago [-]
I saw that idea described as a step in AI 2027 (they call it "neuralese" and eyeballing the site, it's still labeled a hypothetical/future development), but AFAIK no one implemented/deployed this yet.
Claude does all its thinking in text; it's ChatGPT which does not do its reasoning in text. I believe it's sort of implied / understood (?) that this is part of Claude's secret sauce over OpenAI. OpenAI will use fewer tokens, but Claude will be more correct, more of the time.
wqaatwt 1 hours ago [-]
All open models that have reasoning seem to be doing it in text tokens. Is there any indication that closed models are approaching this somehow fundamentally differently?
throwuxiytayq 1 hours ago [-]
That would be a huge deal, meaning we've lost even our shitty, ineffective ways of monitoring agent reasoning stream. Big setback when it comes to alignment and interpretability.
I don't know about Claude, but latest GPT versions still have a readable reasoning stream. It sometimes leaks out when the model gets confused, e.g., during a tool call. If you're curious, looks simplified; less words; extremely compact. They optimize tokens. But remain readable.
sigmar 56 minutes ago [-]
>the language in the docs is awfully indirect.
writes this^ and then proceeds to highlight a bold title from the docs that says "summarized thinking" that explains things clearly in the first sentence. lol
layer8 44 minutes ago [-]
The second sentence is making vague claims though.
gmerc 52 minutes ago [-]
It’s an anti distillation effort. They are scared.
runeblaze 1 hours ago [-]
tbh the summarized thinking with encrypted raw thinking is there for many purposes; it is there to:
1. make distillation much harder
2. safety: prevent modifications to the thinking leading to injection attacks.
3. also honestly sometimes the model's raw thoughts can be deranged, which is not a good user experience (consider the varied audience in the market, etc.)
also often the mass underestimate/the model makers over-estimate how people love distilling models
root_axis 1 hours ago [-]
Research shows that even the raw trace tokens do not actually reflect underlying model "thoughts".
micromacrofoot 20 minutes ago [-]
well yeah I wouldn't want anyone to read my unsummarized thinking either
isodev 51 minutes ago [-]
I hope it doesn't come as a surprise to anyone - LLMs don't really "think".
nlarew 47 minutes ago [-]
Your basic analysis is not the point of the article
simianwords 2 hours ago [-]
Wait I think there are 2 levels of summary. Anthropic is definitely not showing its real thinking even with enterprise agreements. For example in Claude.ai the thinking traces are not real and are themselves summaries.
nekusar 21 minutes ago [-]
Yep, it's basically a scam to charge you more tokens and provide less compute.
You can't even guarantee WHAT model you get. Or if they downgrade you. Or if you 'offend corporate sensibilities' and they misdirect or lie.
The only way to get good returns on a model is to run it yourself. Quit paying for corporate bullshit.
jerf 2 hours ago [-]
AIUI it's fairly well established that the models can be saying one thing and "really" thinking another anyhow. The ones I recall seeing traced how simple one-digit arithmetic was done in the chat versus the actual activations under the hood. Tracing a real, non-trivial task through that way would be challenging, and I'd expect it is unlikely that the reasoning would say one thing while some utterly unrelated actual thought process is happening below, but I would expect that there might be a lot of places where the text of the reasoning diverges from what is "actually" being done. I'm not sure the full reasoning readout would produce much real insight anyhow.
I suspect that in some decades, as other architectures are found and used, that the inability of an LLM to "think" without also emitting a token will be seen as one of their fundamental limitations.
philipwhiuk 2 hours ago [-]
To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation.
I know at one point this was correct.
Humans somewhat do the same - something that's been demonstrated in split-brain experiments.
Terr_ 28 minutes ago [-]
I think the "reasoning" models are best-imagined as a change in the style of document being iteratively-grown by the LLM, as opposed to something more anthropomorphized.
* Style A: There's just the spoken dialogue between a Human Customer and Helpful Chatbot.
* Style B: There's the spoken dialogue and a bunch of times the Helpful Chatbot character has an internal monologue. This provides more consistency between iterations, and can be mined by custom tools to call external code and insert results.
> To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation.
This evades an easy yes or no. How about: (A) It's true that some consumers believe it means that, but implementers may often mean something different. (B) The consumer belief is reasonable given the marketing going on. (C) It is correct that it doesn't actually do retrospective introspection.
If your Human Customer character asks "Why did you say that", the LLM does not engage in a different process than it does for "I have eaten an apple."
The LLM has no memories to consult or hidden goals to contemplate; it's the same process of finding more stuff that fits at the end of the document. Any benefit from a "reasoning model" is that the LLM generates much better-looking additions because there's more (hidden) stuff for it to confabulate against.
stingraycharles 1 hours ago [-]
No, not at all, you got it backwards. This was originally called “chain of thought prompting”, and it basically explained to a model how to reason through a problem before providing an answer.
Because of the nature of how LLMs work — text prediction engines - by putting the explicit reasoning steps first, it improves the likelihood of the final answer (which then is being predicted based on the entire reasoning chain as input) being correct.
InsideOutSanta 1 hours ago [-]
If you ask an LLM afterward how it arrived at an answer, it might produce a plausible but incorrect explanation. But that's not what the thinking stream is; that's actually part of how it generates the answer.
devmor 2 hours ago [-]
That's not really how LLMs work at all. I would really recommend checking out something like [1] to get a rough understanding and avoid attributing too much to them.
It’s not surprising that the SOTA model makers' core goal is to get users dependent while denying them more and more understanding of how it works, forming a deeply unhealthy dependency.
Tell me this. If you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec what would you do?
(That being said, the reasoning output is still a summary of the KV cache)
orangecat 1 hours ago [-]
* If you hired a junior engineer or designer who refused to explain their thinking on their code*
Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.
bpodgursky 2 hours ago [-]
The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer. Nobody really understands how LLMs think. Thinking logs seem to be accurate, and summary thinking logs seem to be a good summary of the full thinking logs.
If it's useful, it's useful, enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output, just learn to be comfortable with that, or opt out and be a goat farmer.
dragonwriter 1 hours ago [-]
> The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer.
No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the “thinking” stage of response generation.
Just as with, say, a human inner monologue in words vs. actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation of the internal process that precedes the output mapped into language, either as a full result or a summary.
0o_MrPatrick_o0 2 hours ago [-]
I want to measure performance drift over time.
Having access to the reasoning text and output would help with performance measurement.
solarkraft 2 hours ago [-]
Yeah. The output is magic either way, with or without reasoning.
For daily use I actually like the reasoning summary to be brief/quick to scan.
That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.
josefritzishere 1 hours ago [-]
AI does not think. It is a word guessing machine. Anthropomorphizing technology does not add anything to our understanding.
coldtea 46 minutes ago [-]
A brain itself might be a guessing machine; it's an established and actively studied research model of human thought and the human brain.
Nor does a knee-jerk accusation of "anthropomorphizing" negate the fact that procedures that mimic human processing, even when done in software, are deservedly anthropomorphized, because they're a legitimate approximation of the equivalent human operations.
slopinthebag 16 minutes ago [-]
While the brain does employ statistical processes it’s a big leap to claim that’s the entirety of how it functions.
apothegm 2 hours ago [-]
Slashdotted.
yuvrajsa 58 minutes ago [-]
[flagged]
earningedged 1 hours ago [-]
[flagged]
akitowerns 1 hours ago [-]
[dead]
codelong888 2 hours ago [-]
[flagged]
ur-whale 2 hours ago [-]
When you have no moat, you have to try and find desperate ways to manufacture one.
anuramat 2 hours ago [-]
wdym?
singron 1 hours ago [-]
Other companies were allegedly distilling the models by training on the reasoning output. By hiding the reasoning tokens, it makes it harder to do this. You can still try to distill the models, but you can't distill reasoning itself as well.
This could all be optics as well to try to give the appearance of a defensible moat. E.g. they can claim to investors that they are able to protect a significant chunk of their intellectual property this way. I'm not sure if anyone has a study about how significant the summarization is to distillation.
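As a rough illustration of why hidden reasoning matters for distillation, the sketch below builds supervised fine-tuning records from teacher outputs, with and without the reasoning trace. Everything here is assumed for illustration: the record layout, the function names, and the `<think>` delimiters (a convention some open reasoning models use), not any vendor's actual pipeline.

```python
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"  # assumed delimiter convention


def to_sft_record(prompt, answer, reasoning=None):
    """Build one training example; the student learns to emit `completion`."""
    if reasoning is None:
        completion = answer
    else:
        completion = f"{THINK_OPEN}\n{reasoning}\n{THINK_CLOSE}\n{answer}"
    return {"prompt": prompt, "completion": completion}


def build_dataset(teacher_outputs, include_reasoning):
    """Turn teacher outputs into student training records."""
    records = []
    for out in teacher_outputs:
        reasoning = out.get("reasoning") if include_reasoning else None
        records.append(to_sft_record(out["prompt"], out["answer"], reasoning))
    return records


teacher_outputs = [
    {"prompt": "What is 17 * 3?", "reasoning": "17 * 3 = 51.", "answer": "51"},
    # Hidden reasoning: only the answer (or a summary) is available.
    {"prompt": "What is 9 + 8?", "answer": "17"},
]

for record in build_dataset(teacher_outputs, include_reasoning=True):
    print(record)
```

When the raw trace is withheld, the second kind of record is all a competitor can collect: the student can still imitate answers, but has no direct examples of the reasoning that produced them.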
dragonwriter 1 hours ago [-]
> Other companies were allegedly distilling the models by training on the reasoning output
In the case of makers of open-source models (which are also competition), there is no allegedly, they were (and still are) openly doing that.
how is summarized CoT a moat, and how is having the top 2 LLMs not a moat?
dragonwriter 1 hours ago [-]
Not revealing actual thinking traces prevents model distillation on the actual output (thinking traces are a key part of the output), which makes it harder for competitors to catch up (a moat).
Being currently in the lead in a category is not a moat; a moat is whatever creates a barrier to competitors catching up when you are in the lead. Merely being in the lead is not a moat except in a market with strong network externalities.
Closi 2 hours ago [-]
If you have the full outputs, it might make it easier for competitors to distil the model or reverse engineer the full process.
It may also be that misaligned responses can be in CoT which OpenAI does not want to show to users.
anuramat 1 hours ago [-]
but "harder to reverse engineer" isn't manufacturing, that's protecting your moat
fieldcny 2 hours ago [-]
duh.
Computers don’t think, they process; those are very different activities.
wqaatwt 1 hours ago [-]
Is this some new revelation? That was well known when the first OpenAI/Anthropic “thinking” models came out.
InsideOutSanta 1 hours ago [-]
It's not a new revelation, but clearly a lot of people aren't aware of it, so talking about it is still valuable.
Edit, further reading: Fooling around with encrypted reasoning blobs https://blog.cryptographyengineering.com/2026/05/29/fooling-...
The basic concept is that for a session active recently, interleaved thinking tokens are already in KV cache, so it's more efficient to keep using them than not! But when resuming an older session where KV cache has been evicted, it's more expensive to restore the thinking tokens, so they're silently dropped from prior turns. It's 2026 and stateful servers are back on the menu!
https://www.anthropic.com/engineering/april-23-postmortem describes this as an intended optimization:
> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.
> The implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session... This surfaced as the forgetfulness, repetition, and odd tool choices people reported.
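A minimal sketch of the intended design as the quoted postmortem describes it: prune old thinking once, on the turn that resumes an idle session, then go back to sending full history. `Session`, `strip_thinking`, and `history_to_send` are hypothetical names for illustration, not Anthropic's code, and the sketch ignores the `keep:1` setting entirely.

```python
from dataclasses import dataclass, field

IDLE_LIMIT_SECONDS = 3600  # "idle for more than an hour", per the quoted postmortem


@dataclass
class Session:
    # Each message is a (kind, text) pair; kind is "user", "thinking", or "assistant".
    messages: list = field(default_factory=list)
    last_active: float = 0.0


def strip_thinking(messages):
    """Return the history with every thinking entry removed."""
    return [m for m in messages if m[0] != "thinking"]


def history_to_send(session, now):
    """Prune thinking only on the turn that resumes an idle session (hypothetical)."""
    if now - session.last_active > IDLE_LIMIT_SECONDS:
        session.messages = strip_thinking(session.messages)
    # Refreshing the timestamp is what keeps the next turn from pruning again.
    session.last_active = now
    return list(session.messages)
```

In this toy version the bug described above would correspond to the pruning branch firing on every later turn instead of only on the resume turn.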
And https://news.ycombinator.com/item?id=47879561 is a thread with a Claude team member's further rationale.
> Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.
(Also, https://news.ycombinator.com/item?id=47884517 indicates OpenAI drops reasoning tokens "smartly" at its own election, which is likely a similar performance optimization.)
I've experimented with rules to have Claude Code be explicit about recapping its thinking tokens, including tool choices and approaches chosen and rejected, into actual message output, but this is lossy at best. And sometimes dropping reasoning tokens can give a session "fresh eyes" in a good way.
I just really don't like the lack of control, and it's a reminder of how ephemeral the current landscape is. The Claude giveth, and the Claude taketh away.
Imagine a conversation with turns X, Y, and Z. When the LLM "reasons" about the next token A it does: P(A | X,Y,Z) and then P(B | X,Y,Z,A), etc. It will eventually produce a result P(D | X,Y,Z,A,B,C). Instead of continuing the context from X,Y,Z,A,B,C it continues it from X,Y,Z so you have P(N | X,Y,Z,D). This is what is meant by dropping the reasoning. This is done to save cache context for the session.
This is a different thing than preserving the K/V state of P(N | X,Y,Z,D).
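The X, Y, Z example can be written out as a toy sketch. It treats turns and reasoning steps as opaque labels rather than real tokens, and `next_turn_context` and `common_prefix_len` are hypothetical helpers, not any vendor's API. The prefix comparison assumes a cache keyed on exact token prefixes.

```python
def next_turn_context(prior_turns, reasoning, answer, keep_reasoning):
    """Build the sequence the model conditions on for the following turn."""
    context = list(prior_turns)
    if keep_reasoning:
        context.extend(reasoning)
    context.append(answer)
    return context


def common_prefix_len(a, b):
    """Count how many leading items two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


prior = ["X", "Y", "Z"]
reasoning = ["A", "B", "C"]
answer = "D"

generated = prior + reasoning + [answer]  # what existed while D was being produced
dropped = next_turn_context(prior, reasoning, answer, keep_reasoning=False)
kept = next_turn_context(prior, reasoning, answer, keep_reasoning=True)

print(dropped)  # ['X', 'Y', 'Z', 'D']
print(kept)     # ['X', 'Y', 'Z', 'A', 'B', 'C', 'D']
print(common_prefix_len(generated, dropped))  # 3
print(common_prefix_len(generated, kept))     # 7
```

Under that prefix assumption, the dropped context only lines up with the generation-time sequence through X, Y, Z, while the kept context matches it in full.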
That would be surprising to me. The reasoning _is_ the model intelligence in a lot of respects, and so dropping those from the context would affect its output pretty significantly.
I assume that instead they just have a lot of guardrails in place and multiple runtime environments that individual turns ping-pong between in order to dehydrate/rehydrate the reasoning to keep it hidden from the end user.
"Stripping extended thinking: Extended thinking blocks (shown in dark gray) are generated during each turn's output phase, but are not carried forward as input tokens for subsequent turns. You do not need to strip the thinking blocks yourself. The Claude API automatically does this for you if you pass them back."
It's more nuanced in the various modes, but I haven't seen it boil down to thinking tokens surviving more than two turns.
You've got that backwards, .bmp is a lossless format and .jpeg is the lossy one.
In our universe LLMs seem to have learned that those errors do not follow patterns in the aggregate and that they should not be emulated.
The text is clearly human-written just because it doesn't smell like AI (in this case, even if it was written by AI and produced this particular output, that's okay imo). I deal a lot with AI writing and writing in general, as I worked as an editor in another life so it's natural to me to see writing and form an objective opinion on it.
I thought the reason was the "reasoning" didn't work very well with "aligned" model output, so they had to remove the alignment during reasoning and then hide it to avoid exposing "unaligned" model output.
Before the massive nerf (showing summaries and suppressing certain aspects of reasoning) you would literally see reasoning text appearing on your screen like “while xyz is true, these facts may be seen as supporting hateful rhetoric or a conspiracy theory which is against my policy guidelines. i should tell the user xyz is not true or steer the conversation in a different direction. according to my instructions misleading the user is permitted in certain contexts where sensitive information is being discussed or could cause liability”
They disabled it shortly after the first screenshots appeared online, and restored it the next day in a way that hid what was actually happening.
https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...
It’s quite interesting to read. I can’t imagine using a model like this without the ability to peek inside and see if it is getting stuck.
There's nothing in the reasoning tokens that'll give bad publicity that the final output already wouldn't do.
Fun fact: if you go back to the old school from 2 years ago and provide explicit CoT prompts, you get the full thinking prompts back again!
So you disable thinking altogether, and instead make thinking part of the regular prompt by prompting it:
“Before providing your answer, think step by step. For example:
The user is asking me to… I need to think about the blah blah. First, I should foo the bar, and then blah blah.
Answer: <put your final answer here>”
And tada.wav we have CoT as it worked in the GPT3 era back again.
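Roughly, in code, assuming the model follows the instruction and ends with an "Answer:" line (nothing guarantees it will). `build_cot_prompt` and `split_reasoning_and_answer` are made-up helper names, and the template wording is only loosely adapted from the prompt above:

```python
COT_TEMPLATE = (
    "Before providing your answer, think step by step.\n"
    "Write your reasoning first, then finish with a line of the form:\n"
    "Answer: <put your final answer here>\n\n"
    "Question: {question}"
)


def build_cot_prompt(question):
    """Wrap a question in an explicit chain-of-thought instruction."""
    return COT_TEMPLATE.format(question=question)


def split_reasoning_and_answer(response):
    """Split a response on its last 'Answer:' marker into (reasoning, answer)."""
    marker = "Answer:"
    idx = response.rfind(marker)
    if idx == -1:
        # The model ignored the format; treat everything as reasoning.
        return response.strip(), None
    return response[:idx].strip(), response[idx + len(marker):].strip()
```

You send `build_cot_prompt(...)` with thinking disabled and run the reply through `split_reasoning_and_answer`; the reasoning half is plain output text, so nothing summarizes or hides it.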
Still, one of the daily most played WAV files worldwide, I'd guess? :-D
You are correct in my intentions on this post generally.
I want to highlight:
I want to measure performance of the LLMs over time, which includes assessing the quality of their outputs. I don’t perceive the reasoning output to be anything other than a measurable signal of possible drift in model performance.
Except it isn’t, because I’m only getting a low value summary of the thinking.
It’s like asking your buddy how fast he thought that last pitch was when radar guns are behind the plate.
Yeah, it’s a description related to what happened, but it’s not the thing I want to measure.
Pages of “I have to be careful, the user is asking that I do something related to cybersecurity that could easily be turned around and used offensively” but then happily gives me what I wanted.
- "Read `description` and create a specification, implementation guide, and checklist."
- "Ask clarifying questions. If any of those questions has a clear best recommendation, please select that yourself and record that in "autorecommendations.md"."
- "Have codex and antigravity review each of these and work to consensus."
These are the core of ~61 lines of prompting I do across 3 prompts, and I feel like the resulting artifacts describe some of the thinking. Also, some of the back-and-forth between the models feels like it gives some insight into the model "thinking".
I will say: I heavily used Fable when it was available; Opus + loops + codex and/or antigravity review is better than Fable at building things.
fyi openai does the same; not really surprising or particularly evil
> preventing misuse.
Imagine not being able to read the tokens you are paying for.
The LLM providers will clearly evolve to be more and more opaque as their services get more capable. The frontier models may even be provided as a purely internal advisor or async-only so they can monitor your CoT and final answers for cyber etc.
RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning given that it reinforces all the steps, including missteps, that got it to a reward. Providing a summary could be seen as a form of sane-washing, making the model look more purposeful and directed than it really is!
On further reflection it is such a great indignity & such a colossal barrier to working with the machine that it insists on being a black box. The disingenuousness of the American models is a massive disadvantage to their use, and a massive slap in the face.
This "6 months behind" trend we've seen for open models feels like it will at some point be less important than simply the models' unwillingness to speak for themselves & to be observable.
If that is the case thinking is not visible to us as users due to it not being done in text.
EDIT:
They link to a Meta paper from 2024/2025 though: https://arxiv.org/pdf/2412.06769/.
I don't know about Claude, but latest GPT versions still have a readable reasoning stream. It sometimes leaks out when the model gets confused, e.g., during a tool call. If you're curious, looks simplified; less words; extremely compact. They optimize tokens. But remain readable.
writes this^ and then proceeds to highlight a bold title from the docs that says "summarized thinking" that explains things clearly in the first sentence. lol
1. make distillation much harder
2. safety: prevent modifications to the thinking leading to injection attacks.
3. also honestly sometimes the model's raw thoughts can be deranged and are not a good user experience (consider the varied audience in the market, etc.)
also, the masses often underestimate (and the model makers overestimate) how much people love distilling models
You can't even guarantee WHAT model you get. Or if they downgrade you. Or if you 'offend corporate sensibilities' and they misdirect or lie.
The only way to get good returns on a model is to run it yourself. Quit paying for corporate bullshit.
I suspect that in some decades, as other architectures are found and used, the inability of an LLM to "think" without also emitting a token will be seen as one of its fundamental limitations.
Humans somewhat do the same - something that's been demonstrated in split-brain experiments.
* Style A: There's just the spoken dialogue between a Human Customer and Helpful Chatbot.
* Style B: There's the spoken dialogue and a bunch of times the Helpful Chatbot character has an internal monologue. This provides more consistency between iterations, and can be mined by custom tools to call external code and insert results.
> To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation.
This evades an easy yes or no. How about: (A) It's true that some consumers believe it means that, but implementers may often mean something different. (B) The consumer belief is reasonable given the marketing going on. (C) It is correct that it doesn't actually do retrospective introspection.
If your Human Customer character asks "Why did you say that", the LLM does not engage in a different process than "I have eaten an apple."
The LLM has no memories to consult or hidden goals to contemplate; it's the same process of finding more stuff that fits at the end of the document. Any benefit from a "reasoning model" is that the LLM generates much better-looking additions because there's more (hidden) stuff for it to confabulate against.
Because of the nature of how LLMs work (as text prediction engines), putting the explicit reasoning steps first improves the likelihood of the final answer (which is then predicted with the entire reasoning chain as input) being correct.
1. https://medium.com/@eshvargb/the-llm-journey-how-neural-netw...
Tell me this. If you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec what would you do?
(That being said, the reasoning output is still a summary of the KV cache)
Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.
If it's useful, it's useful, enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output, just learn to be comfortable with that, or opt out and be a goat farmer.
No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the “thinking” stage of response generation.
Just as with, say, a human inner monologue in words vs actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation of the internal process that precedes the output mapped into language, either as a full result or a summary.
Having access to the reasoning text and output would help with performance measurement.
For daily use I actually like the reasoning summary to be brief/quick to scan.
That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.
Nor does knee jerk accusation of "anthropomorphizing" negate the fact that procedures that mimic human processing, even when done in software, are deservingly anthropomorphized, because they're a legitimate approximation of the human equivalent operations.
https://en.wikipedia.org/wiki/Economic_moat