Year 4 Computer Science student
find me anywhere in linktr.ee/papetoast
For the record, LLMs can give you the terminology too. ChatGPT one-shotted it when I just pasted in your shortform.
I skim through all new content in https://www.lesswrong.com/quicktakes every day; conveniently for this use case, it is sorted by latest comment.
This is Opus 4.7 with Adaptive thinking, not incognito[1], didn’t clear system prompt, web search disabled
“Instructions for Claude”
Having strong opinions is fine. In fields where you can confidently tell I am familiar, try to reply more technically. Otherwise, model how familiar I am with the topic and adjust the verbosity and depth of your response accordingly. Feel free to be informal. (You have been consistently overestimating my knowledge level basically all the time before I added this instruction, so take your guess and tune it down by half a level.) Your baseline prior should be undergrad level except for computer science stuff.
In all cases, be direct and to the point, I can take it.
Prefer to quantify things and use explicit probabilities. Use ranges when exact data is lacking.
I couldn’t be bothered to export the information without the one-click share this time.
Tangent: I have always believed that (some) humans should be able to intuitively visualize 4D objects in their mind and rotate them. There is a WIP video series by HyperCubist Math about visualizing 4D, and this guy’s blog supports my hypothesis. Though, I still haven’t come across anyone saying they can personally visualize 4D objects.
What do you mean by preview text? Do you mean Here is some text that is shown ((and some text that is hidden))? I don’t consider that preview text; I am imagining text showing up on hover (unsure about the exact design). If you mean something else, I don’t understand.
Also, I still have no idea how the :::dig syntax works.
Edit: I automated the top level linkposts too
Top-level shortform is a bit more technically annoying and fragile, since I can’t store state in a parent comment, and you cannot choose to subscribe to only the alignment blog reposts. As for top-level linkposts, I’m not sure it is good to do that without permission.
If someone else wants to do it, I am happy to retire my current system.
Human intuition around non-continuous forms of life seems almost non-existent. Btw, is the P in your name for Producer then?
Seems pretty important for LW people to see the blog, so I’ve set up a cron job to send a comment to my shortform and create a linkpost for any future blog entries: https://www.lesswrong.com/posts/EhE2jsiMcqGPRjd9a/papetoast-s-shortforms?commentId=XynhaH76xHCrCvnQK
hey, who (strong) downvoted me within the 20 minutes when it just said “Placeholder.”? I genuinely had to post a placeholder comment
OpenAI alignment blog updates
Please comment under the linkpost so comments are not scattered over two locations.
Ways to subscribe to the OpenAI alignment blog:
Subscribe via email on their website
Use an RSS reader
Click “Subscribe to this comment’s replies” in this comment
Automated via fetching the RSS every hour. Code. Let me know if this ever breaks, because I don’t actually read the blog myself.
last fetched: 2026-05-07T21:00:02Z
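The actual script is the one linked above; as a rough illustration of what the hourly fetch-and-dedupe core of such a poller could look like, here is a minimal sketch. `SAMPLE_RSS` and the GUID values are made-up stand-ins, not the real OpenAI feed, and the persisted-state and posting steps are only hinted at in comments:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS 2.0 sample standing in for the real blog feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><guid>post-2</guid><title>Newer post</title><link>https://example.com/2</link></item>
  <item><guid>post-1</guid><title>Older post</title><link>https://example.com/1</link></item>
</channel></rss>"""

def new_entries(rss_text: str, seen_guids: set[str]) -> list[dict]:
    """Return feed items whose <guid> has not been seen in a previous run."""
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid and guid not in seen_guids:
            fresh.append({"guid": guid,
                          "title": item.findtext("title"),
                          "link": item.findtext("link")})
    return fresh

seen = {"post-1"}  # state persisted on disk between cron runs
for entry in new_entries(SAMPLE_RSS, seen):
    # here the real script would hit the LessWrong API to post the
    # comment reply and create the linkpost
    print(entry["title"], entry["link"])
    seen.add(entry["guid"])
```

Keying the dedupe on `<guid>` rather than title means edits to a post’s title don’t trigger a duplicate repost.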
I agree, though one thing I am kind of annoyed about is that proper documentation usually means ~2x the token count for any read of the source code, though it is definitely worth it. I hope better tooling can exist to selectively prune the API documentation.
You’re welcome—I think I have the responsibility to attempt to clear up any misinformation I spread even if by accident. I had the suspicion that I caused this investigation too, since you posted it on LessWrong and afaict I was the only one talking about this paper. I feel both amused and slightly regretful for this whole chain of events.
Chinese websites are notoriously hard to archive and rot extremely quickly, so here is the Zhihu content, translated. The bolded part corresponds to the claim that “this work was done by an AI agent in 4 days”.
https://www.zhihu.com/pin/2032769685012361774 (https://archive.ph/drfZi)
Closed labs hide their models’ scale, but they can’t hide what the models know. And what a model knows is precisely an indicator of its parameter count.
Reasoning can be compressed; factual knowledge cannot. So black-box API calls alone are enough to estimate the scale of a frontier model, and across multiple version releases you can even see when a particular fact entered the parameters.
For three years, my friends He Jiyan and Zheng Zihan have been asking frontier LLMs the same question: “Do you know about the USTC Hackergame?” (a CTF competition). In May 2024, GPT-4o fabricated problem names that don’t exist. In February 2025, Claude 3.7 Sonnet accurately listed the 19 problems from 2023. By April 2026, frontier models could recall specific problems from several consecutive years of the competition.
After DeepSeek-V4 was released, **I had my agent spend four days** autonomously building “Incompressible Knowledge Probes” (IKP): a dataset of 1,400 questions across 7 rarity tiers, tested on 188 models from 27 vendors. Three findings:
1/ Factual accuracy alone is enough to estimate the scale of any black-box LLM. Accuracy is log-linear in parameter count: R² = 0.917 across 89 open-weight models from 135M to 1.6T parameters. Projecting closed models onto the fit → GPT-5.5 ~9T, Claude Opus 4.7 ~4T, GPT-5.4 ~2.2T, Claude Sonnet 4.6 ~1.7T, Gemini 2.5 Pro ~1.2T (90% confidence interval: 0.3–3× the estimate).
2/ Citation count and h-index do not predict whether a frontier model knows a given researcher. Two researchers with similar citation counts can get completely different answers. Models remember people who did influential work, not authors who published lots of incremental papers.
3/ Factual capacity is not compressed over time. Across 96 open-weight models spanning 3 years, the IKP time coefficient is statistically zero, rejecting the +0.0117/month predicted by the Densing Law at p < 10⁻¹⁵. Benchmarks are saturating, while factual capacity keeps expanding with parameters.
Website: link
Paper: link
Posted 2026-04-29 10:34 · IP location: Beijing
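Finding 1/ above amounts to an ordinary least-squares fit of accuracy against log10(parameter count) on open-weight models, then inverting the fit to turn a closed model’s accuracy into a scale estimate. A minimal sketch of that method, with made-up (accuracy, parameter) pairs rather than the paper’s actual 89-model dataset:

```python
import math

# Hypothetical (accuracy, parameter-count) pairs for open-weight models;
# the paper's real fit spans 89 models from 135M to 1.6T parameters.
open_models = [(0.05, 135e6), (0.18, 7e9), (0.30, 70e9), (0.42, 6.7e11)]

# Ordinary least squares for: accuracy = a * log10(params) + b
xs = [math.log10(p) for _, p in open_models]
ys = [acc for acc, _ in open_models]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
a = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
b = ybar - a * xbar

def estimate_params(accuracy: float) -> float:
    """Invert the fit: a black-box model's accuracy -> estimated parameter count."""
    return 10 ** ((accuracy - b) / a)

# A closed model scoring above every open-weight model projects to a larger scale.
print(f"~{estimate_params(0.55):.2g} parameters")
```

The wide 0.3–3× confidence interval in the post follows from this inversion: because parameters sit inside the log, even a small error in measured accuracy becomes a multiplicative error in the scale estimate.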
IMPORTANT UPDATE: Sanity-checking “Incompressible Knowledge Probes” by @Sturb @LawrenceC (via Twitter’s algorithm: Lisan al Gaib @scaling01)
Alternatively they also posted a twitter thread.
The author claimed on Zhihu that this work was done by an AI agent in 4 days. It shows.
The website and codebase bear obvious hallmarks of careless vibe-coding: inconsistent definitions, silent failures, code that contradicts the paper text, etc.
| Model | Paper estimate | Estimate w/ corrections | Δ paper→ |
|---|---|---|---|
| gpt-5.5-pro | 10,267B | 1,471B | ↓6.98× |
| gpt-5.5-think | 9,656B | 1,458B | ↓6.62× |
| gpt-5.5 | 8,831B | 1,459B | ↓6.05× |
| claude-opus-4.6-think | 5,254B | 1,399B | ↓3.76× |
| claude-opus-4.7-think | 4,041B | 1,132B | ↓3.57× |
I’m not the original researcher, and obviously we would need to be able to ask mythos those 1700 questions to get the estimate.
via Intercom, the widget on the bottom right on desktop.