Year 4 Computer Science student
find me anywhere in linktr.ee/papetoast
For the record, LLMs can give you the terminology too. ChatGPT one-shotted it when I just pasted in your shortform.
I skim through all new content in https://www.lesswrong.com/quicktakes every day; conveniently for this use case, it is sorted by latest comment.
This is Opus 4.7 with Adaptive thinking, not incognito[1], didn’t clear system prompt, web search disabled
“Instructions for Claude”
Having strong opinions is fine. In fields where you can confidently tell I am familiar, try to reply more technically. Otherwise, model how familiar I am with the topic and adjust the verbosity and depth of your response accordingly. Feel free to be informal. (You have been consistently overestimating my knowledge level basically all the time before I added this instruction, so take your guess and tune it down by half a level.) Your baseline prior should be undergrad level except for computer science stuff.
In all cases, be direct and to the point, I can take it.
Prefer to quantify things and use explicit probabilities. Use ranges when exact data is lacking.
I couldn’t be bothered to export the information without the one-click share this time.
Tangent: I have always believed that (some) humans should be able to intuitively visualize 4D objects in their mind and rotate them. There is a WIP video series by HyperCubist Math about visualizing 4D, and this guy’s blog supports my hypothesis. Though, I still haven’t come across anyone saying they can personally visualize 4D objects.
What do you mean by preview text? Do you mean Here is some text that is shown ((and some text that is hidden))? I don’t consider that preview text; I am imagining text showing up on hover (unsure about the exact design). If you mean something else, I don’t understand.
Also, I still have no idea how the :::dig syntax works.
Edit: I automated the top level linkposts too
Top-level shortform is a bit more technically annoying and fragile, since I can’t store state in a parent comment, and you cannot choose to subscribe to only the alignment blog reposts. As for top-level linkposts, I’m not sure it is good to do that without permission.
If someone else wants to do it, I am happy to retire my current system.
Human intuition around non-continuous forms of life seems almost non-existent. Btw, is the P in your name for Producer then?
Seems pretty important for LW people to see the blog, so I’ve set up a cron job to send a comment to my shortform and create a linkpost for any future blog entries: https://www.lesswrong.com/posts/EhE2jsiMcqGPRjd9a/papetoast-s-shortforms?commentId=XynhaH76xHCrCvnQK
hey, who (strong) downvoted me within the 20 minutes when it just said “Placeholder.”? I genuinely had to post a placeholder comment
OpenAI alignment blog updates
Please comment under the linkpost so comments are not scattered over two locations.
Ways to subscribe to the OpenAI alignment blog:
Subscribe via email on their website
Use an RSS reader
Click “Subscribe to this comment’s replies” in this comment
Automated via fetching the RSS every hour. Code. Let me know if this ever breaks, because I don’t actually read the blog myself.
last fetched: 2026-05-07T21:00:02Z
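The actual script is the one linked above; as a rough illustration of what the hourly fetch-and-dedupe core of such a poller could look like, here is a minimal sketch. `SAMPLE_RSS` and the GUID values are made-up stand-ins, not the real OpenAI feed, and the persisted-state and posting steps are only hinted at in comments:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS 2.0 sample standing in for the real blog feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><guid>post-2</guid><title>Newer post</title><link>https://example.com/2</link></item>
  <item><guid>post-1</guid><title>Older post</title><link>https://example.com/1</link></item>
</channel></rss>"""

def new_entries(rss_text: str, seen_guids: set[str]) -> list[dict]:
    """Return feed items whose <guid> has not been seen in a previous run."""
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid and guid not in seen_guids:
            fresh.append({"guid": guid,
                          "title": item.findtext("title"),
                          "link": item.findtext("link")})
    return fresh

seen = {"post-1"}  # state persisted on disk between cron runs
for entry in new_entries(SAMPLE_RSS, seen):
    # here the real script would hit the LessWrong API to post the
    # comment reply and create the linkpost
    print(entry["title"], entry["link"])
    seen.add(entry["guid"])
```

Keying the dedupe on `<guid>` rather than title means edits to a post’s title don’t trigger a duplicate repost.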
I agree, though one thing I am kind of annoyed about is that proper documentation usually means ~2x the token count for any read of the source code, though it is definitely worth it. I hope better tooling can exist to selectively prune the API documentation.
You’re welcome—I think I have the responsibility to attempt to clear up any misinformation I spread even if by accident. I had the suspicion that I caused this investigation too, since you posted it on LessWrong and afaict I was the only one talking about this paper. I feel both amused and slightly regretful for this whole chain of events.
Chinese websites are notoriously hard to archive and rot extremely quickly, so here is the Zhihu content, translated. The bolded part corresponds to the claim that “this work was done by an AI agent in 4 days”.
https://www.zhihu.com/pin/2032769685012361774 (https://archive.ph/drfZi)
Closed labs hide their models’ scale, but they can’t hide what the models know. And what a model knows is precisely an indicator of its parameter count.
Reasoning can be compressed; factual knowledge cannot. So black-box API calls alone are enough to estimate the scale of a frontier model, and across multiple version releases you can even see when a particular fact entered the parameters.
For three years, my friends He Jiyan and Zheng Zihan have been asking frontier LLMs the same question: “Do you know about the USTC Hackergame?” (a CTF competition). In May 2024, GPT-4o fabricated problem names that don’t exist. In February 2025, Claude 3.7 Sonnet accurately listed the 19 problems from 2023. By April 2026, frontier models could recall specific problems from several consecutive years of the competition.
After DeepSeek-V4 was released, **I had my agent spend four days** autonomously building “Incompressible Knowledge Probes” (IKP): a dataset of 1,400 questions across 7 rarity tiers, tested on 188 models from 27 vendors. Three findings:
1/ Factual accuracy alone is enough to estimate the scale of any black-box LLM. Accuracy is log-linear in parameter count: R² = 0.917 across 89 open-weight models from 135M to 1.6T parameters. Projecting closed models onto the fit → GPT-5.5 ~9T, Claude Opus 4.7 ~4T, GPT-5.4 ~2.2T, Claude Sonnet 4.6 ~1.7T, Gemini 2.5 Pro ~1.2T (90% confidence interval: 0.3–3× the estimate).
2/ Citation count and h-index do not predict whether a frontier model knows a given researcher. Two researchers with similar citation counts can get completely different answers. Models remember people who did influential work, not authors who published lots of incremental papers.
3/ Factual capacity is not compressed over time. Across 96 open-weight models spanning 3 years, the IKP time coefficient is statistically zero, rejecting the +0.0117/month predicted by the Densing Law at p < 10⁻¹⁵. Benchmarks are saturating, while factual capacity keeps expanding with parameters.
Website: link
Paper: link
Posted 2026-04-29 10:34 · IP location: Beijing
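Finding 1/ above amounts to an ordinary least-squares fit of accuracy against log10(parameter count) on open-weight models, then inverting the fit to turn a closed model’s accuracy into a scale estimate. A minimal sketch of that method, with made-up (accuracy, parameter) pairs rather than the paper’s actual 89-model dataset:

```python
import math

# Hypothetical (accuracy, parameter-count) pairs for open-weight models;
# the paper's real fit spans 89 models from 135M to 1.6T parameters.
open_models = [(0.05, 135e6), (0.18, 7e9), (0.30, 70e9), (0.42, 6.7e11)]

# Ordinary least squares for: accuracy = a * log10(params) + b
xs = [math.log10(p) for _, p in open_models]
ys = [acc for acc, _ in open_models]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
a = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
b = ybar - a * xbar

def estimate_params(accuracy: float) -> float:
    """Invert the fit: a black-box model's accuracy -> estimated parameter count."""
    return 10 ** ((accuracy - b) / a)

# A closed model scoring above every open-weight model projects to a larger scale.
print(f"~{estimate_params(0.55):.2g} parameters")
```

The wide 0.3–3× confidence interval in the post follows from this inversion: because parameters sit inside the log, even a small error in measured accuracy becomes a multiplicative error in the scale estimate.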
IMPORTANT UPDATE: Sanity-checking “Incompressible Knowledge Probes” by @Sturb @LawrenceC (via Twitter’s algorithm: Lisan al Gaib @scaling01)
Alternatively they also posted a twitter thread.
The author claimed on Zhihu that this work was done by an AI agent in 4 days. It shows.
The website and codebase bear obvious hallmarks of careless vibe-coding: inconsistent definitions, silent failures, code that contradicts the paper text, etc.
| Model | Paper estimate | Estimate w/ corrections | Δ paper→ |
|---|---|---|---|
| gpt-5.5-pro | 10,267B | 1,471B | ↓6.98× |
| gpt-5.5-think | 9,656B | 1,458B | ↓6.62× |
| gpt-5.5 | 8,831B | 1,459B | ↓6.05× |
| claude-opus-4.6-think | 5,254B | 1,399B | ↓3.76× |
| claude-opus-4.7-think | 4,041B | 1,132B | ↓3.57× |
I’m not the original researcher, and obviously we would need to be able to ask mythos those 1700 questions to get the estimate.
via Intercom, the widget on the bottom right on desktop.