Thanks for giving a legible account of your probability distribution for AIs that dominate top human experts.
> - 1:1 very good: top human expert at opaque goal-directed reasoning with the relevant context with a bunch of time to think _(My modal view)_
Could you clarify what you mean by “a bunch of time to think” here? 30 minutes? 1 week? Roughly indefinite?
(Btw: “top human experts” is quite vague and problematic, because the human capability distribution is often quite heavy-tailed. And I am most interested in whether AIs will be scheming when they start to be able to do AI safety research as well as the best humans, which I think may come a chunk of time after they surpass humans at capabilities research, since safety research seems less verifiable and people will try less hard to automate safety. On my model, that makes success a decent amount harder than if the bar is “ordinary top human experts”.)