That’s basically my argument in this post, and it applies to most AI-risk-related activities that would naively need to progress much further before takeoff than is likely actually possible. So not just more careful kinds of interpretability, but all sorts of things: control-enhancing automated governance of AI bureaucracies, agent foundations / decision theory, or saner definitions of potential eutopias.
That is, shortly before takeoff, AIs might be able to prioritize and complete your project, but only if you aim them at the particular things you’ve worked out so far. You can only start asking the right questions (without relying on the AIs to ask those questions themselves) by already being deconfused enough through prior human effort.
But this forgets that automated AI R&D means we’ll have decades of subjective research-time in months or years of wall-clock time!
It’s only the AIs that straightforwardly get those decades of subjective research-time; we don’t. Humans would have to struggle to understand what the AIs are developing in order to have any chance of meaningfully directing their efforts, while remaining largely at the mercy of AI advice about how to think about what’s going on.
Thanks for flagging this, I missed that post! The advice in the post & its comments is very useful, especially on preparing to aim the AIs, setting oneself up to provide oversight to many AI agents, and whether we’ll understand what the AIs are developing.