Any research rebalances the mix of currently legible research directions that could be handed off to AI-assisted alignment researchers or early autonomous AI researchers whenever they show up. Even hopelessly incomplete research agendas could still be used to prompt future capable AIs to focus on them, while in the absence of such incomplete research agendas we’d need to rely on AI’s judgment more completely. So it still makes sense to prioritize things that have no hope at all of becoming practical for decades (with human effort alone), in order to make as much partial progress as possible in developing (and deconfusing) them over the next few years.
In this sense, current human research, however far from practical usefulness, forms the data for aligning the early AI-assisted or AI-driven alignment research efforts. The judgment of the human alignment researchers working today makes it possible to formulate more knowably useful prompts for future AIs, nudging them toward actually developing practical alignment techniques.
This sort of approach doesn’t make as much sense for research explicitly aimed at changing the dynamics of the critical period (described below). Having an alternative, safer idea almost ready to go (with explicit support from some fraction of the AI safety community) is very different from having some ideas that the AI could then elaborate.
With AI assistance, how ready-to-go an alternative ends up being can differ a lot from its prior human-developed state. Also, an idea that’s ready to go is not yet an edifice of theory and software that’s ready to replace 5e28-FLOP transformer models, so some level of AI assistance is still necessary on 2-year timelines. (I’m not necessarily arguing that 2-year timelines are correct, but it’s the kind of assumption my argument should survive.)
The critical period includes the time when humans are still in effective control of the AIs, or when vaguely aligned and properly incentivised AIs are in control and are genuinely trying to help with alignment, even if their natural development and increasing power would push them out of that state soon thereafter. During this time, the state of the current research culture shapes the path-dependent outcomes. Superintelligent AIs that are reflectively stable will no longer allow path dependence in their further development, but before that happens the dynamics can be changed to an arbitrary extent, especially with AI efforts as leverage for implementing the changes in practice.
in the absence of such incomplete research agendas we’d need to rely on AI’s judgment more completely
This is a key insight, and I think that operationalising or pinning down the edges of a new research area is one of the longest time-horizon projects there is. If the METR time-horizon estimate is accurate, then developing research directions remains a distinct value-add even after AI research is semi-automatable.
I haven’t heard this said explicitly before, but it helps me understand your priorities a lot better.
Okay, this prompted me to turn the comment into a post, maybe this point is actually new to someone.