What has been most productive in reducing AI risks so far? My short list would be:
The proposed SB 1047 legislation.
The short statement on AI risks.
Frontier AI Safety Commitments, AI Seoul Summit 2024, to encourage labs to publish their responsible scaling policies.
Scary demonstrations that showcase toy models of deception, fake alignment, etc., and that build more scientific consensus, which is badly needed.
I’d probably add AI control to this list, as it’s a method for using AIs within a specific capability range without them escaping, even assuming those AIs are misaligned.
Unusually, relative to most AI governance people, I think regulation is most helpful in the case where AI alignment succeeds via a combination of instruction following and personal intent alignment, but no CEV of humanity occurs, and CEV alignment only happens some of the time (and even then, it’s personal CEV alignment). I think that’s the most plausible world right now.
No, AI control doesn’t pass the bar, because we’re still probably fucked until we have a solution for open-source AI or the race for superintelligence, for example. And OpenAI doesn’t seem to be planning to use control, so this looks to me like research that’s sort of being done in your garage but ignored by the labs (and that’s sad; I agree control is great).
I think this somewhat understates the level of buy-in from labs.
I agree that “quickly building superintelligence” makes control look notably less appealing. (Though note that this also applies to any prosaic method that is unlikely to directly and productively scale to superintelligence.)
I’m not very worried about open source AI at the moment, but I am quite worried about inadequate security undermining control and other hopes.
Maybe you have some information that I don’t have about the labs and the buy-in? You think this applies to OpenAI and not just Anthropic?
But as far as open source goes, I’m not sure. DeepSeek? Meta? Mistral? xAI? Some big labs are just producing open-source models. DeepSeek is maybe only 6 months behind. Is that enough of a lead?
It seems to me that the tipping point for many people (I don’t know about you) on open source is whether or not open source is better than closed source, so it’s a relative tipping point in terms of capabilities. But I think we should be thinking about absolute capabilities. For example, what about bioterrorism? At some point, that capability is going to be widely accessible. Maybe the community only cares about x-risks, but personally I don’t want to die either.
Is there a good explanation online of why I shouldn’t be afraid of open-source?
As far as open source goes, the quick argument is that once AI becomes sufficiently powerful, it’s unlikely that the incentives point toward open sourcing it (including government incentives). This isn’t totally obvious, though, and it doesn’t rule out catastrophic bioterrorism (more like COVID scale than extinction scale) prior to AI powerful enough to substantially accelerate R&D across many sectors (including bio). It also doesn’t rule out powerful AI being open sourced years after it is first created (though the world might be radically transformed by that point anyway). I don’t have much of an inside view on this, but reasonable people I talk to are skeptical that open source is a very big deal (in >20% of worlds) from at least an x-risk perspective. (This seems very sensitive to questions about government response, how much is driven by ideology, and how much people end up being compelled (rightly or not) by “commoditize your complement” (and ecosystem) economic arguments.)
Open source seems good on current margins, at least to the extent it doesn’t leak algorithmic advances / similar.
I would be happy to discuss this in a dialogue. It seems like an important topic, and I’m really unsure about many of the parameters here.