I strongly agree. I expect AI to be able to “take over the world” before it can create a more powerful AI that perfectly shares its values. This matches the Sable scenario Yudkowsky outlined in “If Anyone Builds It, Everyone Dies,” in which the AI becomes dangerously capable before it can solve its own alignment problem.
The problem is that this doesn’t avert doom. If modern AIs become smart enough to do self-improvement at all, then their makers will have them do it. In some ways, this has already started.
AIs are already intentionally, agentically self-improving? Do you have a source for that?
The agency and intentionality of current models are still up for debate, but the current versions of Claude, ChatGPT, etc. were all created with the assistance of their earlier versions.