Beautifully written! I read both endings and it felt very realistic (or, as realistic as a detailed far future can get).
Would it be possible for you to write a second good ending based on actions you actually do recommend,[1] without the warning that “we don’t recommend these actions”?
Reminder that this scenario is a forecast, not a recommendation
We don’t endorse many actions in this slowdown ending and think it makes optimistic technical alignment assumptions. We don’t endorse many actions in the race ending either.
One of our goals in writing this scenario is to elicit critical feedback from people who are more optimistic than us. What does success look like? This “slowdown ending” scenario represents our best guess about how we could successfully muddle through with a combination of luck, rude awakenings, pivots, intense technical alignment effort, and virtuous people winning power struggles. It does not represent a plan we actually think we should aim for. But many, including most notably Anthropic and OpenAI, seem to be aiming for something like this. We’d love to see them clarify what they are aiming for: if they could sketch out a ten-page scenario, for example, either starting from the present or branching off from some part of ours.
Of course, I understand this is a very time-consuming project and not an easy ask :/
But maybe just a brief summary? I’m so curious.
I guess what I’m asking for, is for you to make the same technical alignment assumptions as the bad ending, but where policymakers and executives who are your “target audience” steer things towards the good ending. Where you don’t need to warn them against seeing it as a recommendation.
Thank you! We actually tried to write one that was much closer to a vision we endorse! The TLDR overview was something like:
Both the US and Chinese leading AGI projects stop in response to evidence of egregious misalignment.
Sign a treaty to pause smarter-than-human AI development, with compute-based enforcement similar to the ones described in our live scenario, except this time with humans driving the treaty instead of the AI.
Take time to solve alignment (potentially with the help of the AIs). This period could last anywhere from 1 to 20 years, or maybe even longer! The best experts at this would all be brought into the leading project, and many different paths would be pursued (e.g. full mechinterp, Davidad moonshots, worst-case ELK, uploads, etc.).
Somehow, do a bunch of good governance interventions on the AGI project (e.g. transparency on use of the AGIs, no helpful-only access for any one party, a formal governance structure where a large number of diverse parties are all represented).
This culminates with aligning an AI “in the best interests of humanity,” whatever that means, using a process where a large fraction of humanity is engaged and has some power to vote. This process might look something like giving each human some share of the total resources of space and then doing lots of bargaining to find all the positive-sum trades, with some rules against blackmail / using your resources to cause immense harm (see the toy sketch below).
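To make the “bargaining to find all the positive-sum trades” step a bit more concrete, here is a minimal toy sketch (purely illustrative and not from the original comment; the agents, resources, and utility weights are all made up): each party starts with an equal endowment, and the process keeps executing pairwise swaps that leave both sides strictly better off, stopping when no such swap remains.

```python
import itertools

# Hypothetical resources and agents, purely for illustration.
RESOURCES = ["energy", "matter", "compute"]

# How much each agent values one unit of each resource (made-up weights).
AGENTS = {
    "A": {"energy": 3.0, "matter": 1.0, "compute": 1.0},
    "B": {"energy": 1.0, "matter": 3.0, "compute": 1.0},
    "C": {"energy": 1.0, "matter": 1.0, "compute": 3.0},
}

# Equal initial endowment of every resource for every agent.
allocation = {name: {r: 10.0 for r in RESOURCES} for name in AGENTS}

def utility(name):
    """Linear utility: sum of (valuation * amount held) over resources."""
    return sum(AGENTS[name][r] * allocation[name][r] for r in RESOURCES)

def try_trade(a, b, give, get, amount=1.0):
    """Swap `amount` of `give` (from a) for `amount` of `get` (from b),
    but only keep the swap if it is positive-sum for both parties."""
    if allocation[a][give] < amount or allocation[b][get] < amount:
        return False
    before = (utility(a), utility(b))
    allocation[a][give] -= amount
    allocation[b][give] += amount
    allocation[b][get] -= amount
    allocation[a][get] += amount
    if utility(a) > before[0] and utility(b) > before[1]:
        return True
    # Not strictly better for both sides: undo the swap.
    allocation[a][give] += amount
    allocation[b][give] -= amount
    allocation[b][get] += amount
    allocation[a][get] -= amount
    return False

# Keep trading until no mutually beneficial swap remains.
improved = True
while improved:
    improved = False
    for a, b in itertools.permutations(AGENTS, 2):
        for give, get in itertools.permutations(RESOURCES, 2):
            if try_trade(a, b, give, get):
                improved = True

for name in AGENTS:
    holdings = {r: allocation[name][r] for r in RESOURCES}
    print(name, holdings, "utility:", utility(name))
```

In this toy run, each agent ends up holding mostly the resource it values most, which is the kind of outcome “finding all the positive-sum trades” gestures at; a real process would of course involve far richer preferences, actual negotiation, and the anti-blackmail rules mentioned above.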
Unfortunately, it was hard to write this out in a way that felt realistic.
The next major project I focus on will likely be thinking through the right governance interventions to make that happen. I’m probably not going to do this in scenario format (instead something closer to normal papers and blog posts), but I would be curious for thoughts.
Big +1 on adding this and/or finding another high-quality way of depicting what the ideal scenario would look like. I think many people feel that the world is in such a dire state that it leads to hopelessness and fatalism. Articulating clear theories of victory that enable people to see the better future they can contribute towards will be an important part of avoiding this scenario.
:) It’s good to know that you tried this, because while trying to make it realistic, you might come up with a lot of insights into solving the problems that make it unrealistic.
Thank you for the summary. From it, I sort of see why it might not work as well as a story. Regulation and governance don’t make for a very exciting narrative. And big changes in strategy and attitude inevitably sound unrealistic, even when they aren’t. E.g. anyone who predicted that Europe would simply accept that its colonies wanted independence, or that the next Soviet leader would simply allow his constituent republics to break away, would have been laughed out of the room, even though those predictions turned out to be accurate.
Maybe in your disclaimer you could point out that this summary you just wrote is what you would actually recommend (instead of what the characters in your story did).
Yes, papers and blog posts are less entertaining for us but more pragmatic for you.