I mean, I had the impression that pretty much everyone assigns >5% probability to “if we scale, we all die”, so that's already enough reason to work on global coordination on safety.
What specific actions do you have in mind when you say “global coordination on safety”, and how much of the problem do you think these actions solve?
My own view is that ‘caring about AI x-risk at all’ is a pretty small (albeit indispensable) step. There are lots of decisions that hinge on things other than ‘is AGI risky at all’.
I agree with Rohin that the useful thing is trying to understand each other’s overall models of the world and to converge on them, not p(doom) per se. I gave some examples here of some important implications of having more Paul-ish models versus more Eliezer-ish models.
More broadly, examples of important questions people in the field seem to disagree a lot about:
Alignment
How hard is alignment? What are the central obstacles? What kind of difficulty is it? (Is it hard like ‘building a secure OS that works on the first try’? Hard like ‘the engineering/logistics/implementation portion of the Manhattan Project’? Both? Some other option? Etc.)
What alignment research directions are potentially useful, and what plans for developing aligned AGI have a chance of working?
Deployment
What should the first AGI systems be aligned to do?
To what extent should we be thinking of “large disruptive act that upends the gameboard”, versus “slow moderate roll-out of regulations and agreements across a few large actors”?
Information spread
How important is research closure and opsec for capabilities-synergistic ideas? (Now, later, in the endgame, etc.)
Path to AGI
Is AGI just “current SotA systems like GPT-3, but scaled up”, or are we missing key insights?
More broadly, what’s the relationship between current approaches and AGI?
How software- and/or hardware-bottlenecked are we on AGI?
How compute- and/or data-efficient will AGI systems be?
How far off is AGI? How possible is it to time future tech developments? How continuous is progress likely to be?
How likely is it that AGI is in-paradigm for deep learning?
If AGI comes from a new paradigm, how likely is it that it arises late in the paradigm (when the relevant approach is deployed at scale in large corporations) versus early (when a few fringe people are playing with the idea)?
Should we expect warning shots? Would warning shots make a difference, and if so, would they be helpful or harmful?
To what extent are there meaningfully different paths to AGI, versus just one path? How possible (and how desirable) is it to change which path humanity follows to get to AGI?
Actors
How likely is it that AGI is first developed by a large established org, versus a small startup-y org, versus an academic group, versus a government?
How likely is it that governments play a role at all? What role would be desirable, if any? How tractable is it to try to get governments to play a good role (rather than a bad role), and/or to try to get governments to play a role at all (rather than no role)?
Specific actions like not scaling systems that have a 5% probability of catastrophe, for actors who have control over that, and explaining to everyone else why they shouldn’t scale either. It’s just that my first reaction is that indispensable steps should be the priority. And so even though reconciling models is certainly useful for a future solution, it seemed to me less cost-effective than, for example, spreading the less pessimistic model. Again, this is just an initial feeling, and I can come up with scenarios where it makes sense to focus on model convergence, but I’m not sure how you are weighting these scenarios. Is it that making everyone think like Paul is impossible, or that a civilization of Pauls would end anyway, or that you are already trying to spread awareness via other channels and this discussion was supposed to be solution-focused? I guess at least the last is true, given https://www.lesswrong.com/posts/CpvyhFy9WvCNsifkY/discussion-with-eliezer-yudkowsky-on-agi-interventions, but then this discussion felt like too much about P(doom). My guess is it’s something like “models that assign wrong probabilities may not destroy the world themselves, but would be too slow to solve alignment before someone creates AGI on a desktop”? In that case discussing models is not much less useful, because all known actions are unlikely to help. But I would like to hear what the plan is/was anyway.