It was all very interesting, but what was the goal of these discussions? I mean, I had the impression that pretty much everyone assigned >5% probability to "if we scale, we all die", so that's already enough reason to work on global coordination on safety. Is the reasoning that the same mental process that assigned too low a probability would not be able to come up with an actual solution? Or something like "at the point where they think their solution has reduced the probability of failure from 5% to 0.1%, it would still actually be much higher"? That seems possible only if people don't understand arguments about inner optimisers and the like, as opposed to disagreeing with them.
Changing one’s mind on P(doom) can be useful for people comparing across cause areas (e.g. Open Phil), but it’s not all that important for me and was not one of my goals.
Generally when people have big disagreements about some high-level question like P(doom), it means that they have very different underlying models that drive their reasoning within that domain. The main goal (for me) is to acquire underlying models that I can then use in the future.
Acquiring a new underlying model that I actually believe would probably be more important than the rest of my work in a full year combined. It would typically have significant implications on what sorts of proposals can and cannot work, and would influence what research I do for years to come. In the case of Eliezer’s model specifically, it would completely change what research I do, since Eliezer’s model specifically predicts that the research I do is useless (I think).
I didn't particularly expect to actually acquire a new model that I believed from these conversations, but there was some probability of that, and I did expect to learn at least a few new things I hadn't previously considered. I'm unfortunately quite bad at noticing my own "updates", so I can't easily point to examples. That said, I'm confident that I would now do significantly better at Eliezer's ITT (Ideological Turing Test) than before the conversations.
I mean, I had the impression that pretty much everyone assigned >5% probability to "if we scale, we all die", so that's already enough reason to work on global coordination on safety.
What specific actions do you have in mind when you say “global coordination on safety”, and how much of the problem do you think these actions solve?
My own view is that ‘caring about AI x-risk at all’ is a pretty small (albeit indispensable) step. There are lots of decisions that hinge on things other than ‘is AGI risky at all’.
I agree with Rohin that the useful thing is trying to understand each other's overall models of the world and trying to converge on them, not P(doom) per se. I gave some examples here of important implications of having more Paul-ish models versus more Eliezer-ish models.
More broadly, examples of important questions people in the field seem to disagree a lot about:
Alignment
How hard is alignment? What are the central obstacles? What kind of difficulty is it? (Is it hard like ‘building a secure OS that works on the first try’? Hard like ‘the engineering/logistics/implementation portion of the Manhattan Project’? Both? Some other option? Etc.)
What alignment research directions are potentially useful, and what plans for developing aligned AGI have a chance of working?
Deployment
What should the first AGI systems be aligned to do?
To what extent should we be thinking of “large disruptive act that upends the gameboard”, versus “slow moderate roll-out of regulations and agreements across a few large actors”?
Information spread
How important is research closure and opsec for capabilities-synergistic ideas? (Now, later, in the endgame, etc.)
Path to AGI
Is AGI just “current SotA systems like GPT-3, but scaled up”, or are we missing key insights?
More broadly, what’s the relationship between current approaches and AGI?
How software- and/or hardware-bottlenecked are we on AGI?
How compute- and/or data-efficient will AGI systems be?
How far off is AGI? How possible is it to time future tech developments? How continuous is progress likely to be?
How likely is it that AGI is in-paradigm for deep learning?
If AGI comes from a new paradigm, how likely is it that it arises late in the paradigm (when the relevant approach is deployed at scale in large corporations) versus early (when a few fringe people are playing with the idea)?
Should we expect warning shots? Would warning shots make a difference, and if so, would they be helpful or harmful?
To what extent are there meaningfully different paths to AGI, versus just one path? How possible (and how desirable) is it to change which path humanity follows to get to AGI?
Actors
How likely is it that AGI is first developed by a large established org, versus a small startup-y org, versus an academic group, versus a government?
How likely is it that governments play a role at all? What role would be desirable, if any? How tractable is it to try to get governments to play a good role (rather than a bad role), and/or to try to get governments to play a role at all (rather than no role)?
Specific actions like not scaling systems that carry a 5% probability of catastrophe, when actors have control over that, and explaining to everyone else why they shouldn't do it either. It's just that my first reaction is that indispensable steps should be the priority. So even though reconciling models is certainly useful for a future solution, it seemed to me less cost-effective than, for example, spreading the less pessimistic model. Again, this is just an initial impression, and I can come up with scenarios where it makes sense to focus on model convergence, but I am not sure how you are weighting those scenarios. Is it that making everyone think like Paul is impossible, or that even a civilisation of Pauls would end anyway, or are you already trying to spread awareness via other channels, so that this discussion was supposed to be solution-focused… I guess at least the last is true, given https://www.lesswrong.com/posts/CpvyhFy9WvCNsifkY/discussion-with-eliezer-yudkowsky-on-agi-interventions, but then this discussion felt too focused on P(doom). My guess is that it's something like "models that assign wrong probabilities may not destroy the world themselves, but would be too slow to solve alignment before someone creates AGI on a desktop"? And so discussing models is not much less useful, because all known actions are unlikely to help anyway. But I would still like to hear what the plan is/was.