This is a long comment! I was glad to have read it, but am a bit confused about your numbers seeming different from the ones I objected to. You said:
1 million AIs somewhat smarter than humans have spent 100 years each working on the problem (and coordinating etc?)
Then in this comment you say:
If we get the equivalent of 20 serial years of DAI-level labor (from >100k DAI level parallel agents given a proportional amount of compute) before +3 SDs over DAI we’re fine because we have a scalable solution to alignment. Otherwise takeover. (This is somewhat more conservative than my actual view.)
Here you now say 20 years and >100k DAI-level parallel agents. That's a factor of 5 in time and a factor of 10 in agents, i.e. a combined factor of 50! That's a huge difference. Maybe your estimates are conservative enough to absorb a factor of 50 in total labor without changing the probability that much?
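The factor-of-50 gap can be sketched as a quick back-of-the-envelope check (numbers taken from the two quotes above; treating ">100k" as exactly 100k for simplicity):

```python
# Original claim: 1 million AIs with 100 years of labor each.
original_labor = 1_000_000 * 100  # 100 million AI-years

# Later comment: >100k DAI-level agents with 20 serial years.
later_labor = 100_000 * 20  # 2 million AI-years

ratio = original_labor / later_labor
print(ratio)  # 50.0
```

So the two scenarios differ by a factor of 50 in total AI-years of labor, which is the gap being flagged here.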
I think I still disagree with your estimates, but before I go into them, I kind of want to check whether I am missing something, given that I currently think you are arguing for a resource allocation that's 50x smaller than the one I thought I was arguing against.
I was glad to have read it, but am a bit confused about your numbers seeming different from the ones I objected to.
I gave “1 million AIs somewhat smarter than humans with the equivalent of 100 years each” as an example of a situation I thought wouldn’t count as “anything like current techniques/understanding”. In this comment, I picked a lower number which is maybe my best guess for an amount of labor which eliminates most of the risk by a given level of capability.
I do think that "a factor of 5 and a factor of 10 different" is within my margin of error for the amount of labor you need. (Note that there might be aggressively diminishing returns on parallel labor, though possibly not, due to very superhuman coordination abilities of AIs.)
My modeling/guesses are pretty shitty in this comment (I was just picking some numbers to see how things work out), so if that’s a crux, I should probably try to be thoughtful (I was trying to write this quickly to get something written up).
This makes sense, but I think I am still a bit confused. My comment above was mostly driven by doing a quick internal Fermi estimate of whether "1 million AIs somewhat smarter than humans have spent 100 years each working on the problem" is a realistic amount of work to get out of the AIs without slowing down, and arriving at the conclusion that this seems very unlikely across a relatively broad set of worldviews.
We can also open up the separate topic of how much work might be required to make real progress on superalignment in time, or whether this whole ontology makes sense, but I was mostly interested in doing a fact-check of “wait, that really sounds like too much, do you really believe this number is realistic?”.
I still disagree, but I have much less of a “wait, this really can’t be right” reaction if you mean the number that’s 50x lower.