Consider the task “Prove Fermats last theorem”. This task is arguably human level task. Humans managed to do it. However it took some very smart humans a long time. Suppose you need 10,000 examples. You probably can’t get 10,000 examples of humans solving problems like this. So you train the system on easier problems. (maybe exam questions? ) You now have a system that can solve exam level questions in an instant, but can’t prove Fermats last theorem at all. You then train on the problems that can be decomposed into exam level questions in an hour. (ie the problems a reasonably smart human can answer in an hour, given access to this machine. ) Repeat a few more times. If you have mind uploading, and huge amounts of compute (and no ethical concerns) you could skip the imitation step. You would get an exponentially huge number of copies of some uploaded mind(s) arranged in a tree structure, with questions being passed down, and answers being passed back. No single mind in this structure experiences more than 1 subjective hour.

If you picked the median human by mathematical ability, and put them in this setup, I would be rather surprised if they produced a valid proof of Fermats last theorem. (and if they did, I would expect it to be a surprisingly easy proof that everyone had somehow missed. )

There is no way that IDA can compete with unaligned AI while remaining aligned. The question is, what useful things can IDA do?

If you picked the median human by mathematical ability, and put them in this setup, I would be rather surprised if they produced a valid proof of Fermats last theorem.

I would too. IDA/HCH doesn’t have to work with the median human, though. It’s ok to pick an excellent human, who has been trained for being in that situation. Paul has argued that it wouldn’t be that surprising if some humans could be arbitrarily competent in an HCH-setup, even if some couldn’t.

Epistemic status: Intuition dump and blatant speculation

Suppose that instead of the median human, you used Euclid in the HCH. (Ancient greek, invented basic geometry) I would still be surprised if he could produce a proof of fermat’s last theorem (given a few hours for each H). I would suspect that there are large chunks of modern maths that he would be unable to do. Some areas of modern maths have layers of concepts built on concepts. And in some areas of maths, just reading all the definitions will take up all the time. Assuming that there are large and interesting branches of maths that haven’t been explored yet, the same would hold true for modern mathematicians. Of course, it depends how big you make the tree. You could brute force over all possible formal proofs, and then set a copy on checking the validity of each line. But at that point, you have lost all alignment, someone will find their proof is a convincing argument to pass the message up the tree.

I feel that it is unlikely that any kind of absolute threshold lies between the median human, and an unusually smart human, given that the gap is small in an absolute sense.

In answer to question 2)

Consider the task “Prove Fermats last theorem”. This task is arguably human level task. Humans managed to do it. However it took some very smart humans a long time. Suppose you need 10,000 examples. You probably can’t get 10,000 examples of humans solving problems like this. So you train the system on easier problems. (maybe exam questions? ) You now have a system that can solve exam level questions in an instant, but can’t prove Fermats last theorem at all. You then train on the problems that can be decomposed into exam level questions in an hour. (ie the problems a reasonably smart human can answer in an hour, given access to this machine. ) Repeat a few more times. If you have mind uploading, and huge amounts of compute (and no ethical concerns) you could skip the imitation step. You would get an exponentially huge number of copies of some uploaded mind(s) arranged in a tree structure, with questions being passed down, and answers being passed back. No single mind in this structure experiences more than 1 subjective hour.

If you picked the median human by mathematical ability, and put them in this setup, I would be rather surprised if they produced a valid proof of Fermats last theorem. (and if they did, I would expect it to be a surprisingly easy proof that everyone had somehow missed. )

There is no way that IDA can compete with unaligned AI while remaining aligned. The question is, what useful things can IDA do?

How do you know that? Do you mean to say, “I really don’t think IDA can compete with unaligned AI while remaining aligned”?

I would too. IDA/HCH doesn’t have to work with the median human, though. It’s ok to pick an excellent human, who has been trained for being in that situation. Paul has argued that it wouldn’t be that surprising if some humans could be arbitrarily competent in an HCH-setup, even if some couldn’t.

Epistemic status: Intuition dump and blatant speculationSuppose that instead of the median human, you used Euclid in the HCH. (Ancient greek, invented basic geometry) I would still be surprised if he could produce a proof of fermat’s last theorem (given a few hours for each H). I would suspect that there are large chunks of modern maths that he would be unable to do. Some areas of modern maths have layers of concepts built on concepts. And in some areas of maths, just reading all the definitions will take up all the time. Assuming that there are large and interesting branches of maths that haven’t been explored yet, the same would hold true for modern mathematicians. Of course, it depends how big you make the tree. You could brute force over all possible formal proofs, and then set a copy on checking the validity of each line. But at that point, you have lost all alignment, someone will find their proof is a convincing argument to pass the message up the tree.

I feel that it is unlikely that any kind of absolute threshold lies between the median human, and an unusually smart human, given that the gap is small in an absolute sense.