To start off, this was just an example of verification being easier than coming up with the plan in the first place. There may be different paradigms under which we can verify a plan, but the general trend seems to be that verification of pretty much anything is easier than generation.
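To illustrate the asymmetry with a standard example (mine, not from the discussion above): for subset-sum, checking a proposed answer takes linear time, while finding one by brute force takes exponential time in the worst case.

```python
from itertools import combinations

def verify_subset_sum(nums, target, indices):
    """Verify a claimed solution in O(len(indices)) time:
    the indices must be distinct and their elements must sum to target."""
    return (len(set(indices)) == len(indices)
            and sum(nums[i] for i in indices) == target)

def find_subset_sum(nums, target):
    """Find a solution by brute force: tries up to 2^n subsets."""
    for r in range(len(nums) + 1):
        for combo in combinations(range(len(nums)), r):
            if sum(nums[i] for i in combo) == target:
                return list(combo)
    return None
```

The gap between the two functions is the point: a cheap verifier can sit in judgment over an expensive (or untrusted) generator.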
Also to your points:
I think a start would be proving that the system truly follows some arbitrary goal. I don’t think learning which arrangements of light patterns constitute a human is that hard for neural nets; this was done a decade ago.
The goal could be defined as: a group of typical rational humans would eventually agree that the world state is good, after considering all the evidence and given infinite paper to write down their thoughts. This can obviously fail, but I think it’s distinct from obvious paperclipping failures. I can still see failure modes here, but I don’t see why they are certain failures.
I also don’t agree with the “bizarre inhuman” part. At worst, I think it’s an unknown unknown. For example, the “faciest” face looks like a human face.
In addition, the goal won’t have a precise definition, because human goals aren’t precise, but that doesn’t exclude our neural net from understanding some form of the goal. I’m also fairly sure there are mathematical frameworks that can deal with imprecise goals; QACI, for example, is attempting this.
I also don’t see what a C program has to do with this. Proving 2+2=4 in mathematics means 2+2=4, whereas a computer program is a set of instructions that you merely believe will produce some desired result.
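To make the distinction concrete, here is a minimal sketch in a proof assistant (Lean, my choice of example, not one mentioned above): the statement below is not code you run and trust, but a theorem the kernel mechanically verifies.

```lean
-- A machine-checked proof that 2 + 2 = 4: the kernel verifies it by
-- reducing both sides to the same normal form (`rfl` is reflexivity).
example : 2 + 2 = 4 := rfl
```

If the proof were wrong, the checker would reject it outright; there is no analogue of a program that compiles but silently does the wrong thing.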