If you actually believe the sharp left turn argument holds water, where is the evidence?
As I said earlier, this evidence must take a specific form: evidence in the historical record.
Hold on; why? Even for simple cases of goal misspecification, the misspecification may not become obvious without a sufficiently OOD environment.
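To make this point concrete, here is a toy sketch (all names hypothetical, not from the discussion): a proxy objective that agrees with the true goal everywhere on the training distribution, so the misspecification is invisible until the agent is evaluated off-distribution.

```python
# Hypothetical illustration of goal misspecification that only surfaces OOD.
def true_goal(x):
    """What we actually want maximized."""
    return x

def proxy_goal(x):
    """A learned/specified proxy: identical to the true goal for x <= 10,
    but diverging beyond that range."""
    return x if x <= 10 else 20 - x

in_dist = range(0, 11)    # environments seen during training/evaluation
ood = range(11, 21)       # a sufficiently OOD set of environments

# In-distribution, the proxy is indistinguishable from the true goal...
assert all(true_goal(x) == proxy_goal(x) for x in in_dist)
# ...but off-distribution the misspecification becomes obvious.
assert any(true_goal(x) != proxy_goal(x) for x in ood)
```

The point of the sketch is only that "looks aligned on every environment tested so far" and "is aligned" are distinct claims, separated exactly by how far the test distribution extends.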
Given any practical and reasonably aligned agent, there is always some set of conceivable OOD environments where that agent fails. Who cares? There is a single success criterion: utility in the real world! The success criterion is not “is this design perfectly aligned according to my adversarial pedantic critique”.
The sharp left turn argument uses the analogy of brain evolution being misaligned with IGF to argue for doom from misaligned AGI. But brains enormously increased human fitness rather than producing the predicted decrease, so the argument fails.
In worlds where 1. alignment is very difficult, and 2. misalignment leads to doom (low utility), this would naturally translate into a great filter around intelligence—which we do not observe in the historical record. Evolution succeeded at brain alignment on the first try.
And in the human case, why does it not suffice to look at the internal motivations humans have, and describe plausible changes to the environment for which those motivations would then fail?
I think this entire line of thinking is wrong—you have little idea what environmental changes are plausible and next to no idea of how brains would adapt.
On the other hand, I would expect something like uploading to completely shatter any relation our behavior has to IGF maximization.
When you move the discussion to speculative future technology to support an argument from historical analogy, you have conceded that the historical analogy does not support your intended conclusion (and indeed it cannot, because Homo sapiens is an enormous alignment success).
It sounds like you’re arguing that uploading is impossible, and (more generally) have defined the idea of “sufficiently OOD environments” out of existence. That doesn’t seem like valid thinking to me.
Of course I’m not arguing that uploading is impossible, and obviously there are always hypothetical “sufficiently OOD environments”. But from the historical record so far we can only conclude that evolution’s alignment of brains was robust enough for the environmental distribution shift encountered—so far. Naturally that could all change in the future, given enough time, but piling in such future predictions is clearly out of scope for an argument from historical analogy.
These are just extremely different:
an argument from historical observations
an argument from future predicted observations
It’s like I’m arguing that, given that we observed the sequence 0, 1, 3, 7, the pattern is probably 2^N - 1, and you’re arguing that it isn’t because you predict the next term is 31.
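The analogy above can be spelled out in a few lines (a minimal sketch; the function name is mine, not from the discussion): the observed prefix fits 2^N - 1 exactly, and the disputed "31" is a claim about unobserved future data, not about the record so far.

```python
# The hypothesized pattern inferred from the historical observations.
def pattern(n):
    """2^n - 1, which matches the observed prefix 0, 1, 3, 7."""
    return 2**n - 1

observed = [0, 1, 3, 7]

# The pattern is fully consistent with everything actually observed...
assert [pattern(n) for n in range(4)] == observed

# ...and it predicts 15 as the next term. A rival prediction of 31 is an
# argument from future predicted observations, not from the record.
print(pattern(4))  # → 15
```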
Regardless, uploads are arguably so categorically different that it’s questionable how they even relate to the evolutionary success of Homo sapiens brain alignment to genetic fitness (do sims of humans count for genetic fitness? But only if DNA is modeled in some fashion? To what level of approximation? Etc.)
Uploading is impossible because the cat ate the Internet cable again
Would you say it’s … _cat_egorically impossible?