I definitely see your point. I think I'm using a broader definition of misalignment, one that includes emergent misalignment, which could be triggered blatantly in a variety of seemingly random scenarios.
Regarding "Since Agent-4 is misaligned, it's highly plausible that the self-driving car projects it's helping with will end up killing pedestrians": I think the closer line to compare it with would be "Since Agent-4 is misaligned, it's highly plausible that the autonomous cars it drives end up killing pedestrians." The trigger here could be anything that sets off its misalignment, including meddling by external actors.
In the case of humanoid robots, I'm imagining a scenario in which some of the engineers who own one (or external hackers) meddle with it to the point that it ends up killing a human, with the blame falling on the company that built it.
Additionally, Agent-4 is superintelligent by that point, but I still think even superintelligent entities could make mistakes in reality, or when interacting with realistic environments, especially more nascent ones (and who knows, new types of misalignment may open up as AI reaches superintelligence), even if it were aware enough to hide its misalignment. Superintelligence may make such entities many times more intelligent than humans, but it does not make them entirely mistake-proof.