TAG comments on The genie knows, but doesn’t care

TAG 10 May 2023 12:09 UTC
1 point
0

“code in the high-level sentence, and let the AI figure it out.”

http://lesswrong.com/lw/rf/ghosts_in_the_machine/

It’s worth noting that using an AI’s semantic understanding of ethics to modify it’s motivational system is so unghostly, and unmysterious that it’s actually been done:

https://astralcodexten.substack.com/p/constitutional-ai-rlhf-on-steroids

But that doesn’t prove much, because it was never—not in 2023, not in 2013 -- the case that that kind of self-correction was necessarily an appeal to the supernatural. Using one part of a software system to modify another is not magic!

The superintelligence will be able to solve Semantics-in-General, but at that point if it isn’t already safe it will be rather late to start working on safety.

We have AIs with very good semantic understanding that haven’t killed us, and we are working on safety.
What links here?
- TAG's comment on Contra Yudkowsky on Epistemic Conduct for Author Criticism by Zack_M_Davis (16 Sep 2023 14:55 UTC; -2 points)