I agree completely about AGI being like Turing completeness, in that there's a threshold. However, there are programming languages that are technically Turing complete but that only a masochist would actually try to use. So there could be a fire alarm while the AGI is still writing all the (mental analogs of) domain-specific languages and libraries it needs. My evidence for this is humans: we're over the threshold, but barely so, and it takes years and years of education to turn one of us into a quantum field theorist or an aeronautical engineer.
But my main crux is that I think we already have a good idea of how to align an AGI: value learning. See my post Requirements for a STEM-capable AGI Value Learner. Notably, that's an alignment technique that only works on things over the threshold.