I think there are two separate claims being made here:

1) Don’t be sure that your self-created metaethical system works.
2) Be confident in using commonly used metaethical systems.
I can understand not being overly committed to the idea that your own metaethical system is the ultimate truth. But it does not follow that established and commonly used systems are any good either. Considering that, for a large number of people, their source of ethics is whatever they were indoctrinated to believe as children, I would not place a lot of confidence in existing metaethics even if I am not confident in my own.
Edit: The main suggestion of this piece is #1, but the analogy to using existing crypto methods seems to suggest #2. The debate then becomes about what one should do when inaction/further research is not an option, when you have to make a decision.
We have all heard the “AI just predicts the next word/token” and “AI just thought of X because it was in the training data” arguments. I have a few first-draft ideas for experiments that might address this.
1) People invent artificial languages, a.k.a. conlangs (short for constructed languages); the most famous examples are Esperanto, Klingon, and Tolkien’s Elvish. Someone can invent a new conlang that didn’t exist until today, and by extension wasn’t present in the training data of any LLM, and explain the rules to an LLM (after training has already been completed). The language can even have a new script, or at the very least new words and grammar. Then we can check whether the LLM can talk in that language.
Potential failure modes would be designing a language with ambiguous grammar, where there are multiple ways of saying the same thing, and not explaining the language to the LLM properly, i.e. poor documentation. A minimal evaluation harness for this experiment is sketched below.
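Here is a rough sketch of how one might run this test, assuming the OpenAI Python client; the model name, the conlang spec, and the test pairs are placeholders I made up, and exact string matching is only a crude stand-in for real grading:

```python
# Minimal sketch of a conlang evaluation harness (assumptions: OpenAI
# chat API; CONLANG_SPEC and TEST_PAIRS are hypothetical placeholders).
from openai import OpenAI

client = OpenAI()

CONLANG_SPEC = """(full grammar and vocabulary of the new conlang,
written after the model's training cutoff)"""

# Held-out test pairs that the spec itself never shows as worked examples.
TEST_PAIRS = [
    ("Translate into the conlang: 'The river is cold.'", "expected conlang sentence"),
    ("Translate into English: '<conlang sentence>'", "expected English sentence"),
]

def ask(prompt: str) -> str:
    # The spec goes in the system message; the model never saw it in training.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any chat model past its training cutoff
        messages=[
            {"role": "system", "content": CONLANG_SPEC},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

correct = sum(expected.lower() in ask(prompt).lower()
              for prompt, expected in TEST_PAIRS)
print(f"{correct}/{len(TEST_PAIRS)} test translations matched")
```

In practice a human judge (or a second model that was also given the spec) should grade the outputs, since a correct translation can differ from the reference by word order alone.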
2) Someone can invent a new game with a strategic element, like chess with different pieces or a different board size, or a Mafia variant, or something else. It has to be a completely new game that didn’t exist before, and thus didn’t exist in the training data. Then explain the rules to an LLM and see if it plays correctly. The LLM doesn’t have to display perfect strategy, just always make legal moves and never violate the rules of the game (the way early versions of ChatGPT used to make illegal moves if you tried playing chess with them). A legality-checking harness is sketched below.
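A rough sketch of the measurement side, assuming the game’s rules have already been implemented in code; `ask_model`, `initial_state`, `legal_moves`, and `apply_move` are hypothetical names standing in for your actual implementation:

```python
# Minimal sketch: count what fraction of the model's moves are legal in a
# newly invented game. RULES_TEXT is the natural-language rulebook handed
# to the model; the rule-engine callables are hypothetical placeholders.
import random

RULES_TEXT = "(full natural-language rules of the newly invented game)"

def evaluate_legality(ask_model, initial_state, legal_moves, apply_move, n_turns=50):
    """Play up to n_turns model moves; return the fraction that were legal."""
    state, legal_count, played = initial_state, 0, 0
    for _ in range(n_turns):
        moves = legal_moves(state)
        if not moves:
            break  # no legal moves left: the game is over
        reply = ask_model(f"{RULES_TEXT}\nCurrent state: {state}\nYour move:").strip()
        played += 1
        if reply in {str(m) for m in moves}:
            legal_count += 1
            state = apply_move(state, reply)
        else:
            # Illegal move: substitute a random legal one so the game
            # continues and we keep gathering data points.
            state = apply_move(state, str(random.choice(moves)))
    return legal_count / played if played else 0.0
```

The random-substitution fallback exists so that one illegal move doesn’t end the sample early; the metric of interest is simply the legal-move rate over many turns.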
If LLMs do pass these tests (and for all we know they might not), it would show that “learning” in the colloquial English sense is different from “learning” in the machine-learning sense (mistake 24 in Yudkowsky’s “37 Ways that Words can be Wrong”). An AI that is past its machine-learning training phase can still do “learning” in the colloquial English sense.