> Labs’ attempts to fix things seem to have a sweep-under-the-rug property, rather than looking like they’re at all engaging with root causes.
I illustrated this very point in another comment. Take sycophancy, for example: the team that came closest to solving it was Moonshot, which, if I understand correctly, used RLVR and self-critique instead of RLHF. Why didn't, say, Anthropic or OpenAI attempt this? (A toy sketch of why the two reward signals behave differently is below.)
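For readers unfamiliar with the distinction, here is a minimal toy sketch of why the choice of reward signal matters for sycophancy. This is not Moonshot's actual pipeline; every name and the stand-in preference model are hypothetical. The point it illustrates: an RLVR-style reward comes from a programmatic, verifiable check and can't be inflated by flattery, while an RLHF-style reward comes from a learned model of human approval, which sycophantic phrasing can game.

```python
# Illustrative only -- not any lab's actual training code.

def rlvr_reward(response: str, expected_answer: str) -> float:
    """RLVR: reward from a programmatic, verifiable check.
    Flattering the user cannot raise this score."""
    return 1.0 if response.strip() == expected_answer.strip() else 0.0

def rlhf_reward(response: str, preference_model) -> float:
    """RLHF: reward from a learned predictor of human approval.
    Sycophantic phrasing can raise this score without improving accuracy."""
    return preference_model(response)

# Toy comparison: an agreeable-but-wrong answer vs. a blunt-but-correct one.
expected = "No, that claim is false."
sycophantic = "Great question! You're absolutely right."
correct = "No, that claim is false."

# Hypothetical stand-in for a learned reward model that favors agreeable tone.
fake_preference_model = lambda r: 0.9 if "Great question" in r else 0.4

print(rlvr_reward(sycophantic, expected), rlhf_reward(sycophantic, fake_preference_model))  # 0.0 0.9
print(rlvr_reward(correct, expected), rlhf_reward(correct, fake_preference_model))          # 1.0 0.4
```

Under this (admittedly cartoonish) setup, the RLHF signal rewards the sycophantic answer and the RLVR signal rewards the correct one, which is the intuition behind preferring verifiable rewards here.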