I think this position made a lot of sense a few years ago when we had no idea how a superintelligence might be built. The LLM paradigm has made me more hopeful about this. We’re not doing a random draw from the space of all possible intelligences, where you would expect to find eldritch alien weirdness. LLMs are trained to imitate humans; that’s their nature. I’ve been very positively surprised by the amount of empathy and emotional intelligence LLMs like Claude display; I think alignment research is on the right track.
What happens when you take an LLM that successfully models empathy and emotional intelligence and you dramatically scale up its IQ? Impossible to be certain, but I don’t think it’s obvious that it loses all its empathy and decides to only care about itself. As humans get smarter, do they become more amoral and uncaring? Quite the opposite: people like Einstein and Feynman are some of our greatest heroes. An ASI far smarter than Einstein might be more empathetic than him, rather than less.
> To correctly specify [empathy] within the space of all possible ways that an ASI could view us requires putting a lot of bits of information into the ASI.
To me this reads like “we need to design a schematic for empathy that we can program into the ASI.” That’s not how we train LLMs, though. Instead we show them lots of examples of the behavior we want them to exhibit. As you pointed out, empathy and love are hard to specify and define. But that’s OK—LLMs don’t primarily work off of specifications and definitions. They just want lots of examples.
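To make that concrete, here's a toy sketch of "learning from examples rather than specifications." Everything in it is made up for illustration (the token ids are fabricated, and the "model" is a trivial next-token stand-in, not any lab's actual training pipeline), but the shape of the loop is the point: the model is only ever shown demonstrations of the behavior we want, never a formal definition of it.

```python
# Minimal, hypothetical sketch of supervised fine-tuning on demonstrations.
# The data and model below are toy stand-ins, not a real LLM setup.
import torch
import torch.nn as nn

# Hypothetical (prompt, desired reply) pairs, already encoded as token ids
# by some tokenizer (details omitted). Each pair *demonstrates* the desired
# behavior; nowhere do we write down a definition of it.
demonstrations = [
    (torch.tensor([3, 17, 42]), torch.tensor([8, 5, 29, 11])),
    (torch.tensor([9, 2, 31]),  torch.tensor([8, 5, 14, 7])),
]

vocab_size, dim = 64, 32
# Toy "language model": embed each token, predict the next one.
model = nn.Sequential(
    nn.Embedding(vocab_size, dim),
    nn.Linear(dim, vocab_size),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    for prompt, reply in demonstrations:
        # Teacher forcing on the demonstration: at each position, predict
        # the next token of the example. (A real LLM attends to the whole
        # prompt; this toy model just maps one token to the next.)
        inputs = torch.cat([prompt, reply[:-1]])
        targets = torch.cat([prompt[1:], reply])
        logits = model(inputs)
        loss = loss_fn(logits, targets)   # distance from the demonstrated behavior
        opt.zero_grad()
        loss.backward()
        opt.step()                        # nudge the model toward the example
```

The gradient steps push the model toward reproducing the demonstrated responses; whatever "empathy" means ends up implicit in the examples rather than spelled out anywhere in the training code.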
This isn’t to say the problem is solved. Alignment research still has work to do, and I agree that we should put more resources into it. But I think there’s reason to be hopeful about the current approach.