What can we learn from Lex Fridman’s interview with Sam Altman?

These are my personal thoughts about this interview.

Epistemic status: I consider myself neither a machine-learning expert nor an alignment expert. My focus is on outreach: explaining AI safety to the general public and to professionals outside of the AI safety community. An interview like this one is therefore important material for me, both to understand the situation myself and to explain it to others. After watching it, I’m somewhat confused. There were bits in this talk that I liked and others that disturbed me. There seems to be a mix of humility and hubris, of openly acknowledging AI risks and of downplaying some elements of them. I am unsure how open and honest Sam Altman really was. I don’t mean to criticize; I want to understand what OpenAI’s and Sam Altman’s stance towards AI safety really is.

Below I list transcriptions of the parts that seemed most relevant to AI safety, along with my thoughts and questions about them. Maybe you can help me understand this better by commenting.

[23:55] Altman: “Our degree of alignment increases faster than our rate of capability progress, and I think that will become more and more important over time.”

I don’t really understand what this is supposed to mean. What’s a “degree of alignment”? How can you meaningfully compare it with “rate of capability progress”? To me, this sounds a lot like marketing: “We know we are dealing with dangerous stuff, so we are extra careful.” Then again, it’s probably hard to explain this in concrete terms in an interview.

[24:40] Altman: “I do not think we have yet discovered a way to align a super powerful system. We have something that works for our current scale: RLHF.”

I find this very open and honest. He obviously not only knows about the alignment problem, but also openly admits that RLHF is not the solution to aligning an AGI. Good!

[25:10] Altman: “It’s easy to talk about alignment and capability as orthogonal vectors, they’re very close: better alignment techniques lead to better capabilities, and vice versa. There are cases that are different, important cases, but on the whole I think things that you could say like RLHF or interpretability that sound like alignment issues also help you make much more capable models and the division is just much fuzzier than people think.”

This, I think, contains two messages: “Capabilities research and alignment research are intertwined” and “criticizing us for advancing capabilities so much is misguided, because we need to do that in order to align AI”. I understand the first one, but I don’t subscribe to the second one; see the discussion below.

[47:53] Fridman: “Do you think it’s possible that LLMs really is the way we build AGI?”
Altman: “I think it’s part of the way. I think we need other super important things … For me, a system that cannot significantly add to the sum total of scientific knowledge we have access to – kind of discover, invent, whatever you want to call it – new, fundamental science, is not a superintelligence. … To do that really well, I think we need to expand on the GPT paradigm in pretty important ways that we’re still missing ideas for. I don’t know what those ideas are. We’re trying to find them.”

This is pretty vague, which is understandable. However, it seems to indicate to me that the current, relatively safe, mostly myopic GPT approach will be augmented with elements that may make it much more dangerous, such as long-term memory and dynamic learning. This is highly speculative, of course.

[49:50] Altman: “The thing that I’m so excited about is not that it’s a system that kind of goes off and does its own thing but that it’s this tool that humans are using in this feedback loop … I’m excited about a world where AI is an extension of human will and an amplifier of our abilities and this like most useful tool yet created, and that is certainly how people are using it … Maybe we never build AGI but we just make humans super great. Still a huge win.”

The last sentence is, from my point of view, the most promising one in the whole interview. It seems to indicate that Sam Altman and OpenAI are willing to stop short of creating an AGI if they can be convinced that alignment isn’t solved and that creating one would be suicidal. They may also be willing to agree on “red lines” if there is a consensus about them among leading developers.

[54:50] Fridman refers to Eliezer Yudkowsky’s view that AI will likely kill all of humanity.
Altman: “I think there’s some chance of that and it’s really important to acknowledge it because if we don’t talk about it, if we don’t treat it as potentially real, we won’t put enough effort into solving it. And I think we do have to discover new techniques to be able to solve it … The only way I know how to solve a problem like this is iterating our way through it, learning early, and limiting the number of one-shot-to-get-it-right scenarios that we have.”

I give Sam Altman a lot of credit for taking Eliezer’s warnings seriously, at least verbally. However, he seems to rule out solving the alignment problem in theory (or acknowledging its theoretical unsolvability) and to rely on a trial-and-error approach instead. This, I think, is very dangerous. “Limiting the number of one-shot-to-get-it-right scenarios” isn’t enough in my eyes if that number doesn’t go down to zero.

[59:46] Fridman asks about take-off speed. Altman: “If we imagine a two-by-two matrix of short timelines till AGI starts /​long timelines till AGI starts [and] slow take-off/​fast take-off … what do you think the safest quadrant will be? … Slow take-off/​short timelines is the most likely good world and we optimized the company to have maximum impact in that world, to try to push for that kind of world, and the decisions we make are … weighted towards that. … I’m very afraid of the fast take-offs. I think in the long time-lines it’s hard to have a slow take-off, there’s a bunch of other problems too.”

Here he seems to imply that the two axes aren’t independent: short timelines supposedly make a slow take-off more likely, and long timelines a fast one. I don’t see why that should be the case: if an AI gets out of control, that’s it, regardless of when that happens and how fast. I understand the idea of an incremental approach to AI safety, but I don’t think that the high (not to say breakneck) deployment speed OpenAI has demonstrated in the past helps in any way. He seems to use this argument to justify that speed on the grounds of improved safety, which I strongly feel is wrong.

[1:09:00] Fridman asks what could go wrong with an AI. Altman: “It would be crazy not to be a little bit afraid. And I empathize with people who are a lot afraid. … The current worries that I have are that there are going to be disinformation problems or economic shocks or something else at a level far beyond anything we’re prepared for. And that doesn’t require superintelligence, that doesn’t require a super deep alignment problem and the machine waking up trying to deceive us. And I don’t think it gets enough attention. … How would we know if the flow we have on twitter … like LLMs direct whatever’s flowing through that hive mind? … As on twitter, so everywhere else eventually … We wouldn’t [know]. And that’s a real danger. … It’s a certainty there are soon going to be a lot of capable open-sourced LLMs with very few to none safety controls on them … you can try regulatory approaches, you can try with more powerful AIs to detect this stuff happening, I’d like us to try a lot of things very soon.”

This is not directly related to AGI safety, and I may be misinterpreting it. But it seems to imply something like “we need to develop our AGI fast because it is needed to combat bad actors, and others are less safety-concerned than we are”. If I’m reading that correctly, this is another defense of fast deployment, albeit a more subtle one.

[1:11:19] Fridman asks how OpenAI is prioritizing safety in the face of competitive and other pressures. Altman: “You stick with what you believe and you stick to your mission. I’m sure people will get ahead of us in all sorts of ways and take shortcuts we’re not gonna take. … I think there are going to be many AGIs in the world so it’s not like outcompete everyone. We’re gonna contribute one, other people are gonna contribute some. I think multiple AGIs in the world with some differences in how they’re built and what they do what they’re focused on, I think that’s good. We have a very unusual structure though, we don’t have this incentive to capture unlimited value. I worry about the people who do, but, you know, hopefully it’s all gonna work out.”

I felt somewhat uneasy listening to this. It sounds a lot like “we’re the good guys, so don’t criticize us”. It also feels like downplaying the actual competitive pressure, which OpenAI itself has increased. Does Sam Altman really believe in a stable world where many AGIs compete with each other, some of them with only minimal safety measures, and everything goes well? In my opinion, this is either very naïve or somewhat dishonest.

[1:14:50] Altman (talking about the transformation from non-profit to “capped” for-profit): “We needed some of the benefits of capitalism, but not too much.”

[1:16:00] Altman (talking about competition): “Right now there’s like extremely fast and not super deliberate motion inside of some of these companies, but already I think people are, as they see the rate of progress, already people are grappling with what’s at stake here. And I think the better angels are going to win out. … The incentives of capitalism to create and capitalize on unlimited value, I’m a little afraid of, but again, no one wants to destroy the world. … We’ve got the Moloch problem, on the other hand we’ve got people who are very aware of that, and I think, a lot of healthy conversation about how can we collaborate to minimize some of these very scary downsides.”

Again, he depicts OpenAI as ethically “better” than the competition because of the capped-profit rule (which, as far as I understand, has a very high ceiling). This in itself sounds very competitive. On the other hand, he seems open to collaboration, which is good.

[1:17:40] Fridman asks whether power might corrupt Altman/​OpenAI. Altman: “For sure. I think you want decisions about this technology and certainly decisions about who is running this technology to become increasingly democratic over time. We haven’t figured out quite how to do this. But part of the reason for deploying like this is to get the world to have time to adapt and to reflect and to think about this, to pass regulations, for institutions to come up with new norms, for the people working out together. That is a huge part of why we deploy even though many of the AI Safety people you referenced earlier think it’s really bad. Even they acknowledge that this is like of some benefit. But I think any version of ‘one person is in control of this’ is really bad. … I don’t have and I don’t want like any super voting power, any special … control of the board or anything like that at OpenAI.”

Again, there seem to be both good and bad messages here. I think it’s good that he acknowledges the enormous power OpenAI has and that it needs democratic regulation. But he again justifies the high deployment speed by arguing that it gives the world “time to adapt”. I think this is a contradiction. If he really wanted to give the world time to adapt, why didn’t they launch ChatGPT and then wait two or three years before launching Bing Chat/​GPT-4? Sam Altman would probably argue “we couldn’t, because the competition is less safety-concerned than we are, so we need to stay ahead”. This is of course speculative on my part, but I don’t like this kind of thinking at all.

[1:44:30] Fridman asks if an AGI could successfully manage a society based on centralized planning Soviet Union-style. Altman: “That’s perfect for a superintelligent AGI. … It might be better [than the human Soviet Union leaders], I expect it’d be better, but not better than a hundred, a thousand AGIs sort of in a liberal democratic system. … Also, how much of that could happen internally in one superintelligent AGI? Not so obvious. … Of course [competition] can happen with multiple AGIs talking to each other.”

Again, he points to a world with many competing AGIs in some kind of “libertarian utopia”. I have no idea how anyone could think this would be a stable situation. Even we humans have great difficulty creating stable, balanced societies, and we all have more or less the same level of intelligence. How is this supposed to work if competing AGIs can self-improve and/​or amass power? I can’t think of a stable world state which is not dominated by a single all-powerful AGI. But this may of course be due to my lack of imagination/​knowledge.

[1:45:35] Fridman mentions Stuart Russell’s proposal that an AI should be uncertain about its goals. Altman: “That feels important.” Fridman asks if uncertainty about its goals and values can be hard-engineered into an AGI. Altman: “The details really matter, but as I understand them, yes I do [think it is possible].”

[1:46:08] Fridman: “What about the off-switch?” Altman: “I’m a fan. … We can absolutely take a model back off the internet. … We can turn an API off.”

These are minor points, and I may be misunderstanding them, but they seem to point towards a somewhat naïve view of AI safety.

[1:46:40] Fridman asks if they worry about “terrible use cases” by millions of users. Altman: “We do worry about that a lot. We try to figure it out … with testing and red teaming ahead of time how to avoid a lot of those, but I can’t emphasize enough how much the collective intelligence and creativity of the world will beat OpenAI and all of the red-teamers we can hire. So we put it out, but we put it out in a way we can make changes.”

[2:05:58] Fridman asks about the Silicon Valley Bank. Altman: “It is an example of where I think you see the dangers of incentive misalignment, because as the Fed kept raising [the interest rate], I assume that the incentives on people working at SVB to not sell at a loss their ‘super safe’ bonds which are now down to 20% or whatever … that’s like a classic example of incentive misalignment … I think one takeaway from SVB is how fast the world changes and how little our experts and leaders understand it … that is a very tiny preview of the shifts that AGI will bring. … I am nervous about the speed of these changes and the speed with which our institutions can adapt, which is part of why we want to start deploying these systems really early while they’re really weak, so that people have as much time as possible to do this. I mean it’s really scary to have nothing, nothing, nothing and then drop a super powerful AGI all at once on the world.”

Again, he argues for quick deployment in the name of safety. More and more, this feels like a justification of OpenAI’s approach rather than an open discussion of the arguments for and against it. But that’s probably to be expected from an interview like this.

All in all, I feel a bit uneasy about this interview. In parts, it sounds a lot like what someone would say if they wanted to be seen as cautious and rational while actually just wanting to stay ahead of the competition at any cost, using talk like this to justify a breakneck-speed strategy. On the other hand, Sam Altman says many things that show he actually understands his responsibility and is open to cooperation and regulation, for which I am very grateful. Also, most leaders in his position would probably be less open about the risks of their technology.

What’s your take?