Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments

There has been a 25 min interview with Ilya conducted by Sven Strohband and released on July 17: https://www.youtube.com/watch?v=xym5f0XYlSc

This interview has a section dedicated to AI safety (7 min, starting at 14:56). Ilya is now the co-lead of the OpenAI “superalignment” effort, and his thinking will likely be particularly influential in how this effort evolves.
What he is saying seems to be somewhat different from what is in the consensus OpenAI “superalignment” documents. It’s compatible, but the emphasis is rather different. In particular, thinking about humans controlling or steering a superintelligent system is limited to an analogy of controlling a nuclear reactor to prevent a meltdown, and a more collaborative approach between humans and AIs seems to be emphasized instead.
(I am not sure when the interview was recorded, but it was no earlier than July 6, since it mentions Introducing Superalignment.)
Here is my attempt at editing the YouTube transcript of that part of the conversation.
The truly interesting part starts at 20:07. He hopes that a collaboration with superintelligence could solve the issues of misuse (so, no, he is not aiming to make superintelligence alignable to arbitrary goals; designing the proper goals is likely to be a collaborative activity between humans and AIs). I’ve put some passages in bold for emphasis.
15:03 Sven: It’s worthwhile to also talk about AI safety. OpenAI has released a document just recently where you’re one of the undersigners, and Sam has testified in front of Congress. What worries you most about AI safety?
15:27 Ilya: Yeah, I can talk about that. So let’s take a step back and talk about the state of the world. You know, we’ve had this AI research happening, and it was exciting, and now you have the GPT models, and now you all get to play with all the different chatbots and assistants, you know, Bard and ChatGPT, and people say okay, that’s pretty cool, they can do things; and indeed they already do. You can start perhaps worrying about the implications of the tools that we have today, and I think that is a very valid thing to do, but that’s not where I allocate my concern.
16:14 The place where things get really tricky is when you imagine fast-forwarding some number of years, a decade let’s say: how powerful will AI be? This incredible future power of AI will, frankly, be difficult to imagine. With an AI this powerful you could do incredible, amazing things that are perhaps even outside of our dreams, if you can really have a dramatically powerful AI. But the place where things get challenging is directly connected to the power of the AI. It is powerful, it is going to be extremely, unbelievably powerful, and it is because of this power that the safety issues come up, and I’ll mention three… I personally see three… You alluded to the letter that we posted at OpenAI a few days ago, actually yesterday, about some ideas that we think would be good to implement to navigate the challenges of superintelligence.
17:46 Now, what is superintelligence, and why did we choose to use the term “superintelligence”? The reason is that superintelligence is meant to convey something that’s not just an AGI. With AGI we said, well, you have something kind of like a person, kind of like a co-worker. Superintelligence is meant to convey something far more capable than that. When you have such a capability, can we even imagine how it will be? But without question it’s going to be unbelievably powerful; it could be used to solve incomprehensibly hard problems. If it is used well, if we navigate the challenges that superintelligence poses, we could radically improve the quality of life. But the power of superintelligence is so vast, and so are the concerns.
18:37 The concern number one has been expressed a lot, and this is the scientific problem of alignment. You might want to think of it as an analog to nuclear safety. You know, you build a nuclear reactor, you want to get the energy, you need to make sure that it won’t melt down even if there’s an earthquake and even if someone tries to, I don’t know, smash a truck into it. (Sven: Yep.) So this is superintelligence safety, and it must be addressed in order to contain the vast power of the superintelligence. It’s called the alignment problem. One of the suggestions that we had in the post was an approach where an international organization could create various standards at this very high level of capability. And I want to make this other point, you know, about the post and also about our CEO Sam Altman’s Congressional testimony where he advocated for regulation of AI: the intention is primarily to put rules and standards of various kinds on the very high level of capability. You know, you could maybe start looking at GPT-4, but that’s not really what is interesting, what is relevant here; it is something which is vastly more powerful than that. When you have a technology so powerful, it becomes obvious that you need to do something about this power. That’s the first concern, the first challenge to overcome.
20:08 The second challenge to overcome is that, of course, we are people, we are humans, “humans of interests”, and if you have superintelligences controlled by people, who knows what’s going to happen… I do hope that at this point we will have the superintelligence itself try to help us solve the challenges in the world that it creates. This is no longer an unreasonable thing to say. If you imagine a superintelligence that indeed sees things more deeply than we do, much more deeply, that understands reality better than us, we could use it to help us solve the challenges that it creates.
20:43 Then there is the third challenge, which is maybe the challenge of natural selection. You know what the Buddhists say: change is the only constant. So even if you do have your superintelligences in the world… We’ve managed to solve alignment, we’ve managed to solve… no one wants to use them in very destructive ways, we’ve managed to create a life of unbelievable abundance, not just material abundance, but health, longevity, all the things we don’t even try dreaming about because they seem obviously impossible. If you’ve got to this point, then there is the third challenge of natural selection. Things change, you know… You know that natural selection applies to ideas, to organizations, and that’s a challenge as well.
21:28 Maybe the Neuralink solution of people becoming part-AI will be one way we will choose to address this. I don’t know. But I would say that this kind of describes my concern. And just as the concerns are big, it is so worthwhile to overcome them, because then we could create truly unbelievable lives for ourselves, lives that are completely unimaginable today. So it is a challenge that’s really, really worth overcoming.
22:00 Sven: I very much like the idea that there needs to be a sort of threshold above which we really, really should pay attention. Because, you know, speaking as a German: with European-style regulation, often coming from people that don’t really know very much about the field, you can also completely kill innovation, which would be a little bit of a pity.
My own final comments:
I really like the emphasis on a collaborative, non-adversarial approach to interactions between humans and AIs. I think we need to find a way to combine a security mindset with a collaborative, non-adversarial approach, or we’ll be completely doomed.
In particular, I like that steering and controlling are understood in terms of avoiding a blow-up, and that the overall interaction is supposed to be collaborative, including the goal-making (although the phrase “no one wants to use them in very destructive ways” does suggest a rather radical reshaping of the world structure and of its inhabitants by this human-AI collaboration, and one might worry about what that would imply, and whether it would go well).
Speaking of merging humans and AIs, I’d prefer people to focus more on
the intermediate solutions before jumping to Neuralink-grade ones. In particular,
high-end augmented reality and high-end non-invasive brain-computer interfaces
can go a long way and are much easier to accelerate rapidly, so I wish people
would not gloss over those intermediate solutions, but would talk about them more.