I did computational cognitive neuroscience research from getting my PhD in 2006 until the end of 2022. I’ve worked on computational theories of vision, executive function, episodic memory, and decision-making. I’ve focused on the emergent interactions that are needed to explain complex thought. I was increasingly concerned with AGI applications of the research, and reluctant to publish my best ideas. I’m incredibly excited to now be working directly on alignment, currently with generous funding from the Astera Institute. More info and publication list here.
Seth Herd
The fact that very few in government even understand the existential risk argument means that we haven’t seen their relevant opinions yet. As you point out, the government is composed of selfish individuals. At least some of those individuals care about themselves, their children, and their grandchildren. Making them aware of the existential risk arguments in detail could entirely change their attitude.
In addition, I think we need to think in more detail about possible regulations and their downsides. Sure, government is shortsighted and selfish, like the rest of humanity.
I think you’re miscalibrated on the risks relative to your average reader. We tend to care primarily about the literal extinction of humanity. Relative to those concerns, the “most dystopian uses for AI” you mention are not a concern, unless you mean literally the worst: a billion-year reich of suffering or something.
We need a reason to believe that governments can reliably improve the incentives facing private organizations.
We do not. Many of us here believe we are in such a desperate situation that merely rolling the dice to change anything would make sense.
I’m not one of those people. I can’t tell what situation we’re really in, and I don’t think anyone else has a satisfactory full view either. So, despite all of the above, I think you might be right that government regulation may make the situation worse. The biggest risk I can see is changing who’s in the lead for the AGI race; the current candidates seem relatively well-intended and aware of the risks (with large caveats). (One counterargument is that takeoff will likely be so slow in the current paradigm that we will have multiple AGIs, making the group dynamics as important as individual intentions.)
So I’d like to see a better analysis of the potential outcomes of government regulation. Arguing that governments are bad and dumb in a variety of ways just isn’t sufficiently detailed to be helpful in this situation.
I expect there is still tons of low-hanging fruit available in LLM capabilities land. You could call this “algorithmic progress” if you want. This will decrease the compute cost necessary to get a given level of performance, thus raising the AI capability level accessible to less-resourced open-source AI projects.
Don’t you expect many of those improvements to remain closed-source from here on out, benefitting the teams that developed them at great (average) expense? And even the ones that are published freely will benefit the leaders just as much as their open-source chasers.
The question this addresses is whether LLMs can create new knowledge. The answer is “that’s irrelevant”.
Your framing seems to equivocate over current LLMs, future LLMs, and future AI of all types. That’s exactly what the public debate does, and it creates a flaming mess.
I’m becoming concerned that too many in the safety community are making this same mistake, and thereby misunderstanding and underestimating the near-term danger.
I think there’s a good point to be made about the cognitive limitations of LLMs. I doubt they can achieve AGI on their own.
But they don’t have to, so whether they can is irrelevant.
If you look at how humans create knowledge, we use a combination of techniques and brain systems that LLMs cannot employ. Those include continuous, self-directed learning; episodic memory as one aid to that learning; cognitive control, to organize and direct that learning; and sensory and motor systems to carry out experiments to direct that learning.
All of those are conceptually straightforward to add to LLMs (good executive function/cognitive control for planning is less obviously straightforward, but I think it may be surprisingly easy to leverage LLMs’ “intelligence” to do it well). A toy sketch of what I mean follows below.
See my Capabilities and alignment of LLM cognitive architectures for expansions on those arguments. I’ve been reluctant to publish more, but I think these ideas are fairly obvious once someone actually sits down to create agents that expand on LLM capabilities, so I think getting the alignment community thinking about this correctly is more important than a tiny slowdown in reaching x-risk capable AGI through this route.
(BTW human artistic creativity uses that same set of cognitive capabilities in different ways, so same answer to “can LLMs be true artists”).
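To make “conceptually straightforward” concrete, here’s a toy sketch of an agent loop that bolts episodic memory and crude cognitive control onto a plain LLM. Purely illustrative: llm() and embed() are hypothetical stand-ins I made up for whatever completion and embedding API you’d actually use, not any real library.

```python
# Toy sketch only: an LLM agent loop with episodic memory and crude
# cognitive control. llm() and embed() are hypothetical stand-ins,
# not real APIs; swap in a real model to make this do anything useful.
import math

def llm(prompt: str) -> str:            # stand-in for a chat-completion call
    return f"(model output for: {prompt[:40]}...)"

def embed(text: str) -> list[float]:    # stand-in for an embedding call
    return [float(text.lower().count(c)) for c in "etaoinshrd"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

episodic_memory: list[tuple[list[float], str]] = []  # (embedding, episode)

def remember(text: str) -> None:
    episodic_memory.append((embed(text), text))

def recall(query: str, k: int = 3) -> list[str]:
    """Retrieve the k past episodes most similar to the query."""
    q = embed(query)
    ranked = sorted(episodic_memory, key=lambda m: cosine(q, m[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def step(goal: str) -> str:
    context = "\n".join(recall(goal))
    # Crude cognitive control: the model chooses its own next subtask.
    action = llm(f"Goal: {goal}\nRelevant memories:\n{context}\nNext action?")
    remember(action)  # continuous, self-directed learning in the weakest sense
    return action

for _ in range(3):  # run a few steps toward a standing goal
    print(step("improve and perpetuate your own scaffolding"))
```

Each piece is trivial here; the point is architectural. None of it requires a breakthrough, just engineering effort wrapped around the LLM.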
I’m not sure that’s true. It’s true if you adopt the dominant local perspective “alignment is very hard and we need more time to do it”. But there are other perspectives: see “AI is easy to control” by Pope & Belrose, arguing that the success of RLHF means there’s a less than 1% risk of extinction from AI. I think this perspective is both subtly wrong and deeply confused in mistaking alignment with total x-risk, but the core argument isn’t obviously wrong. So reasonable people can and do argue for full speed ahead on AGI.
I agree with pretty much all of the counterarguments made by Steve Byrnes in his Thoughts on “AI is easy to control” by Pope & Belrose. But not all reasonable people will. And those who are also non-utilitarians (most of humanity) will be pursuing AGI ASAP for rational (if ultimately subtly wrong) reasons.
I think we need to understand and take this position seriously to do a good job of avoiding extinction as best we can.
I wonder if you’re getting disagreement strictly over that last line. I think that all makes sense, but I strongly suspect that the ease of making ChatGPT had nothing to do with their decision to publicize and commercialize.
There’s little reason to think that alignment is an engineering problem to the exclusion of theory. But making good theory is also partly dependent on knowing about the system you’re addressing, so I think there’s a strong argument that that progress accelerated alignment work as strongly as capabilities.
I think the argument is that it would be way better to do all the work we could on alignment before advancing capabilities at all. Which it would be. If we were not only a wise species, but a universally utilitarian one (see my top level response on that if you care). Which we are decidedly not.
This doesn’t even address their stated reason/excuse for pushing straight for AGI.
I don’t have a link handy, but Altman has said that short timelines and a slow takeoff is a good scenario for AI safety. Pushing for AGI now raises the odds that, when we get near it, it won’t get 100x smarter or more prolific rapidly. And I think that’s right, as far as it goes. It needs to be weighed against the argument for more alignment research before approaching AGI, but doing that weighing is not trivial. I don’t think there’s a clear winner.
Now, Altman pursuing more compute with his “7T investment” push really undercuts that argument being his sincere opinion, at least now (he said that bit a while ago, maybe 5 years back?).
But even if Altman was or is lying, that doesn’t make that thesis wrong. This might be the safest route to AGI. I haven’t seen anyone even try in good faith to weigh the complexities of the two arguments against each other.
Now, you can still say that this is evil, because the obviously better path is to do decades and generations of alignment work prior to getting anywhere near AGI. But that’s simply not going to happen.
One reason that goes overlooked is that most human beings are not utilitarians. Even if they realize we’re lowering the odds of future humans having an amazing, abundant future, they are pursuing AGI right now because it might prevent them and many of those they love from dying painfully. This is terribly selfish from a utilitarian perspective, but reason does not cross the is/ought gap to make utilitarianism any more rational than selfishness. I think calling selfishness “evil” is ultimately correct, but it’s not obvious. And by that standard, most of humanity is currently evil.
And in this case, evil intentions still might have good outcomes. While OpenAI has no good alignment plan, neither does anyone else. Humanity is simply not going to pause all AI work to study alignment for generations, so plans that include substantial slowdown are not good plans. So fast timelines with a slow takeoff based on lack of compute might still be the best chance we’ve got. Again, I don’t know and I don’t think anyone else does, either.
Thanks! A joke explained will never get a laugh, but I did somehow get a cackling laugh from your explanation of the joke.
I think I didn’t get it because I don’t think the trend line breaks. If you made a good enough noise reducer, it might well develop smart and distinct enough simulations that one would gain control of the simulator and potentially from there the world. See A smart enough LLM might be deadly simply if you run it for long enough if you want to hurt your head on this.
I’ve thought about it a little because it’s interesting, but not a lot because I think we probably are killed by agents we made deliberately long before we’re killed by accidentally emerging ones.
Yep, that’s an alarmingly high base rate, so multiplying that by ten is an enormous added risk. So even if the concentration and effect is far lower than in alcoholics, I’d still probably not take that risk.
Possibly even without ALDH deficiency.
It sounds to me like the author isn’t thinking about near-future scenarios, just existing AI.
Making a machine autopoietic is straightforward if it’s got the right sort of intelligence. We haven’t yet made a machine with the right sort of intelligence to do it, but there are good reasons to think we’re close. AutoGPT and similar agents can roughly functionally understand a core instruction like “maintain, improve, and perpetuate your code base”; they’re just not quite smart enough to do it effectively. Yet. So engaging with the arguments for what remains between here and there is the critical bit. Maybe it’s around the corner, maybe it’s decades away. It comes down to the specifics. The general argument “Turing machines can’t host autopoietic agents” is obviously wrong.
I’m not sure if the author makes this argument, but your summary sounded like they do.
I certainly agree that I’d hold off until I knew the answers to a bunch more questions.
This all seems to rest on the relative increase in oral and esophageal cancer. 10x sounds like an awful lot. But in terms of decision-making, the absolute increase, not the ratio, is the bottom line. So: what are the absolute likelihoods? If they’re both minuscule, this might not be a deciding factor. Increasing my cancer risk by one in a million might be a good trade for immunity to cavities and gum disease. (Toy calculation below.)
If you throw in immunity to bad breath, I’d take that deal. I wonder how large a factor the alcohol vs lactic acid is in bad breath.
I think it’s also worth considering how much ethanol is excreted into the mouth by these bacteria relative to how much is in the mouth of a heavy drinker. I’m sure the frequency vs. persistence is also a factor, but I’m not sure how.
On the other hand, if those numbers are much higher, it’s possible that even those without ALDH deficiency shouldn’t take the treatment.
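As a toy illustration of why the absolute numbers, not the ratio, are the crux (the baseline rates here are made-up placeholders, not real cancer statistics):

```python
# Toy calculation: absolute vs. relative risk. Both baseline figures are
# hypothetical placeholders, NOT real oral/esophageal cancer statistics.
relative_risk = 10.0               # the 10x figure under discussion

baseline_lifetime_risk = 1e-6      # assumed tiny baseline
added = baseline_lifetime_risk * (relative_risk - 1)
print(f"added lifetime risk: {added:.1e}")    # 9.0e-06: negligible

baseline_lifetime_risk = 0.01      # assumed 1% baseline instead
added = baseline_lifetime_risk * (relative_risk - 1)
print(f"added lifetime risk: {added:.1%}")    # 9.0%: a very different trade
```

Same 10x ratio, opposite decision, depending entirely on the baseline.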
Separately from persistence of the grid: humanoid robots are damned near ready to go now. Recent progress is startling. And if the AGI can do some of the motor control, existing robots are adequate to bootstrap manufacturing of better robots.
My summary: This is related to The Waluigi Effect (mega-post), but extends the hypothesis: a “Waluigi” hostile simulacrum finds ways to perpetuate itself and gain influence, first over the simulator, then over the real world.
Okay, I came back and read this more fully. I think this is entirely plausible. But I also think it’s mostly irrelevant. Long before someone accidentally runs a smart enough LLM for long enough, with access to enough tools to pose a threat, they’ll deliberately run it as an agent, with a prompt like “you’re a helpful assistant that wants to accomplish [x]; make a plan and execute it, using [this set of APIs] to gather information and take actions as appropriate.”
And long before that, people will use more complex scaffolding to create dangerous language model cognitive architectures out of less capable LLMs.
I could be wrong about this, and I invite pushback. Again, I take the possibility you raise seriously.
Sure, but that’s no reason not to try.
I think this is a strong argument against “just do something that feels like it’s working toward liberal democracy”. But not against actually trying to work toward liberal democracy.
I think this is a subset of work on the most important problems: time spent figuring out what to work on is surprisingly effective. People don’t do it as much as they should because it’s frustrating and doesn’t feel like it’s working toward a rewarding outcome.
It’s available as a podcast now:
Want to re-add a link?
I guess I don’t get it.
Sure, long after we’re dead from AGI that we deliberately created to plan to achieve goals.
Plagiarism is bad, on LW or anywhere.
Repeating other people’s useful thoughts is good. Pretending you came up with them yourself is bad. Attribution is the difference.
It could be, but that’s clearly not the whole deal. Societal standards for childcare have shifted dramatically. That shift could be driven by people having fewer children and could also be causing it, in a vicious cycle.
High income households have access to the world’s best leisure opportunities, yet they still invest more time in child-rearing than lower income households.
I doubt they invest more time. They have money to pay for more help with childcare. And I think this is the critical difference.
Time spent on care per child has skyrocketed in recent decades. I think that’s one major factor driving down fertility: having kids is a bigger PITA every year.
Thinking of costs solely in terms of money is a mistake. The time investment is critical.
This is why I’m unconcerned with low fertility if we get AI progress and don’t die from it: AI is going to be great at childcare. Even current LLMs have the cognitive capacity to be good tutors and playmates.
Who is downvoting posts like this? Please don’t!
I see that this is much lower than the last time I looked, so it’s had some, probably large, downvotes.
A downvote means “please don’t write posts like this, and don’t read this post”.
Daniel Kokotajlo disagreed with this post, but found it worth engaging with. Don’t you want discussions with those you disagree with? Downvoting things you don’t agree with says “we are here to preach to the choir. Dissenting opinions are not welcome. Don’t post until you’ve read everything on this topic”. That’s a way to find yourself in an echo chamber. And that’s not going to save the world or pursue truth.
I largely disagree with the conclusions and even the analytical approach taken here, but that does not make this post net-negative. It is net-positive. It could be argued that there are better posts on this topic one should read, but there certainly haven’t been this week. And I haven’t heard these same points made more cogently elsewhere. This is net-positive unless I’m misunderstanding the criteria for a downvote.
I’m confused why we don’t have a “disagree” vote on top-level posts to draw off the inarticulate disgruntlement that causes people to downvote high-effort, well-done work.