One way they could do that is by pitting the model against modified versions of itself, like they did in OpenAI Five (for Dota).
From the minimizing-X-risk perspective, it might be the worst possible way to train AIs.
As Jeff Clune (Uber AI) put it:
[O]ne can imagine that some ways of configuring AI-GAs (i.e. ways of incentivizing progress) that would make AI-GAs more likely to succeed in producing general AI also make their value systems more dangerous. For example, some researchers might try to replicate a basic principle of Darwinian evolution: that it is ‘red in tooth and claw.’
If a researcher tried to catalyze the creation of an AI-GA by creating conditions similar to those on Earth, the results might be similar. We might thus produce an AI with human vices, such as violence, hatred, jealousy, deception, cunning, or worse, simply because those attributes make an AI more likely to survive and succeed in a particular type of competitive simulated world. Note that one might create such an unsavory AI unintentionally by not realizing that the incentive structure they defined encourages such behavior.
Additionally, if you train a language model to outsmart millions of increasingly intelligent copies of itself, you might end up with the perfect AI-box escape artist.
I agree. Additionally, the life expectancy of elephants is significantly higher than that of paleolithic humans (1, 2). Thus, individual elephants have much more time to learn stuff.
In humans, technological progress is not a given. Across different populations, it seems to be determined by the local culture, and not by neurobiological differences. For example, the ancestors of Wernher von Braun left their technological local minimum thousands of years after the Egyptians or the Chinese did. And the ancestors of Sergei Korolev lived their primitive lives well into the 8th century C.E. If a Han dynasty scholar had visited the Germanic and Slavic tribes, he would’ve described them as hopeless barbarians, perhaps even as inherently predisposed to barbarism.
Maybe if we give elephants more time, they will overcome their biological limitations (limited speech, a limited “hand”, fewer neurons in the neocortex, etc.) and escape the local minimum. But maybe not.
Jeff Hawkins provided a rather interesting argument on the topic:
The scaling of the human brain has happened too fast to implement any deep changes in how the circuitry works. The entire scaling process was mostly done by the favorite trick of biological evolution: copy and paste existing units (in this case—cortical columns).
Jeff argues that there is no change in the basic algorithm between earlier primates and humans. It’s the same reference-frames processing algo distributed across columns. The main difference is that humans have many more columns.
I’ve found his arguments convincing for two reasons:
his neurobiological arguments are surprisingly good (to the point of seeming obvious in hindsight)
It’s the same “just add more layers” trick we reinvented in ML
The failure of large dinosaurs to quickly scale is a measuring instrument that detects how their algorithms scaled with more compute
Are we sure about the low intelligence of dinosaurs?
Judging by the living dinos (e.g. crows), they are able to pack a chimp-like intelligence into a 0.016 kg brain.
And some of the dinos had 60× more of it (e.g. the brain of Tyrannosaurus rex weighed about 1 kg, which is comparable to Homo erectus).
And some of the dinos had a surprisingly large encephalization quotient, combined with bipedalism, gripping hands, forward-facing eyes, omnivorism, nest building, parental care, and living in groups (e.g. troodontids).
Maybe it was not an asteroid after all...
(Very unlikely, of course. But I find the idea rather amusing)
why aren’t elephants GI?
As Herculano-Houzel called it, the human brain is a remarkable, yet not extraordinary, scaled-up primate brain. It seems that our main advantage in hardware is quantitative: more cortical columns to process more reference frames to predict more stuff.
And the primate brain is mostly the same as that of other mammals (which shouldn’t be surprising, as the source code is mostly the same).
And the intelligence of mammals seems to be rather general. It allows them to solve a highly diverse set of cognitive tasks, including learning to navigate novel environments at Level 5 autonomy (which is still too hard for the most general of our AIs).
One may ask: why aren’t elephants making rockets and computers yet?
But one may ask the same question about any uncontacted human tribe.
Thus, it seems to me that the “elephants are not GI” part of the argument is incorrect. Elephants (and also chimps, dolphins etc) seem to possess a rather general but computationally capped intelligence.
BTW, a few days ago Eliezer made a specific prediction that is perhaps relevant to your discussion:
I [would very tentatively guess that] AGI to kill everyone before self-driving cars are commercialized
(I suppose Eliezer is talking about Level 5 autonomy cars here).
Maybe a bet like this could work:
At least one month will elapse after the first Level 5 autonomy car hits the road, without AGI killing everyone.
“Level 5 autonomy” could be further specified to avoid ambiguities. For example, like this:
The car must be publicly accessible (e.g. available for purchase, or as a taxi etc). The car should be able to drive from some East Coast city to some West Coast city by itself.
The preliminary results were obtained on a subset of the full benchmark (~90 tasks vs 206 tasks). And there have been many changes since then, including scoring changes. Thus, I’m not sure we’ll see the same dynamics in the final results. Most likely yes, but maybe not.
I agree that the task selection process could create dynamics that look like acceleration. A good point.
As I understand, the organizers accepted almost all submitted tasks (the main rejection reasons were technical—copyright etc). So, it was mostly self-selection, with a bias towards the hardest imaginable text tasks. It seems that for many contributors, the main motivation was something like:
Take that, the most advanced AI of Google! Let’s see if you can handle my epic task!
This includes many cognitive tasks that are supposedly human-complete (e.g. understanding of humor, irony, ethics), and the tasks that are probing the model’s generality (e.g. playing chess, recognizing images, navigating mazes—all in text).
I wonder if the performance dynamics on such tasks will follow the same curve.
The list of all tasks is available here.
During the workshop presentation, Jascha said that OpenAI will run their models on the benchmark. This suggests that there is (was?) some collaboration. But that was half a year ago.
Just checked, the repo’s readme doesn’t mention OpenAI anymore. In the earlier versions, it was mentioned like this:
Teams at Google and OpenAI have committed to evaluate BIG-Bench on their best-performing model architectures
So, it seems that OpenAI withdrew from the project, partially or fully.
Nope. Although the linked paper uses the same benchmark (a tiny subset of it), the paper comes from a separate research project.
As I understand, the primary topic of the future paper will be the BIG-bench project itself, and how the models from Google / OpenAI perform on it.
The results were presented at a workshop by the project organizers. The video from the workshop is available here (the most relevant presentation starts at 5:05:00).
It’s one of those innocent presentations that, after you understand the implications, keep you awake at night.
your view seems to imply that we will move quickly from much worse than humans to much better than humans, but it’s likely that we will move slowly through the human range on many tasks
We might be able to falsify that in a few months.
There is a joint Google / OpenAI project called BIG-bench. They’ve crowdsourced ~200 highly diverse text tasks (from answering scientific questions to predicting protein interacting sites to measuring self-awareness).
One of the goals of the project is to see how performance on the tasks changes with model size, with the size ranging over many orders of magnitude.
A half-year ago, they presented some preliminary results. A quick summary:
if you increase the number of parameters N from 10^7 to 10^10, the aggregate performance score grows roughly like log(N).
But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).
And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).
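For anyone who wants to check this trend once the full per-model scores are published, here is a minimal sketch (the file name and column names are hypothetical; it just assumes the scores end up in a simple table): plot the aggregate score against parameter count on a log axis and look for the bend around 10^10 parameters.

```python
# Minimal sketch: plot aggregate BIG-bench score vs. model size.
# "bigbench_scores.csv" and its columns ("params", "score") are hypothetical.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("bigbench_scores.csv").sort_values("params")

plt.semilogx(df["params"], df["score"], marker="o")
plt.axvline(1e10, linestyle="--", label="reported bend (~10^10 params)")
plt.xlabel("number of parameters (log scale)")
plt.ylabel("aggregate score")
plt.legend()
plt.show()
```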
The paper with the full results is expected to be published in the next few months.
Judging by the preliminary results, the FOOM could start like this:
GPT-5 still sucks on most tasks. It’s mostly useless. But what if we increase parameters_num by 2? What could possibly go wrong?
It doesn’t seem to be a consequence of Crypto specifically. Any API qualifies here.
For a digital entity, it is tricky to handle fiat currency (say, USD) without relying on humans. For example, to open any kind of account (e.g. bank, PayPal, etc.), one needs to pass KYC filters, CAPTCHAs, etc. The same goes for any API that allows transfers of fiat currency. The legacy financial system is explicitly designed to be shielded against bots (with the exception of the bots owned by registered humans).
But in the crypto space, you can create your own bank in a few lines of code, without any kind of human assistance. There are no legal requirements for participation. You don’t have to own a valid identification document, a postal address etc.
Thanks to crypto, a smart enough Python script could earn money, trade goods and services, or even hire humans, without a single interaction with the legacy financial system.
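To illustrate, here is a minimal sketch using the web3.py library (method names assumed from web3.py v6; the RPC URL is a placeholder): a script mints its own keypair locally and checks its on-chain balance, with no ID, no paperwork, and no human in the loop.

```python
# Minimal sketch (web3.py, v6-style method names; the RPC URL is a placeholder).
# A script creates its own "account" offline and queries its balance on-chain:
# no KYC, no CAPTCHA, no human intermediary required.
from eth_account import Account
from web3 import Web3

acct = Account.create()                 # fresh keypair, generated locally
print("my address:", acct.address)      # anyone (human or bot) can pay this address

w3 = Web3(Web3.HTTPProvider("https://example-eth-rpc.invalid"))  # placeholder RPC URL
balance_wei = w3.eth.get_balance(acct.address)
print("balance in ETH:", w3.from_wei(balance_wei, "ether"))
```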
Crypto is an AI-friendly tool to convert intelligence directly into financial power.
Although I’m not sure if it has any meaningful impact on the X-risk. For a recursively self-improving AGI, hijacking the legacy financial system could be as trivial as hijacking the crypto space.
Aside from the theoretical similarities between the two fields, there are also interesting practical aspects.
Some positive effects:
Cryptocurrencies have made some AI alignment researchers much wealthier, allowing them to focus more on their research.
Some of the alignment orgs (e.g. MIRI) got large donations from the crypto folk.
Some negative effects:
Cryptocurrencies allow AIs to directly participate in the economy, without human intermediaries. These days, a Python script can buy / sell goods and services, and even hire freelancers. And some countries are moving to a crypto-based economy (e.g. El Salvador). This could greatly increase the speed of AI takeoff.
Some cryptocurrencies are general-purpose computing systems that are practically uncensorable and indestructible (short of switching off the Internet). Thanks to this, even a sub-human AI could become impossible to switch off.
Both effects are reducing the complexity of the first steps of AI takeoff. Instead of hacking robotic factories or whatever, the AI could just hire freelancers to run its errands. Instead of hacking some closely monitored VMs, the AI could just run itself on Ethereum. And so on. Gaining the first money, human minions, and compute is now a mundane software problem that doesn’t require a Bayesian superintelligence.
This also makes a stealthy takeoff more realistic. On the Internet, nobody knows you’re a self-aware smart contract who is paying untraceable money to some shady people to do some shady stuff.
This comment lists more negative than positive effects. But I have no idea if crypto is a net positive or not. I haven’t thought deeply on the topic.
You’re right, the goal is to bring the Archimedes back to life, not just some similar mind.
We humans are changing all the time. My mind from 1 year ago is different from my current mind (say, the difference is X percent). So, if you resurrect me, and the difference is less than X, then I don’t have to worry about the difference.
I also agree with you on time travel. If it’s possible, then it will be the best solution. Just mind-upload the person from the past, before they die.
My reasoning was as follows.
The typical book page contains about 3k chars (including spaces).
If we encode each char in UTF-8 (8 bits per char), a 200-page book will contain about 5*10^6 bits (about 600 KiB).
Thus, to generate all 200-page books, we must generate all binary strings of length 5*10^6.
There are 2^(5*10^6) such strings, or about 10^10^6.
There are at most 10^82 atoms in the universe. If we replace each atom with a (classical) computer, and split the work among them, each of the computers will need to generate 10^(10^6 − 82) books.
If each computer is generating one example per femtosecond (10^-15 sec), it will take them roughly 10^10^6 sec to finish the job.
It’s much much longer than the time before the last black hole evaporates (10^114 sec).
And this is only the generation. We also need to write the books down somewhere, which will require some additional time per book, and a hell lot of storage.
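A quick sanity check of the arithmetic above (a minimal Python sketch; the page size, book length, and one-book-per-femtosecond rate are the same assumptions as above, and everything is done in log10 to avoid astronomically large integers):

```python
# Back-of-envelope check of the numbers above, working in log10 throughout.
from math import log10

chars_per_page = 3_000          # assumed typical page
pages = 200
bits_per_char = 8               # one byte per character

bits_per_book = chars_per_page * pages * bits_per_char    # ~4.8e6 bits (~600 KiB)

# Number of distinct bit strings of that length: 2^bits_per_book.
log10_num_books = bits_per_book * log10(2)                # ~1.4e6, i.e. ~10^10^6 books

log10_atoms = 82                                          # atoms in the observable universe
log10_books_per_computer = log10_num_books - log10_atoms  # split the work among 10^82 computers

log10_seconds_per_book = -15                              # one book per femtosecond
log10_total_seconds = log10_books_per_computer + log10_seconds_per_book

print(f"bits per book:        {bits_per_book:.2e}")
print(f"number of books:      10^{log10_num_books:.3g}")
print(f"seconds per computer: 10^{log10_total_seconds:.3g}")
print("last black hole evaporates after roughly 10^114 seconds")
```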
I suspect that the entire task can be done dramatically faster on quantum computers. But I’m not knowledgeable enough in the topic to predict the speedup. Can they do it in 1 sec? In 10^10^3 sec? No idea.
We could also massively speed up the whole thing if we limit it to realistic books, and not just strings of random characters. E.g. use only the relevant parts of UTF-8.
The estimate is based on my (ridiculously primitive) 20th-century understanding of computing. How people will think about such tasks in 1000 years is beyond my comprehension (as it was beyond comprehension 70 years ago to measure stuff in petabytes and petaFLOPS).
If I create a million digital minds and kill them instantly—would you be morally obligated to resurrect all of them?
If these days someone creates children, and then endangers their lives, are we morally obligated to try to save them? I see it as a morally equal situation.
And I am doing a bad thing killing the minds when I know you’re gonna resurrect them anyway?
Trying to kill people is definitely a bad thing, even if you are sure that the murder will be unsuccessful.
There is also no guarantee that any of the listed resurrection methods will ever work.
As per our assumptions, Archimedes 2.0 is the Archimedes, the 3rd-century BC Greek thinker who was temporarily dead but who will be alive again.
(in exactly the same sense as some of today’s clinically dead patients will be alive again, thanks to modern medical tech).
Thus, the plan is not just to create some minds similar to Archimedes, but to save the life of Archimedes himself.
I see resurrecting a long-dead person as morally equal to saving a contemporary who is in grave danger.
I don’t advocate for nonconsensual mind changes (edited the post to make it clear).
Can we possibly agree that mind control is not OK?
Imagine that your friend has a severe depression (e.g. caused by their gut microbiome), to the point of attempting suicide.
You have a medication that fixes their gut microbiome, curing their depression, with no side effects.
In such a situation, is it OK to ask your friend to take the medication? (with a proper medical consent etc).
If your friend takes the medication and thanks to it doesn’t have suicidal thoughts anymore, is it right to call the whole situation “mind control by Jbash”?
Can’t a person even die without being harassed by busybodies?
Paramedics don’t abandon a patient because the patient committed suicide. They’re trying to save his/her life regardless. And they’re doing the right thing.
People saying things like “just simulate all possible brains” are making implicit assumptions that can’t be supported in anything we’ve observed of the universe.
Sure, from our current (very limited) understanding of physics, simulating all possible brains seems to be impossible. But should we assume it will forever remain impossible?
It would take more than 10^10^20 seconds to find prime factors of some large integers. But after we’ve discovered one weird trick, we can (theoretically) do it many-orders-of-magnitude faster. Maybe there are similar tricks for searching in the space of all possible minds.
Judging by history, Clarke’s first law is more fundamental than any law of physics:
When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
From the linked article:
On 29 December 1934, Albert Einstein was quoted in the Pittsburgh Post-Gazette as saying, “There is not the slightest indication that [nuclear energy] will ever be obtainable. It would mean that the atom would have to be shattered at will.” This followed the discovery that year by Enrico Fermi that if you bombard uranium with neutrons, the uranium atoms split up into lighter elements, releasing energy.
I think he would still very much like to not die in the conventional sense.
I agree, the technological resurrection idea should not be used as a reassurance of one’s own immortality. It is much better to avoid dying than to rely on such speculations.
Recreating past human minds would be a cosmic waste in my opinion, why explore the space of current human minds when you can explore the space of jupiter-brains?
Why not both? We could save the 100+ billion lives awaiting resurrection, and then explore the space of jupiter-brains. Saving lives is not a waste of resources.
I agree. We still don’t know how 85% of the Universe works (dark matter), and our main physics frameworks don’t play well with each other. This means there is still a lot of undiscovered physics.
What we see as physically impossible today could become a mundane engineering problem in a few decades. Such transitions from impossible to mundane have already happened a few times (e.g. transmutation of elements).