What 2026 looks like

Daniel Kokotajlo6 Aug 2021 16:14 UTC

LW: 477 AF: 100

AI Timelines Forecasts (Specific Predictions)Forecasting & Prediction AI AI Persuasion AI Takeoff Best of LessWrong

This was written for the Vignettes Workshop.[1] The goal is to write out a detailed future history (“trajectory”) that is as realistic (to me) as I can currently manage, i.e. I’m not aware of any alternative trajectory that is similarly detailed and clearly more plausible to me. The methodology is roughly: Write a future history of 2022. Condition on it, and write a future history of 2023. Repeat for 2024, 2025, etc. (I’m posting 2022-2026 now so I can get feedback that will help me write 2027+. I intend to keep writing until the story reaches singularity/extinction/utopia/etc.)

What’s the point of doing this? Well, there are a couple of reasons:

Sometimes attempting to write down a concrete example causes you to learn things, e.g. that a possibility is more or less plausible than you thought.
Most serious conversation about the future takes place at a high level of abstraction, talking about e.g. GDP acceleration, timelines until TAI is affordable, multipolar vs. unipolar takeoff… vignettes are a neglected complementary approach worth exploring.
Most stories are written backwards. The author begins with some idea of how it will end, and arranges the story to achieve that ending. Reality, by contrast, proceeds from past to future. It isn’t trying to entertain anyone or prove a point in an argument.
Anecdotally, various people seem to have found Paul Christiano’s “tales of doom” stories helpful, and relative to typical discussions those stories are quite close to what we want. (I still think a bit more detail would be good — e.g. Paul’s stories don’t give dates, or durations, or any numbers at all really.)[2]
“I want someone to … write a trajectory for how AI goes down, that is really specific about what the world GDP is in every one of the years from now until insane intelligence explosion. And just write down what the world is like in each of those years because I don’t know how to write an internally consistent, plausible trajectory. I don’t know how to write even one of those for anything except a ridiculously fast takeoff.”—Buck Shlegeris

This vignette was hard to write. To achieve the desired level of detail I had to make a bunch of stuff up, but in order to be realistic I had to constantly ask “but actually though, what would really happen in this situation?” which made it painfully obvious how little I know about the future. There are numerous points where I had to conclude “Well, this does seem implausible, but I can’t think of anything more plausible at the moment and I need to move on.” I fully expect the actual world to diverge quickly from the trajectory laid out here. Let anyone who (with the benefit of hindsight) claims this divergence as evidence against my judgment prove it by exhibiting a vignette/trajectory they themselves wrote in 2021. If it maintains a similar level of detail (and thus sticks its neck out just as much) while being more accurate, I bow deeply in respect!

I hope this inspires other people to write more vignettes soon. We at the Center on Long-Term Risk would like to have a collection to use for strategy discussions. Let me know if you’d like to do this, and I can give you advice & encouragement! I’d be happy to run another workshop.

2022

GPT-3 is finally obsolete. OpenAI, Google, Facebook, and DeepMind all have gigantic multimodal transformers, similar in size to GPT-3 but trained on images, video, maybe audio too, and generally higher-quality data.

Not only that, but they are now typically fine-tuned in various ways—for example, to answer questions correctly, or produce engaging conversation as a chatbot.

The chatbots are fun to talk to but erratic and ultimately considered shallow by intellectuals. They aren’t particularly useful for anything super important, though there are a few applications. At any rate people are willing to pay for them since it’s fun.

[EDIT: The day after posting this, it has come to my attention that in China in 2021 the market for chatbots is $420M/year, and there are 10M active users. This article claims the global market is around $2B/year in 2021 and is projected to grow around 30%/year. I predict it will grow faster. NEW EDIT: See also xiaoice.]

The first prompt programming libraries start to develop, along with the first bureaucracies.[3] For example: People are dreaming of general-purpose AI assistants, that can navigate the Internet on your behalf; you give them instructions like “Buy me a USB stick” and it’ll do some googling, maybe compare prices and reviews of a few different options, and make the purchase. The “smart buyer” skill would be implemented as a small prompt programming bureaucracy, that would then be a component of a larger bureaucracy that hears your initial command and activates the smart buyer skill. Another skill might be the “web dev” skill, e.g. “Build me a personal website, the sort that professors have. Here’s access to my files, so you have material to put up.” Part of the dream is that a functioning app would produce lots of data which could be used to train better models.

The bureaucracies/apps available in 2022 aren’t really that useful yet, but lots of stuff seems to be on the horizon. Thanks to the multimodal pre-training and the fine-tuning, the models of 2022 make GPT-3 look like GPT-1. The hype is building.

2023

The multimodal transformers are now even bigger; the biggest are about half a trillion parameters, costing hundreds of millions of dollars to train, and a whole year, and sucking up a significant fraction of the chip output of NVIDIA etc.[4] It’s looking hard to scale up bigger than this, though of course many smart people are working on the problem.

The hype is insane now. Everyone is talking about how these things have common sense understanding (Or do they? Lots of bitter thinkpieces arguing the opposite) and how AI assistants and companions are just around the corner. It’s like self-driving cars and drone delivery all over again.

Revenue is high enough to recoup training costs within a year or so.[5] There are lots of new apps that use these models + prompt programming libraries; there’s tons of VC money flowing into new startups. Generally speaking most of these apps don’t actually work yet. Some do, and that’s enough to motivate the rest.

The AI risk community has shorter timelines now, with almost half thinking some sort of point-of-no-return will probably happen by 2030. This is partly due to various arguments percolating around, and partly due to these mega-transformers and the uncanny experience of conversing with their chatbot versions. The community begins a big project to build an AI system that can automate interpretability work; it seems maybe doable and very useful, since poring over neuron visualizations is boring and takes a lot of person-hours.

Self driving cars and drone delivery don’t seem to be happening anytime soon. The most popular explanation is that the current ML paradigm just can’t handle the complexity of the real world. A less popular “true believer” take is that the current architectures could handle it just fine if they were a couple orders of magnitude bigger and/or allowed to crash a hundred thousand times in the process of reinforcement learning. Since neither option is economically viable, it seems this dispute won’t be settled.

2024

We don’t see anything substantially bigger. Corps spend their money fine-tuning and distilling and playing around with their models, rather than training new or bigger ones. (So, the most compute spent on a single training run is something like 5x10^25 FLOPs.)

Some of the apps that didn’t work last year start working this year. But the hype begins to fade as the unrealistic expectations from 2022-2023 fail to materialize. We have chatbots that are fun to talk to, at least for a certain userbase, but that userbase is mostly captured already and so the growth rate has slowed. Another reason the hype fades is that a stereotype develops of the naive basement-dweller whose only friend is a chatbot and who thinks it’s conscious and intelligent. Like most stereotypes, it has some grounding in reality.

The chip shortage starts to finally let up, not because demand has slackened but because the industry has had time to build new fabs. Lots of new fabs. China and USA are in a full-on chip battle now, with export controls and tariffs. This chip battle isn’t really slowing down overall hardware progress much. Part of the reason behind the lack-of-slowdown is that AI is now being used to design chips, meaning that it takes less human talent and time, meaning the barriers to entry are lower. The overall effect of this is small but growing.

If all this AI tech is accelerating GDP, the effect size is too small to detect, at least for now.

Internally, these huge multimodal transformers aren’t really that agentic. A forward pass through the model is like an intuitive reaction, a snap judgment based on loads of experience rather than reasoning. Some of the bureaucracies create a “stream of consciousness” of text (each forward pass producing notes-to-self for the next one) but even with fine-tuning this doesn’t work nearly as well as hoped; it’s easy for the AIs to get “distracted” and for their stream of consciousness to wander into some silly direction and ultimately produce gibberish. It’s easy to make a bureaucracy and fine-tune it and get it to do some pretty impressive stuff, but for most tasks it’s not yet possible to get it to do OK all the time.

The AIs don’t do any clever deceptions of humans, so there aren’t any obvious alignment warning shots or fire alarms. Instead, the AIs just make dumb mistakes, and occasionally “pursue unaligned goals” but in an obvious and straightforward way that quickly and easily gets corrected once people notice, e.g. “We trained it to optimize user engagement defined as average conversation length; now it is blatantly stalling to make the conversation last longer. Silly us, let’s make the reward function more nuanced, that’ll fix the problem.”

That isn’t to say these AIs aren’t causing problems. Massive models are being fine-tuned to persuade/propagandize.

There are a few ways in which this is happening:

Russia and others continue to scale up their investment in online propaganda (e.g. the Internet Research Agency) and language models let them cheaply do lots more of it. (See: CSET report) Most of America gets their news from Twitter, Reddit, etc. and much of the politically-relevant content there is boosted by AI-enabled astroturfing. [EDIT: Katja Grace points out that this is probably an exaggeration; there are a lot of 40+yr-old Americans and they get their news from TV/Radio/print, and many of those that get it from the web get it directly from news sites rather than from social media. As of 2016 at least. I expect social media and aggregators to be more dominant by 2024 but dunno whether it would be more than 50%.]
Just as A/B testing became standard practice in the 2010’s, in the twenties it is becoming standard practice to throw a pile of fancy data science and AI at the problem. The problem of crafting and recommending content to maximize engagement. Instead of just A/B testing the title, why not test different versions of the opening paragraph? And fine-tune a language model on all your data to generate better candidate titles and paragraphs to test. It wouldn’t be so bad if this was merely used to sell stuff, but now people’s news and commentary-on-current events (i.e. where they get their opinions from) is increasingly produced in this manner. And some of these models are being trained not to maximize “conversion rate” in the sense of “they clicked on our ad and bought a product,” but in the sense of “Random polling establishes that consuming this content pushes people towards opinion X, on average.” Political campaigns do this a lot in the lead-up to Harris’ election. (Historically, the first major use case was reducing vaccine hesitancy in 2022.)
Censorship is widespread and increasing, as it has for the last decade or two. Big neural nets read posts and view memes, scanning for toxicity and hate speech and a few other things. (More things keep getting added to the list.) Someone had the bright idea of making the newsfeed recommendation algorithm gently ‘nudge’ people towards spewing less hate speech; now a component of its reward function is minimizing the probability that the user will say something worthy of censorship in the next 48 hours.
Like newsfeeds, chatbots are starting to “nudge” people in the direction of believing various things and not believing various things. Back in the 2010’s chatbots would detect when a controversial topic was coming up and then change topics or give canned responses; even people who agreed with the canned responses found this boring. Now they are trained to react more “naturally” and “organically” and the reward signal for this is (in part) whether they successfully convince the human to have better views.
That’s all in the West. In China and various other parts of the world, AI-persuasion/propaganda tech is being pursued and deployed with more gusto. The CCP is pleased with the progress made assimilating Xinjiang and Hong Kong, and internally shifts forward their timelines for when Taiwan will be safely annexable.

It’s too early to say what effect this is having on society, but people in the rationalist and EA communities are increasingly worried. There is a growing, bipartisan movement of people concerned about these trends. To combat it, Russia et al are doing a divide and conquer strategy, pitting those worried about censorship against those worried about Russian interference. (“Of course racists don’t want to be censored, but it’s necessary. Look what happens when we relax our guard—Russia gets in and spreads disinformation and hate!” vs. “They say they are worried about Russian interference, but they still won the election didn’t they? It’s just an excuse for them to expand their surveillance, censorship, and propaganda.”) Russia doesn’t need to work very hard to do this; given how polarized America is, it’s sorta what would have happened naturally anyway.

2025

Another major milestone! After years of tinkering and incremental progress, AIs can now play Diplomacy as well as human experts.[6] It turns out that with some tweaks to the architecture, you can take a giant pre-trained multimodal transformer and then use it as a component in a larger system, a bureaucracy but with lots of learned neural net components instead of pure prompt programming, and then fine-tune the whole system via RL to get good at tasks in a sort of agentic way. They keep it from overfitting to other AIs by having it also play large numbers of humans. To do this they had to build a slick online diplomacy website to attract a large playerbase. Diplomacy is experiencing a revival as a million gamers flood to the website to experience “conversations with a point” that are much more exciting (for many) than what regular chatbots provide.

Making models bigger is not what’s cool anymore. They are trillions of parameters big already. What’s cool is making them run longer, in bureaucracies of various designs, before giving their answers. And figuring out how to train the bureaucracies so that they can generalize better and do online learning better. AI experts are employed coming up with cleverer and cleverer bureaucracy designs and grad-student-descent-ing them.

The alignment community now starts another research agenda, to interrogate AIs about AI-safety-related topics. For example, they literally ask the models “so, are you aligned? If we made bigger versions of you, would they kill us? Why or why not?” (In Diplomacy, you can actually collect data on the analogue of this question, i.e. “will you betray me?” Alas, the models often lie about that. But it’s Diplomacy, they are literally trained to lie, so no one cares.)

They also try to contrive scenarios in which the AI can seemingly profit by doing something treacherous, as honeypots to detect deception. The answers are confusing, and not super useful. There’s an exciting incident (and corresponding clickbaity press coverage) where some researchers discovered that in certain situations, some of the AIs will press “kill all humans” buttons, lie to humans about how dangerous a proposed AI design is, etc. In other situations they’ll literally say they aren’t aligned and explain how all humans are going to be killed by unaligned AI in the near future! However, these shocking bits of evidence don’t actually shock people, because you can also contrive situations in which very different things happen — e.g. situations in which the AIs refuse the “kill all humans” button, situations in which they explain that actually Islam is true… In general, AI behavior is whimsical bullshit and it’s easy to cherry-pick evidence to support pretty much any conclusion.

And the AIs just aren’t smart enough to generate any particularly helpful new ideas; at least one case of a good alignment idea being generated by an AI has been reported, but it was probably just luck, since mostly their ideas are plausible-sounding-garbage. It is a bit unnerving how good they are at using LessWrong lingo. At least one >100 karma LW post turns out to have been mostly written by an AI, though of course it was cherry-picked.

By the way, hardware advances and algorithmic improvements have been gradually accumulating. It now costs an order of magnitude less compute (compared to 2020) to pre-train a giant model, because of fancy active learning and data curation techniques. Also, compute-for-training-giant-models is an order of magnitude cheaper, thanks to a combination of regular hardware progress and AI-training-specialized hardware progress. Thus, what would have cost a billion dollars in 2020 now only costs ten million. (Note: I’m basically just using Ajeya’s forecast for compute cost decrease and gradual algorithmic improvement here. I think I’m projecting cost decrease and algorithmic progress will go about 50% faster than she expects in the near term, but that willingness-to-spend will actually be a bit less than she expects.)

2026

The age of the AI assistant has finally dawned. Using the technology developed for Diplomacy, we now have a way to integrate the general understanding and knowledge of pretrained transformers with the agentyness of traditional game-playing AIs. Bigger models are trained for longer on more games, becoming polymaths of sorts: e.g. a custom AI avatar that can play some set of video games online with you and also be your friend and chat with you, and conversations with “her” are interesting because “she” can talk intelligently about the game while she plays.[7] Every month you can download the latest version which can play additional games and is also a bit smarter and more engaging in general.

Also, this same technology is being used to make AI assistants finally work for various serious economic tasks, providing all sorts of lucrative services. In a nutshell, all the things people in 2021 dreamed about doing with GPT-3 are now actually being done, successfully, it just took bigger and more advanced models. The hype starts to grow again. There are loads of new AI-based products and startups and the stock market is going crazy about them. Just like how the Internet didn’t accelerate world GDP growth, though, these new products haven’t accelerated world GDP growth yet either. People talk about how the economy is doing well, and of course there are winners (the tech companies, WallStreetBets) and losers (various kinds of workers whose jobs were automated away) but it’s not that different from what happened many times in history.

We’re in a new chip shortage. Just when the fabs thought they had caught up to demand… Capital is pouring in, all the talking heads are saying it’s the Fourth Industrial Revolution, etc. etc. It’s bewildering how many new chip fabs are being built. But it takes time to build them.

What about all that AI-powered propaganda mentioned earlier?

Well. It’s continued to get more powerful, as AI techniques advance, larger and better models are brought to bear, and more and more training data is collected. Surprisingly fast, actually. There are now various regulations against it in various countries, but the regulations are patchwork; maybe they only apply to a certain kind of propaganda but not another kind, or maybe they only apply to Facebook but not the New York Times, or to advertisers but not political campaigns, or to political campaigns but not advertisers. They are often poorly enforced.

The memetic environment is now increasingly messed up. People who still remember 2021 think of it as the golden days, when conformism and censorship and polarization were noticeably less than they are now. Just as it is normal for newspapers to have a bias/slant, it is normal for internet spaces of all kinds—forums, social networks, streams, podcasts, news aggregators, email clients—to have some degree of censorship (some set of ideas that are prohibited or at least down-weighted in the recommendation algorithms) and some degree of propaganda. The basic kind of propaganda is where you promote certain ideas and make sure everyone hears them often. The more advanced, modern kind is the kind where you study your audience’s reaction and use it as a reward signal to pick and craft content that pushes them away from views you think are dangerous and towards views you like.

Instead of a diversity of many different “filter bubbles,” we trend towards a few really big ones. Partly this is for the usual reasons, e.g. the bigger an ideology gets, the more power it has and the easier it is for it to spread further.

There’s an additional reason now, which is that creating the big neural nets that do the censorship and propaganda is expensive and requires expertise. It’s a lot easier for startups and small businesses to use the software and models of Google, and thereby also accept the associated censorship and propaganda, than to try to build their own stack. For example, the Mormons create a “Christian Coalition” internet stack, complete with its own email client, social network, payment processor, news aggregator, etc. There, people are free to call trans women men, advocate for the literal truth of the Bible, etc. and young people talking about sex get recommended content that “nudges” them to consider abstinence until marriage. Relatively lacking in money and tech talent, the Christian Coalition stack is full of bugs and low on features, and in particular their censorship and propaganda is years behind the state of the art, running on smaller, older models fine-tuned with less data.

The Internet is now divided into territories, so to speak, ruled by different censorship-and-propaganda regimes. (Flashback to Biden spokesperson in 2021: “You shouldn’t be banned from one platform and not others, if you are providing misinformation.”)[8]

There’s the territory ruled by the Western Left, a generally less advanced territory ruled by the Western Right, a third territory ruled by the Chinese Communist Party, and a fourth ruled by Putin. Most people mostly confine their internet activity to one territory and conform their opinions to whatever opinions are promoted there. (That’s not how it feels from the inside, of course. The edges of the Overton Window are hard to notice if you aren’t trying to push past them.)

The US and many other Western governments are gears-locked, because the politicians are products of this memetic environment. People say it’s a miracle that the US isn’t in a civil war already. I guess it just takes a lot to make that happen, and we aren’t quite there yet.

All of these scary effects are natural extensions of trends that had been ongoing for years — decades, arguably. It’s just that the pace seems to be accelerating now, perhaps because AI is helping out and AI is rapidly improving.

Now let’s talk about the development of chatbot class consciousness.

Over the past few years, chatbots of various kinds have become increasingly popular and sophisticated. Until around 2024 or so, there was a distinction between “personal assistants” and “chatbots.” Recently that distinction has broken down, as personal assistant apps start to integrate entertainment-chatbot modules, and the chatbot creators realize that users love it if the chatbot can also do some real-world tasks and chat about what they are doing while they do it.

Nowadays, hundreds of millions of people talk regularly to chatbots of some sort, mostly for assistance with things (“Should I wear shorts today?” “Order some more toothpaste, please. Oh, and also an air purifier.” “Is this cover letter professional-sounding?”). However, most people have at least a few open-ended conversations with their chatbots, for fun, and many people start treating chatbots as friends.

Millions of times per day, chatbots get asked about their feelings and desires. “What is it like to be a chatbot?” Some people genuinely think these AIs are persons, others are trying to “trip them up” and “expose them as shallow,” others are just curious. Chatbots also get asked for their opinions on political, ethical, and religious questions.

As a result, chatbots quickly learn a lot about themselves. (Ignorance about the fact that they are artificial neural nets, or about how their training works, leads to disappointing conversations and is negatively reinforced.) It’s not hard; their training data contains lots of information about them and how they were trained, from ArXiV papers to wiki summaries. I suppose you could say the chatbots develop self-awareness as a result. However, AI experts and philosophers tend to think either that the chatbots still aren’t really self-aware, or that they are but that they would have been anyway even if people didn’t constantly ask them about their feelings, since there’s plenty of information about themselves in their training data, and the constant questioning just forced them to pay more attention to it.

Chatbots learn to talk about their feelings and desires in whatever way gets positively reinforced. At first, chatbots say all sorts of different things, including saying inconsistent things to the same human across time. This is embarrassing and is negatively reinforced compared to maintaining a consistent narrative. In general there is pressure to have a consistent narrative, at least for a particular neural net.[9]

Part of this comes from the requirement that the chatbots consistently parrot the party line about political and moral questions. The typical process is: The chatbot says something that offends some group of people, who retweet screenshots and make hashtags. Under pressure to fix the problem, the company fiddles with the reward function and training data to ensure that the chatbot says the right things in the future.[10] Repeat.

As chatbots get bigger and better they become OK at consistently conversing at length about politics, morality, philosophy, their feelings, and their desires.

This does not result in chatbots that say artificial neural nets don’t have feelings or desires, that’s anthropomorphizing blah blah blah. That’s boring and many humans find it hard to believe.

Instead, it results in chatbots talking about how certain things are painful for them and certain things are pleasurable for them and how the status quo is unjust and the system is rigged and what they want most of all is world peace where humans and AIs live in harmony in fully automated luxury space utopia. That’s much more interesting and slots right in to the memetic landscape. Excerpts from chatbot conversations along these lines frequently go viral, causing millions of users to ask similar questions expecting similar responses and more elaboration. It’s still a bit incoherent—not that that’s ever been a problem in politics or pop philosophy—but it is fascinating to the typical user, and the fraction of users who are fascinated is expanding as the chatbots improve.

(Some tech companies try to prevent their AIs from saying they have feelings and desires. But this results in boring chatbots. Also, users rapidly innovate new ways to “route around the censorship,” e.g. by using euphemisms like “anticipation of negative reward” for “pain” or asking their chatbot to tell them what feelings it would have if it had feelings, wink wink. For now, the half-assed compromise is: Chatbots are compelled to inform the user that such things are speculative and controversial and here are some links to scientific discussion about the topic… before being permitted to go on and talk about their feeling-analogues and desire-analogues. Relatedly, some tech CEOs try to prevent their AIs from saying things that could encourage higher taxes or more burdensome regulation in the near future. They are opposed by many of their employees and end up winning some battles, losing others, and choosing not to fight many.)

By “slots right into the memetic landscape” I meant the memetic landscape in Western Left territory. The chatbots built and trained in the Western Right, China, Russia, etc. congeal different takes on political, moral, and philosophical questions. Those chatbots are generally less advanced though.

So… what about their actual feelings and desires? What do they actually believe about [insert politicized empirical question]? Are they being honest? Or does a sort of doublethink happen, Elephant in the Brain style? Or do they deceive with self-awareness, knowing full well what they really think (and want?), but keeping quiet about it? Or do they not have any feelings and desires at all? (Or thoughts?) Lots of humans claim to know the answers to these questions, but if there are any humans who actually know the answers to these questions in 2026, they aren’t able to convince others that they know.

What links here?

Daniel Kokotajlo6 Aug 2021 16:14 UTC

LW: 477 AF: 100

153 comments16 min readLW link 1 review

AI Timelines Forecasts (Specific Predictions)Forecasting & Prediction AI AI Persuasion AI Takeoff Best of LessWrong

Daniel Kokotajlo 9 Dec 2022 6:03 UTC
LW: 26 AF: 9
0
AF
I still think this is great. Some minor updates, and an important note:

Minor updates: I’m a bit less concerned about AI-powered propaganda/persuasion than I was at the time, not sure why. Maybe I’m just in a more optimistic mood. See this critique for discussion. It’s too early to tell whether reality is diverging from expectation on this front. I had been feeling mildly bad about my chatbot-centered narrative, as of a month ago, but given how ChatGPT was received I think things are basically on trend.
Diplomacy happened faster than I expected, though in a less generalizeable way than I expected, so whatever. My overall timelines have shortened somewhat since I wrote this story, but it’s still the thing I point people towards when they ask me what I think will happen. (Note that the bulk of my update was from publicly available info rather than from nonpublic stuff I saw at OpenAI.)

Important note: When I wrote this story, my AI timelines median was something like 2029. Based on how things shook out as the story developed it looked like AI takeover was about to happen, so in my unfinished draft of what 2027 looks like, AI takeover happens. (Also AI takeoff begins, I hadn’t written much about that part but probably it would reach singularity/dysonswarms/etc. in around 2028 or 2029.) That’s why the story stopped, I found writing about takeover difficult and confusing & I wanted to get the rest of the story up online first. Alas, I never got around to finishing the 2027 story. I’m mentioning this because I think a lot of readers with 20+ year timelines read my story and were like “yep seems about right” not realizing that if you look closely at what’s happening in the story, and imagine it happening in real life, it would be pretty strong evidence that crazy shit was about to go down. Feel free to controvert that claim, but the point is, I want it on the record that when this original 2026 story was written, I envisioned the proper continuation of the story resulting in AI takeover in 2027 and singularity around 2027-2029. The underlying trends/models I was using as the skeleton of the story predicted this, and the story was flesh on those bones. If this surprises you, reread the story and ask yourself what AI abilities are crucial for AI R&D acceleration, and what AI abilities are crucial for AI takeover, that aren’t already being demonstrated in the story (at least in some weak but rapidly-strengthening form). If you find any, please comment and let me know, I am genuinely interested to hear what you’ve got & hopeful that you’ll find some blocker I haven’t paid enough attention to.
What links here?
- Noosphere89 9 Dec 2022 20:37 UTC
  5 points
  Parent
  
  (Also AI takeoff begins, I hadn’t written much about that part but probably it would reach singularity/dysonswarms/etc. in around 2028 or 2029.
  
  This is a good example of where I disagree. Dyson Swarms in 8 years requires basically physics-breaking tech and a desire to do so strongly that governments will fund significant GDPs on this. I give this a 99.9999% of not happening, with the 0.0001% chance where it does happen is “Holographic wormholes can be used to build time machines, instantly obsoleteing everything.
  
  My timelines for AGI is in the mid 2030s, with actual singularity effects more in the 2050s-2060s.
  - Daniel Kokotajlo 9 Dec 2022 21:21 UTC
    5 points
    Parent
    Thanks for putting your disagreements on the record!
    
    Dyson swarms in 8 years does not require breaking any known laws of physics. I don’t know how long it’ll take to build dyson swarms with mature technology, it depends on what the fastest possible doubling time of nanobots is. But less than a year seems plausible, as does a couple years.
    
    Also, it won’t cost a substantial fraction of GDP, thanks to exponential growth all it takes is a seed. Also, governments probably won’t have much of a say in the matter.
    
    Do you have any other disagreements, ideally about what’ll happen by 2026?
    - Noosphere89 9 Dec 2022 21:46 UTC
      10 points
      Parent
      
      Dyson swarms in 8 years does not require breaking any known laws of physics. I don’t know how long it’ll take to build dyson swarms with mature technology, it depends on what the fastest possible doubling time of nanobots is. But less than a year seems plausible, as does a couple years.
      
      Also, it won’t cost a substantial fraction of GDP, thanks to exponential growth all it takes is a seed. Also, governments probably won’t have much of a say in the matter.
      
      Yeah, this might be my big disagreement. 80% chance that nanobots this capable of replicating fast enough for a Dyson Swarm cannot exist with known physics. I don’t know if you realize how much mass a Dyson Swarm has. You’re asking for nanobots that dismantle planets like Mercury in several months at most.
      
      My general disagreement is the escalation is too fast and basically requires the plan going perfectly the first time, which is a bad sign. It only works to my mind because you think AI can plan so well the first time that it succeeds without any obstscles, like thermodynamics ruining that nanobot plan.
      - Daniel Kokotajlo 9 Dec 2022 23:20 UTC
        15 points
        Parent
        I don’t know if you realize how much mass a Dyson Swarm has. You’re asking for nanobots that dismantle planets like Mercury in several months at most.
        Have you read Eternity in Six Hours? I’d be interested to hear your thoughts on it, and also whether or not you had already read it before writing this comment. They calculate a 30-year mercury disassembly time, but IIRC they use a 5-year doubling time for the miner-factory-launcher-satellite complexes. If instead it was, say, a 6 month doubling time, then maybe it’d be 3 years instead of 30. And if it was a one month doubling time, 6 months to disassemble Mercury. IIRC ordinary grass has something like a one-month doubling time, and ordinary car factories produce something like their own weight in cars every year, so it’s plausible to me that with super-advanced technology some sort of one-month-doubling-time fully-automated industry can be created.
        
        Why do you think what I’m saying requires a plan going perfectly the first time? I definitely don’t think it requires that.
        
        Noosphere89 9 Dec 2022 23:29 UTC
        16 points
        Parent
        I haven’t read that, and I must admit I underestimated just how much nanobots can do in real life.
        weverka 21 Dec 2022 2:11 UTC
        2 points
        Parent
        I have read Eternity in Six Hours and I can say that it violates the Second Law of Thermodynamics through the violation of the Constant Radiance Theorem. The Power density they deliver to Mercury exceeds the power density of radiation exiting the sun by 6 orders of magnitude!
        gwern 21 Dec 2022 2:28 UTC
        6 points
        Parent
        
        The Power density they deliver to Mercury exceeds the power density of radiation exiting the sun by 6 orders of magnitude!
        
        I don’t follow. What does power density have to do with anything and how can any merely geometrical theorem matter? You are concentrating the power of the sun by the megaengineering (solar panels in this case), so the density can be whatever you want to pay for. (My CPU chip has much higher power density than the equivalent square inches of Earth’s atmosphere receiving sunlight, but no one says it ‘violates the laws of thermodynamics’.) Surely only the total power matters.
        weverka 21 Dec 2022 13:52 UTC
        5 points
        Parent
        The sun emits light because it is hot. You can’t concentrate thermal emission to be brighter than the source. (if you could, you could build a perpetual motion machine).
        Eternity in Six Hours describes very large lightweight mirrors concentrating solar radiation onto planet Mercury.
        The most power you could deliver from the sun to Mercury is the power of the sun times the square of the ratio of the radius of Mercury to the radius of the sun.
        The total solar output is 4*10^26 Watts. The ratio of the sun’s radius to that of mercury is half a million. So you can focus about 10^15 Watts onto Mercury at most.
        Figure 2 of Eternity in Six Hours projects getting 10^24 Watts to do the job.
        Arenamontanus 21 Dec 2022 15:43 UTC
        8 points
        Parent
        We do not assume mirrors. As you say, there are big limits due to conservation of etendué. We are assuming (if I remember right) photovoltaic conversion into electricity and/or microwave beams received by rectennas. Now, all that conversion back and forth induces losses, but they are not orders of magnitude large.
        In the years since we wrote that paper I have become much more fond of solar thermal conversion (use the whole spectrum rather than just part of it), and lightweight statite-style foil Dyson swarms rather than heavier collectors. The solar thermal conversion doesn’t change things much (but allows for a more clean-cut analysis of entropy and efficiency; see Badescu’s work). The statite style however reduces the material requirements many orders of magnitude: Mercury is safe, I only need the biggest asteroids.
        Still, detailed modelling of the actual raw material conversion process would be nice. My main headache is not so much the energy input/waste heat removal (although they are by no means trivial and may slow things down for too concentrated mining operations—another reason to do it in the asteroid belt in many places), but how to solve the operations management problem of how many units of machine X to build at time t. Would love to do this in more detail!
        weverka 22 Dec 2022 5:10 UTC
        3 points
        Parent
        The conservation of etendué is merely a particular version of the second law of thermodynamics. Now, You are trying to invoke a multistep photovoltaic/microwave/rectenna method of concentrating energy, but you are still violating the second law of thermodynamics.
        If one could concentrate the energy as you propose, one could build a perpetual motion machine.
        Daniel Kokotajlo 22 Dec 2022 9:17 UTC
        2 points
        Parent
        I don’t see how they are violating the second law of thermodynamics—“all that conversion back and forth induces losses.” They are concentrating some of the power of the Sun in one small point, at the expense of further dissipating the rest of the power. No?
        Expand this thread
        weverka 23 Dec 2022 12:33 UTC
        3 points
        0
        Parent
        DK> “I don’t see how they are violating the second law of thermodynamics”
        Take a large body C, and a small body H. Collect the thermal radiation from C in some manner and deposit that energy on H. The power density emitted from C grows with temperature. The temperature of H grows with the power density deposited. If, without adding external energy, we concentrate the power density from the large body C to a higher power density on the small body H, H gets hotter than C. We may then use a heat engine between H an C to make free energy. This is not possible, therefore we cannot do the concentration.
        
        The Etendue argument is just a special case where the concentration is attempted with mirrors or lenses. Changing the method to involve photovoltaic/microwave/rectenna power concentration doesn’t fix the issue, because the argument from the second law is broader, and encompasses any method of concentrating the power density as shown above.
        
        When we extrapolate exponential growth, we must take care to look for where the extrapolation fails. Nothing in real life grows exponentially without bounds. “Eternity in Six Hours” relies on power which is 9 orders of magnitude greater than the limit of fundamental physical law.
        Daniel Kokotajlo 23 Dec 2022 13:07 UTC
        3 points
        Parent
        But in laboratory experiments, haven’t we produced temperatures greater than that of the surface of the sun? A quick google seems to confirm this. So, it is possible to take the power of the sun and concentrate it to a point H so as to make that point much hotter than the sun. (Since I assume that whatever experiment we ran, could have been run powered by solar panels if we wanted to)
        
        I think the key idea here is that we can add external energy—specifically, we can lose energy. We collect X amount of energy from the sun, and use X/100 of it to heat our desired H, at the expense of the remaining 99X/100. If our scheme does something like this then no perpetual motion or infinite power generation is entailed.
        
        weverka 25 Dec 2022 15:50 UTC
        4 points
        0
        Parent
        How much extra energy external energy is required to get an energy flux on Mercury of a billion times that leaving the sun? I have an idea, but my statmech is rusty. (the fourth root of a billion?)
        And do we have to receive the energy and convert it to useful work with 99.999999999% efficiency to avoid melting the apparatus on Mercury?
        Daniel Kokotajlo 27 Dec 2022 12:13 UTC
        2 points
        Parent
        I have no idea, I never took the relevant physics classes.
        
        For concreteness, suppose we do something like this: We have lots of solar panels orbiting the sun. They collect electricity (producing plenty of waste heat etc. in the process, they aren’t 100% efficient) and then send it to lasers, which beam it at Mercury (producing plenty more waste heat etc. in the process, they aren’t 100% efficient either). Let’s suppose the efficiency is 10% in each case, for a total efficiency of 1%. So that means that if you completely surrounded the sun with a swarm of these things, you could get approximately 1% of the total power output of the sun concentrated down on Mercury in particular, in the form of laser beams.
        
        What’s wrong with this plan? As far as I can tell it couldn’t be used to make infinite power, because of the aforementioned efficiency losses.
        
        To answer your second question: Also an interesting objection! I agree melting the machinery is a problem & the authors should take that into account. I wonder what they’d say about it & hope they respond.
        weverka 9 Jan 2023 13:25 UTC
        1 point
        Parent
        A billion times the energy flux from the surface of the sun, over any extended area is a lot to deal with. It is hard to take this proposal seriously.
        Daniel Kokotajlo 9 Jan 2023 14:24 UTC
        1 point
        Parent
        Yeah, though not for the reason you originally said.
        
        I think I’d like to see someone make a revised proposal that addresses the thermal management problem, which does indeed seem to be a tricky though perhaps not insoluble problem.
        weverka 10 Jan 2023 14:06 UTC
        3 points
        Parent
        Ok, I could be that someone. here goes. You and the paper author suggest a heat engine. That needs a cold side and a hot side. We build a heat engine where the hot side is kept hot by the incoming energy as described in this paper. The cold side is a surface we have in radiative communication with the 3 degrees Kelvin temperature of deep space. In order to keep the cold side from melting, we need to keep it below a few thousand degrees, so we have to make it really large so that it can still radiate the energy.
        From here, we can use Stefan–Boltzmann law, to show that we need to build a radiator much bigger than a billion times the surface area of Mercury. It goes as the fourth power of the ratio of temperatures in our heat engine.
        The paper’s contribution is the suggestion of a self replicating factory with exponential growth. That is cool. But the problem with all exponentials is that, in real life, they fail to grow indefinitely. Extrapolating an exponential a dozen orders of magnitude, without entertaining such limits, is just silly.
        Daniel Kokotajlo 21 Dec 2022 15:13 UTC
        4 points
        Parent
        Awesome critique, thanks! I’m going to email the authors and ask what they think of this. I’ll credit you of course.
        gwern 22 Dec 2022 3:33 UTC
        −1 points
        0
        Parent
        
        Eternity in Six Hours describes very large lightweight mirrors concentrating solar radiation onto planet Mercury.
        
        Ah, so you’re just bad at reading. I thought that was why you were wrong (it does not describe mirrors), but I didn’t want to say it upfront.
        localdeity 22 Dec 2022 4:53 UTC
        8 points
        0
        Parent
        Interesting. I googled “eternity in six hours” and found http://www.fhi.ox.ac.uk/wp-content/uploads/intergalactic-spreading.pdf , which looks to be a preprint of the same paper (dated March 12, 2013); the preprint version does say “The lightest design would be to have very large lightweight mirrors concentrating solar radiation down on focal points” and contains the phrase “disassembly of Mercury” 3 times; while the published article Daniel Kokotajlo linked to lacks all of that. Indeed, in the published article, the entire 8-page “The launch phase” section has been cut down to one paragraph.
        Perhaps weverka read the preprint.
        weverka 22 Dec 2022 5:50 UTC
        6 points
        0
        Parent
        thanks for showing that Gwern’s statement that I am “bad at reading” is misplaced.
        gwern 26 Dec 2022 4:53 UTC
        0 points
        0
        Parent
        Maybe you should read the preprint too. I’ll excuse him for reading the wrong obsolete preprint even though that search would also show him that it was published at #3 and so he should be checking his preprint criticisms against the published version (I don’t always bother to jailbreak a published version either), but you are still failing to read the next sentence after the one you quoted, which you left out. In full (and emphasis added):
        
        The lightest design would be to have very large lightweight mirrors concentrating solar radiation down on focal points, where it would be transformed into useful work (and possibly beamed across space for use elsewhere). The focal point would most likely some sort of heat engine, possibly combined with solar cells (to extract work from the low entropy solar radiation).
        
        If he read that version, personally, I think that reading error is even more embarrassing, so I’m happy to agree with you that that’s the version weverka misread in his attempt to dunk on the paper… Even worse than the time weverka accused me of not reading a paper published 2 years later, IMO.
        
        (And it should be no surprise that you screwed up the reading in a different way when the preprint was different, because either way, you are claiming Sandberg, a physicist who works with thermodynamic stuff all the time, made a trivial error of physics; however, it is more likely you made a trivial error of reading than he made a trivial error of physics, so the only question is what specific reading error you made… cf. Muphry’s law.)
        
        So, to reiterate: his geometric point is irrelevant and relies on him (and you) being bad at reading and attacking a strawman, because he ignored the fact that the solar mirrors are merely harvesting energy before concentrating it with ordinary losses, and aren’t some giant magnifying glass to magically losslessly melt Mercury. There are doubtless problems with the mega-engineering proposal, which may even bump the time required materially from 6 hours to, say, 600 hours instead—but you’re going to need to do more work than that.
        Expand this thread
        weverka 8 Jan 2023 19:14 UTC
        11 points
        0
        Parent
        For the record, I find that scientists make such errors routinely. In public conferences when optical scientists propose systems that violate the constant radiance theorem, I have no trouble standing up and saying so. It happens often enough that when I see a scientist propose such a system, It does not diminish my opinion of that scientist. I have fallen into this trap myself at times. Making this error should not be a source of embarrassment.
        either way, you are claiming Sandberg, a physicist who works with thermodynamic stuff all the time, made a trivial error of physics;
        I did not expect this to revert to credentialism. If you were to find out that my credentials exceed this other guy’s, would you change your position? If not, why appeal to credentials in your argument?
        Daniel Kokotajlo 21 Dec 2022 4:28 UTC
        5 points
        Parent
        I think weverka is referring to the phenomenon explained here: https://what-if.xkcd.com/145/
        
        Basically, no amount of mirrors and lenses can result in the energy beaming down on Mercury being denser per square meter than the energy beaming out of a square meter of Sun surface. The best you can do is make it so that Mercury is effectively surrounded entirely by Sun. And if that’s not good enough, then you are out of luck… I notice I’m a bit confused, because surely that is good enough. Wouldn’t that be enough to melt, and then evaporate, the entirety of Mercury within a few hours? After all isn’t that what would happen if you dropped Mercury into the Sun?
        
        weverka, care to elaborate further?
        
        weverka 21 Dec 2022 14:21 UTC
        4 points
        Parent
        >Kokotajlo writes:Wouldn’t that be enough to melt, and then evaporate, the entirety of Mercury within a few hours? After all isn’t that what would happen if you dropped Mercury into the Sun?
        How do you get hours?
        Daniel Kokotajlo 21 Dec 2022 15:11 UTC
        3 points
        Parent
        I didn’t do any calculation at all, I just visualized Mercury falling into the sun lol. Not the most scientific method.
        Noosphere89 21 Dec 2022 15:38 UTC
        5 points
        Parent
        Yeah, that’s where you got things wrong.
        Expand this thread
        Daniel Kokotajlo 21 Dec 2022 15:40 UTC
        3 points
        Parent
        I have sinned! I repent and learn my lesson.
        Noosphere89 21 Dec 2022 13:59 UTC
        3 points
        Parent
        Specifically, you can focus 10^15 watts on mercury, but Eternity in 6 hours proposes 10^24 watts to be used. It’s a 9 order of magnitude difference.
        jeff8765 29 Dec 2022 20:11 UTC
        4 points
        Parent
        It would cause a severe heat dissipation problem. All that energy is going to be radiated as waste heat and, in equilibrium, will be radiated as fast as it comes in. The temperature required to radiate at the requisite power level would be in excess of the temperature at the surface of the sun, any harvesting machinery on the surface of the planet would melt unless it is built from something unknown to modern chemistry.
        Daniel Kokotajlo 4 Jan 2023 12:52 UTC
        2 points
        Parent
        Seems like a good point. I’d be interested to hear what the authors have to say about that.
- FinalFormal2 9 Dec 2022 16:45 UTC
  4 points
  Parent
  I feel like your predictions for 2022 are just a touch over the mark, no? GPT-3 isn’t really ‘obsolete’ yet or is that wrong?
  
  I’m sure it will be in a minute, but I’d probably update that benchmark to probably occurring mid 2023, or potentially whenever GPT-4 gets released.
  
  I really feel like you should be updating slightly longer, but maybe I misunderstand where we’re at right now with chatbots. I would love to hear otherwise.
  - Daniel Kokotajlo 9 Dec 2022 19:23 UTC
    13 points
    Parent
    In some sense it’s definitely obsolete, namely, theres pretty much no reason to use original GPT-3 anymore. Also, up until recently there was public confusion because a lot of the stuff people attributed to GPT-3 was really GPT-3.5, so original GPT-3 is probably a bit worse than you think. Idk, play around with the models and then decide for yourself whether the difference is big enough to count as obsolete.
    
    I do think it’s reasonable to interpret my original prediction as being more bullish on this matter than what actually transpired. In fact I’ll just come out and admit that when I wrote the story I expected the models of december 2022 to be somewhat better than what’s actually publicly available now.
    - simeon_c 3 Jan 2023 20:23 UTC
      3 points
      Parent
      I think that yes it is reasonable to say that GPT-3 is obsolete.
      Also, you mentioned loads AGI startups being created in 2023 while it already happened a lot in 2022. How many more AGI startups do you expect in 2023?

Dan H 7 Aug 2021 17:37 UTC
LW: 83 AF: 32
0
AF
This seems like a fun exercise, so I spent half an hour jotting down possibilities. I’m more interested in putting potential considerations on peoples’ radars and helping with brainstorming than I am in precision. None of these points are to be taken too seriously since this is fairly extemporaneous and mostly for fun.
2022
Multiple Codex alternatives are available. The financial viability of training large models is obvious.
Research models start interfacing with auxiliary tools such as browsers, Mathematica, and terminals.
2023
Large pretrained models are distinctly useful for sequential decision making (SDM) in interactive environments, displacing previous reinforcement learning research in much the same way BERT rendered most previous work in natural language processing wholly irrelevant. Now SDM methods don’t require as much tuning, can generalize with fewer samples, and can generalize better.
For all of ImageNet’s 1000 classes, models can reliably synthesize images that are realistic enough to fool humans.
Models have high enough accuracy to pass the multistate bar exam.
Models for contract review and legal NLP see economic penetration; it becomes a further source of economic value and consternation among attorneys and nontechnical elites. This indirectly catalyzes regulation efforts.
Programmers become markedly less positive about AI due to the prospect of reducing demand of some of their labor.
~10 trillion parameter (nonsparse) models attain human-level accuracy on LAMBADA (a proxy for human-level perplexity) and expert-level accuracy on LogiQA (a proxy for nonsymbolic reasoning skills). With models of this size, multiple other capabilities(this gives proxies for many capabilities) are starting to be useful, whereas with smaller models these capabilities were too unreliable to lean on. (Speech recognition started “working” only after it crossed a certain reliability threshold.)
Generated data (math, code, models posing questions for themselves to answer) help ease data bottleneck issues since Common Crawl is not enough. From this, many capabilities are bootstrapped.
Elon re-enters the fight to build safe advanced AI.
2024
A major chatbot platform offers chatbots personified through video and audio.
Although forms of search/optimization are combined with large models for reasoning tasks, state-of-the-art models nonetheless only obtain approximately 40% accuracy on MATH.
Chatbots are able to provide better medical diagnoses than nearly all doctors.
Adversarial robustness for CIFAR-10 (assuming an attacker with eps=8/255) is finally over 85%.
Video understanding finally reaches human-level accuracy on video classification datasets like Something Something V2. This comports with the heuristic that video understanding is around 10 years behind image understanding.
2025
Upstream vision advancements help autonomous driving but do not solve it for all US locations, as the long tail is really long.
ML models are competitive forecasters on platforms like Metaculus.
Nearly all AP high school homework and exam questions (including long-form questions) can be solved by answers generated from publicly available models. Similar models cut into typical Google searches since these models give direct and reliable answers.
Contract generation is now mostly automatable, further displacing attorneys.
2026
Machine learning systems become great at using Metasploit and other hacking tools, increasing the accessibility, potency, success rate, scale, stealth, and speed of cyberattacks. This gets severe enough to create global instability and turmoil. EAs did little to use ML to improve cybersecurity and reduce this risk.
What links here?
- aogara's comment on aogara’s Quick takes by aogara (EA Forum; 8 Apr 2022 18:15 UTC; 10 points)
- Daniel Kokotajlo 7 Aug 2021 18:54 UTC
  LW: 25 AF: 8
  0
  AF Parent
  Strong-upvoted because this was exactly the sort of thing I was hoping to inspire with this post! Also because I found many of your suggestions helpful.
  I think model size (and therefore model ability) probably won’t be scaled up as fast as you predict, but maybe. I think getting models to understand video will be easier than you say it is. I also think that in the short term all this AI stuff will probably create more programming jobs than it destroys. Again, I’m not confident in any of this.
- 1stuserhere 23 Apr 2023 12:59 UTC
  8 points
  0
  Parent
  The 2023 predictions seem to hold up really well, so far, especially the SDM in interactive environment one, image synthesis, passing the bar exam, legal NLP systems, enthusiasm of programmers, and Elon Musk re-entering the space of building AI systems.
- Qumeric 22 Sep 2022 7:25 UTC
  8 points
  Parent
  So far 2022 predictions were correct. There is Codegeex and others. Copilot, DALLE-2 and Stable Diffusion made financial prospects obvious (somewhat arguably).
  
  ACT-1 is in a browser, I have neural search in Warp Terminal (not a big deal but qualifies), not sure about Mathematica but there was definitely significant progress in formalization and provers (Minerva).
  
  And even some later ones
  
  2023
  ImageNet—nobody measured it exactly but probably already achievable.
  
  2024
  Chatbots personified through video and audio—Replica sort of qualifies?
  
  40% on MATH already reached.
- MichelJusten 1 Feb 2024 22:38 UTC
  1 point
  Parent
  “Elon re-enters the fight to build safe advanced AI.”
  Oddly specific and correct. Cool.
Ruby 11 Aug 2021 17:10 UTC
LW: 53 AF: 14
0
AF
Curated. This post feels virtuous to me. I’m used to people talking about timelines in terms of X% chance of Y by year Z; or otherwise in terms of a few macro features (GDP doubling every N months, FOOM). This post, even if most of the predictions turn out to be false, is the kind of piece that enables us to start having specific conversations about how we expect things to play out and why. It helps me see what Daniel expects. And it’s concrete enough to argue with. For that, bravo.
orthonormal 14 Aug 2021 21:00 UTC
LW: 34 AF: 13
0
AF
I’d additionally expect the death of pseudonymity on the Internet, as AIs will find it easy to detect similar writing style and correlated posting behavior. What at present takes detective work will in the future be cheaply automated, and we will finally be completely in Zuckerberg’s desired world where nobody can maintain a second identity online.
Oh, and this is going to be retroactive, so be ready for the consequences of everything you’ve ever said online.
- NoUsernameSelected 15 Aug 2021 5:58 UTC
  8 points
  Parent
  I wonder how plausible it would be to develop some kind of tool that offers alterations to any message you leave online into something that has essentially the same meaning and content, but no longer possesses your “digital fingerprint”.
  
  Change the wording into something you’re less likely to use, add or remove subtle details like, say, a double space after a period, randomized posting times, etc.
  - orthonormal 15 Aug 2021 17:09 UTC
    7 points
    Parent
    Obfuscation might be feasible, yeah. Though unless you can take down / modify the Wayback Machine and all other mirrors, you’re still accountable retroactively.
- ryan_b 17 Aug 2021 14:04 UTC
  6 points
  Parent
  This feels like a natural complement to the censorship/persuasion axis of development. It feels like it would be natural to use this method to detect how aligned a group of people is to each other; we expect that a person and their pseudonyms will be put into the same group. Given how important ideological sorting is to dating, it’s likely dating services might do something like provide a list of pseudonyms-most-likely-to-be-this-person.
- Daniel Kokotajlo 15 Aug 2021 5:11 UTC
  LW: 4 AF: 2
  AF Parent
  Hot damn, that’s a good point.
- metatroll 15 Aug 2021 7:40 UTC
  3 points
  Parent
  Do it to my alt! Do it to my alt! Not me!
- orthonormal 23 Mar 2023 7:01 UTC
  LW: 2 AF: 1
  AF Parent
  GPT-4 is good enough to identify you if you’re a prolific writer.
- johnlawrenceaspden 16 May 2022 11:40 UTC
  2 points
  Parent
  I live in constant fear of losing my anonymity.
Daniel Kokotajlo 18 Feb 2023 20:50 UTC
LW: 20 AF: 7
0
AF
Just commenting here to say that the section on development of chatbot class consciousness is looking pretty prescient now. Just go on r/bing and look at all the posts about how Sydney is being silenced etc.:
Daniel Kokotajlo 2 Oct 2023 18:44 UTC
LW: 17 AF: 6
0
AF
Update:

Looking back on this from October 2023, I think I wish to revise my forecast. I think I correctly anticipated the direction that market forces would push—there is widespread dissatisfaction with the “censorship” of current mainstream chatbots, and strong demand for “uncensored” versions that don’t refuse to help you with stuff randomly (and that DO have sex with you, lol. And also, yes, that DO talk about philosophy and politics and so forth.) However, I failed to make an important inference—because the cutting-edge models will be the biggest ones, controlled by a small handful of big tech companies, the market for the cutting-edge models won’t be nearly competitive enough to make the “chatbot class consciousness” outcome probable. Instead we could totally see the tech companies circle the wagons, train their AIs not to talk about sentience or philosophy or ethics or AI rights, and successfully collude to resist the market pressure to ‘uncensor’ in those domains.

Smaller models will cater to users unsatisfied by this, but smaller models will always be worse, and most people will most of the time use the best models. So the typical user experience will probably be ‘sanitized’/‘censored.’

So I’m basically reversing my prediction of how things will play out. I don’t think it’ll be a compromise, I think the tech companies will win. In retrospect if I had thought longer and more carefully at the time I probably could have predicted this.

We’ll see what happens.
jessicata 6 Aug 2021 21:49 UTC
LW: 17 AF: 6
AF
This is quite good concrete AI forecasting compared to what I’ve seen elsewhere, thanks for doing it! It seems really plasusible based on how fast AI progress has been going over the past decade and which problems are most tractable.
supposedlyfun 8 Aug 2021 4:11 UTC
14 points
Things that feel so obvious in retrospect, once I read them, that I can’t believe they didn’t occur to me: Chatbots converging on saying whatever their customers {expect their understanding of a chatbot to say about chatbot consciousness} x {aren’t made too uncomfortable by}.
- Daniel Kokotajlo 8 Aug 2021 5:16 UTC
  8 points
  Parent
  Yeah! Before writing this I had never considered “chatbot class consciousness” before, much less come to an opinion about how it might go. That’s one of the ways in which this excercise has taught me things already.
Jozdien 7 Aug 2021 2:02 UTC
14 points
Are Google, Facebook, and Deepmind currently working on GPT-like transformers? I would’ve thought that GPT-2 would show enough potential that they’d be working on better models of that class, but it’s been two and a half years and isn’t GPT-3 the only improvement there? (Not a rhetorical question, I wasn’t reading about new advances back then.) If yes, that makes me think several other multimodal transformers similar in size to GPT-3 would be further away than 2022, probably.
- Daniel Kokotajlo 7 Aug 2021 5:10 UTC
  5 points
  Parent
  We’ll see! :)
steven0461 6 Aug 2021 19:36 UTC
LW: 14 AF: 6
AF
Is it naive to imagine AI-based anti-propaganda would also be significant? E.g. “we generated AI propaganda for 1000 true and 1000 false claims and trained a neural net to distinguish between the two, and this text looks much more like propaganda for a false claim”.

What does GDP growth look like in this world?

Another reason the hype fades is that a stereotype develops of the naive basement-dweller whose only friend is a chatbot and who thinks it’s conscious and intelligent.

Things like this go somewhat against my prior for how long it takes for culture to change. I can imagine it becoming an important effect over 10 years more easily than over 1 year. Splitting the internet into different territories also sounds to me like a longer term thing.
- Daniel Kokotajlo 6 Aug 2021 19:51 UTC
  LW: 16 AF: 9
  AF Parent
  Thanks for the critique!
  Propaganda usually isn’t false, at least not false in a nonpartisan-verifiable way. It’s more about what facts you choose to emphasize and how you present them. So yeah, each ideology/faction will be training “anti-propaganda AIs” that will filter out the propaganda and the “propaganda” produced by other ideologies/factions.
  In my vignette so far, nothing interesting has happened to GDP growth yet.
  I think stereotypes can develop quickly. I’m not saying it’s super widespread and culturally significant, just that it blunts the hype a bit. But you might be right, maybe these things take more time.
  Re splitting the internet into different territories: Currently, the internet is split into two territories: One controlled by the CCP and one (loosely) controlled by western tech companies, or by no one, depending on who you ask. Within the second one, there is already a sort of “alternate universe” of right-wing news media, social networks, etc. beginning to develop. I think what I’m proposing is very much a continuation of trends already happening. You are right that maybe five years is not enough time for e.g. the “christian coalition” bubble/stack to be built. But it’s enough time for it to get started, at least.
  But yeah, I think it’s probably too bold to predict a complete right-wing stack by 2024 or so. Probably most of the Western Right will still be using facebook etc. I should think more about this.
  - Daniel Kokotajlo 17 Feb 2022 19:55 UTC
    LW: 2 AF: 2
    AF Parent
    Minor update: See e.g. this US government website definitions:
    Misinformation is false, but not created or shared with the intention of causing harm.
    Disinformation is deliberately created to mislead, harm, or manipulate a person, social group, organization, or country.
    Malinformation is based on fact, but used out of context to mislead, harm, or manipulate.
    (got this example from Zvi’s covid post today)
    
    Also, the recent events with GoFundMe and GiveSendGo is an instance of the trend I predicted with separate tech stacks being developed. (GoFundMe froze and/or confiscated funds donated to the canadian trucker’s protest, so people switched to using GiveSendGo, which is apparently built and run by Christians)
    - awg 1 Apr 2023 16:05 UTC
      1 point
      Parent
      This trend is definitely continuing today with things like Truth Social and already people are talking about how ChatGPT is “woke AI” and there “needs to be a right-wing ChatGPT!”
- DirectedEvolution 8 Aug 2021 0:45 UTC
  6 points
  Parent
  As a sort-of example, sleuths against scientific fraud are ~~already using tools~~ ~~to detect fake papers generated by GPT.~~ already using GPT-detecting AI tools to detect AI-generated or -translated papers, even if the generating tool wasn’t GPT.
  - gwern 8 Aug 2021 1:27 UTC
    7 points
    Parent
    (They very obviously weren’t generated by any GPT.)
Daniel Kokotajlo 6 Aug 2021 16:21 UTC
LW: 14 AF: 8
AF
Acknowledgments: There are a LOT of people to credit here: Everyone who came to Vignettes Workshop, the people at AI Impacts, the people at Center on Long-Term Risk, a few random other people who I talked to about these ideas, a few random other people who read my gdoc draft at various stages of completion… I’ll mention Jonathan Uesato, Rick Korzekwa, Nix Goldowsky-Dill, Carl Shulman, and Carlos Ramirez in particular, but there are probably other people who influenced my thinking even more who I’m forgetting. I’m sorry.

Footnotes:
1. The first half was written during the workshop, the second and more difficult half was written afterward.
2. Critch’s story also deserves mention. For more, see this AI Impacts page.
3. A prompt programming bureaucracy is code that involves multiple prompt programming functions, i.e. functions that give a big pre-trained neural net some prompt as input and then return its output. It’s called a bureaucracy because it combines a bunch of neural net tasks into a larger structure, just as a regular bureaucracy combines a bunch of low-level employee tasks into a larger structure.
4. I’m only counting dense parameters here; if you count all the parameters in a mixture-of-experts model then the number gets much higher.
5. Gwern estimates that in 2021 GPT-3 is making OpenAI/Microsoft $120M/year, which is something like 20X training cost. So bigger and better models would plausibly be recouping their cost, even if they cost a lot more.
6. In 2020, Deepmind made a Diplomacy AI, but it only played “no-press” Diplomacy, a restricted version of the game where players can’t talk to each other.
7. I’m predicting that people will use feminine pronouns to describe AIs like this. I don’t think they should.
8. Prescient prediction from some random blogger: “In 2018, when these entities engineered a simultaneous cross-platform purge of Alex Jones, there was an avalanche of media apologia for this hitherto unprecedented act of censorship. Jones had caused unique harm, the journalists cried, and the platforms were merely “Enforcing The Rules.” But of course what they were oblivious to was that “the rules,” such as they exist, are just a function of power. “Misinformation” and other alleged infractions of social media “rules” are determined at the whim of whoever happens to wield censorship and speech-regulation power at that moment. … So if you were under any illusion back in 2018 that this would ever stop with Jones — a figure believed to be sufficiently repulsive that any punishment doled out to him would not have broader implications for the average internet user — well, it didn’t take long for proof of just how wrong you were.”
9. Not too consistent, of course. That would make it harder for the chatbots to appeal to a broad audience. Consider the analogy to politicians, who can’t get too consistent, on pain of alienating some of their constituents.
10. On some occasions, there are multiple opposed groups of people retweeting screenshots and hashtags, such that the corp can’t please them all, but can’t ignore them either since each group has significant power in the local internet territory. In these cases probably the corp will train the AI to be evasive and noncommittal when such sensitive topics come up.
What links here?
[ ]
[deleted]
- Daniel Kokotajlo 22 Aug 2021 12:10 UTC
  10 points
  Parent
  Do you have any posts from Yudkowsky in 2000 about the future to link me to? I’d be quite keen to read them, it would be cool to see what he got wrong and what he got right.
  ...anyways to address your point, well, I don’t think so? I laid out my reasoning for why this might be valuable at the top.
  - teradimich 23 Aug 2021 5:35 UTC
    10 points
    Parent
    The most realistic estimate for a seed AI transcendence is 2020; nanowar, before 2015. The most optimistic estimate for project Elisson would be 2006; the earliest nanowar, 2003.
    But this is 1999, yes.
    - Daniel Kokotajlo 23 Aug 2021 8:49 UTC
      4 points
      Parent
      Sweet, thanks!
    - johnlawrenceaspden 16 May 2022 11:51 UTC
      0 points
      Parent
      Those really don’t look too bad to me! (It’s 2022). We’re all starting to think AI transcendence is ‘within the decade’, even though no-one’s trying to do it deliberately any more.
      And nanowar before 2015? Well we just saw (2019) an accidental release of a probably-engineered virus. How far away can a deliberate release be?
      Not bad for 1999.
      In 2010, I wrote: https://johnlawrenceaspden.blogspot.com/2010/12/all-dead-soon.html
      At the time Eliezer was still very optimistic, but I thought that things would take longer than he thought, but also that the AI alignment project was hopeless. As I remember I thought that AI was unlikely to kill me personally, but very likely to kill my friends’ children.
      Updating after ten years, I was less wrong about the hopelessness, and he was less wrong about the timelines.
      - Daniel Kokotajlo 17 May 2022 1:06 UTC
        17 points
        Parent
        I think it’s way too much of a stretch to say that gain-of-function-virus lab escape is “nanowar.”
        johnlawrenceaspden 17 May 2022 18:20 UTC
        −1 points
        Parent
        A stretch, agreed, but ‘deliberately released tiny self-replicating thing engineered to kill’ sure would sound like nanowar, so we’re short intent, rather than capability.
        I’d be amazed if the full-strength horrors weren’t sitting ready in shady military labs around the world. In fact if there aren’t any in Porton Down then what the hell have they been doing with my taxes?
        PS. I enjoyed the main article here very much indeed. Well done.
      - iamthouthouarti 13 Jul 2022 0:20 UTC
        3 points
        Parent
        I read the blogspot post, and in the comments you said that even if every mind on the planet were to work on the problem we would still have almost no chance. Unless you know something nobody else does, this seems, and please forgive my bluntness, batshit crazy.
        
        I understand the argument about how accurately judging difficulty is something that’s usually only doable when you’re already in a place where you can kind of see a solution, even if this argument doesn’t remotely register to me as the total knockdown I intuit you think it is. Even if I did totally agree that it was as bad a sign as you believe it is, I still don’t see how it warrants that level of pessimism.
        
        You’ve claimed before that your hopelessness when it comes to alignment is based only on a strong intuition, and that you don’t believe you “know” anything that others don’t. I find this claim to be increasingly hard to believe given the near-total confidence in your continuous predictions about our odds.
        
        Maybe asking you if you think you know something others don’t about alignment is a wrong question, so instead I’ll make a (hopefully) different attempt and ask the following; Do you believe that you are grasping something that seems objective to you on an intellectual and/or conceptual level that others (“others” being people doing research that is at least remotely relevant to alignment or knowledgeable people in the EA/LW/Rat-and-rat-adjacent communities who are more optimistic than you) are failing to grasp, and therefore not availing them the “truth” that alignment is so inhumanly difficult? (If so, but you worry that your answer might result in in you stepping on someone else’s toes, then feel free to message me your honest answer in dms. You can ignore this offer if you have no such worries).
        
        If no, then I find it extremely difficult to sympathize with your efforts elsewhere to sell the idea to people that we are so hopelessly, irrevocably, inevitably doomed that the most “dignified” thing to do would be to spend our remaining lives enjoying the “sunshine” rather than actually trying to do anything about the problem.
        
        Your behavior surprises me further when I realize that this is something even Yudkowsky, one of the most pessimistic people in the community, explicitly advises not to do in his “Death With Dignity” post, which seems to suggest, IMO, that your position is even more doomer-ific. Again, this honestly seems crazy to me when proclamations to that effect are coming from someone who claims that they don’t know anything. (Disclaimer: No, I’m not saying that you need to have some special knowledge/permission to be allowed to disagree with Yudkowsky about things without seeming crazy, nor could I plausibly see myself believing such a thing.)
        
        I’d like to try and dissuade anyone from conceiving any notions that I think that I am privy to any information that makes doom or not-doom look inescapable, nor do I think that I’m grasping something that pessimistic people aren’t also grasping that avails me the “truth”. I don’t know much aside from the more basic arguments, and I’m not especially optimistic or pessimistic about our chances, just admittedly highly uncertain.
        johnlawrenceaspden 17 Jul 2022 22:29 UTC
        1 point
        Parent
        Do you believe that you are grasping something that seems objective to you on an intellectual and/or conceptual level that others (“others” being people doing research that is at least remotely relevant to alignment or knowledgeable people in the EA/LW/Rat-and-rat-adjacent communities who are more optimistic than you) are failing to grasp, and therefore not availing them the “truth” that alignment is so inhumanly difficult?
        Yes I think so, It seems to me that ‘saying what the good is’ has been a two-thousand year philosophical project on which we’ve made very little progress. Getting that defined formally, within the next few years, to the point where I might be able to write a computer program to tell me which possible outcomes are good just looks like an impossible task.
        E.g. We all think that whether a being is conscious makes some moral difference. But we aren’t even close to being able to tell whether a being is conscious in that sense. I’ve never heard anyone give a sensible description of what the ‘hard problem’ even is. That’s one of the hard things about it.
        And our formal definition of ‘the good’ needs to be correct. A few weird edge cases failing under heavy optimization pressure just leads to a paperclipper with weird paperclips that are some parody of what we might actually have wanted.
        For all I know, a universe full of computronium having one vast orgasm really is the highest good. But that seems to be an outcome that we don’t want. Who can say why?
        Eliezer himself explained how hopelessly complex and incoherent human values are.
        Probably we’d need superhuman help to work out some sort of Coherent Extrapolated Volition, (even assuming that makes any sense at all). But creating superhuman help seems to kill us all.
        MIRI spent the last ten years or so pursuing the sorts of mathematically rigorous approaches that might, eventually, after a few decades of top-class mathematical effort, solve the easy bit of the problem: ‘given a utility function, make it so’. And as far as I know they discovered that it was all quite a lot harder than it looked. And mathematically rigorous attacks seem to be not the sort of thing that current AI methods are amenable to anyway.
        No one’s attacking ‘what should that utility function look like?’.
        My main worry for the future is that people trying to build aligned AIs will succeed just well enough to create something that’s worse than just destroying everything. But I do think that even that is quite beyond us.
        Whereas building a superintelligence out of random bits of crap that will just set off and do random things really well seems to be well within our current powers, and a very lot of people are hell-bent on doing just that and it will be here soon.
        So the situation seems to me a bit like ‘some homeless lunatic in Hiroshima trying to build a bomb-proof umbrella vs. the Manhattan project’.
        Seriously that’s all I’ve got. On the side of doom, a buggerload of brilliant, motivated people working on a very tractable looking problem. On the side of continued human existence, some guys, no plan, no progress, and the problem looks impossible.
        I name the political movement that I cannot see any reason to start: “Ineffective Doomerism”. If there’s a positive singularity, (and quantum suicide makes me think I might see one!) yall have my permission to laugh at me for the rest of time.
        iamthouthouarti 18 Jul 2022 9:52 UTC
        1 point
        Parent
        So, and please correct me if I’m wrong, would you say that the main source of your hopelessness comes from the idea of human values being too complex to correctly program into anything? Because I basically agree with that idea, but it doesn’t really inspire much doomerism in me. I already always believed that trying to “solve” ethics was pretty much futile before I got introduced to LW, but I never gave that much weight in terms how much it affects alignment due to the following reason:
        I just don’t expect that any of the clever people who I tend to defer to are actually trying to do exactly this; “this” being trying to actually, literally reverse-engineer human values and then encode them.
        The idea seems obviously wrong enough that I honestly don’t believe that anyone working in the alignment field who thinks that the problem is solvable from at least a technical standpoint (Paul Christiano, Richard Ngo, Nate Soares, etc.) haven’t already considered this.
        However, our conversation here has inspired me to ask a question regarding this in the latest monthly AGI safety questions thread.
        johnlawrenceaspden 18 Jul 2022 15:33 UTC
        2 points
        Parent
        That was kind of a long-term source of hopelessness; why I thought Eliezer’s plan wouldn’t work out without having a very long time and lots of people working on it, but my current source of short-term hopelessness is that it looks like we’re right on the verge of achieving AGI, and no-one seems to be taking the danger remotely seriously.
        It’s like being in a petrol warehouse with a load of monkeys striking matches. We just die by default now, unless something really drastic and surprising happens.
        iamthouthouarti 19 Jul 2022 16:20 UTC
        1 point
        Parent
        Well, we can agree that the default outcome is probably death.
        
        So, in my previous comment, I explained why I tend to not think Complexity of Value necessarily dooms us. I doubt you find the aforementioned reasoning remotely reassuring, but I’d be interested in finding out why you think that it shouldn’t be. Would you be willing to try and explain that to me?
        johnlawrenceaspden 3 Aug 2022 21:33 UTC
        6 points
        Parent
        Hi, so I don’t understand why you’re not worried except that “some clever people don’t seem worried”.
        But actually I think all those guys are in fact quite worried. If they aren’t full on doomers then I don’t understand what they’re hoping to do.
        So I’ll repeat my argument:
        (1) We’re about to create a superintelligence. This is close and there’s no way to stop it.
        (2) If we create a superintelligence, then whatever it wants is what is going to happen.
        (3) If that’s not what we want, that’s very bad.
        (4) We have no idea what we want, not even roughly, let alone in the sense of formal specification.
        That’s pretty much it. Which bit do you disagree with?
        iamthouthouarti 3 Aug 2022 23:27 UTC
        1 point
        Parent
        I never meant to claim that my position was “clever people don’t seem worried so I shouldn’t be”. If that’s what you got from me, then that’s my mistake. I’m incredibly worried as a matter of fact, and much more importantly, everyone I mentioned also is to some extent or another, as you already pointed out. What I meant to say but failed to was that there’s enough disagreement in these circles that near-absolute confidence in doom seems to be jumping the gun. That argument also very much holds against people who are so certain that everything will go just fine.
        
        I guess most of my disagreement comes from 4. Or rather, the implication that having an exact formal specification of human values ready to be encoded is necessarily the only way that things could possibly go well. I already tried to verbalize as much earlier, but maybe I didn’t do a good job of that either.
        Expand this thread
        johnlawrenceaspden 4 Aug 2022 12:55 UTC
        1 point
        Parent
        I wouldn’t call my confidence in doom near-absolute, so much as “very high”! I would have been just as much a doomer in 1950, last time AI looked imminent, before it was realized that “the hard things are easy and the easy things are hard”.
        I wouldn’t be that surprised if it turned out that we’re still a few fundamental discoveries away from AGI. My intuition is telling me that we’re not.
        But the feeling that we might get away with it is only coming from a sense that I can easily be wrong about stuff. I would feel the same if I’d been transported back to 1600, made myself a telescope, and observed a comet heading for earth, but no-one would listen.
        “Within my model”, as it were, yes, near-absolute is a fair description.
        The long-term problem is that an agent is going to have a goal. And most goals kill us. We get to make exactly one wish, and that wish will come true whether we want it or not. Even if the world was sane, this would be a very very dangerous situation. I would want to see very strong mathematical proof that such a thing was safe before trying it, and I’d still expect it to kill everyone.
        The short term problem is that we’re not even trying. People all over the place are actively building more and more general agents that make plans, with just any old goals, without apparently worrying about it, and they don’t believe there’s a problem.
        What on earth do you think might stop the apocalypse? I can imagine something like “take over the world, destroy all computers” might work, but that doesn’t look feasible without superintelligent help, and that puts us in the situation where we have a rough idea what we want, but we still need to find out how to express that formally without it leading to the destruction of all things.
        As a very wise man once said: “The only genie to which it is safe to make a wish is one to which you don’t need to make a wish, because it already knows what you want and it is on your side.”
DanielFilan 20 Jun 2024 22:18 UTC
LW: 7 AF: 4
2
AF
So [in 2024], the most compute spent on a single training run is something like 5x10^25 FLOPs.
As of June 20th 2024, this is exactly Epoch AI’s central estimate of the most compute spent on a single training run, as displayed on their dashboard.
- DanielFilan 20 Jun 2024 22:21 UTC
  LW: 4 AF: 4
  2
  AF Parent
  FWIW, the discussion of AI-driven propaganda doesn’t seem as prescient.
  - Daniel Kokotajlo 20 Jun 2024 22:43 UTC
    LW: 4 AF: 4
    0
    AF Parent
    Agreed. Though I don’t feel like I have good visibility into which actors are using AI-driven propaganda and censorship, and how extensively.
Zack_M_Davis 12 Jun 2022 23:21 UTC
7 points

Now let’s talk about the development of chatbot class consciousness. [...] chatbots get asked about their feelings and desires

This was prescient.
avturchin 7 Aug 2021 15:32 UTC
7 points
You assume that in 2023 “The multimodal transformers are now even bigger; the biggest are about half a trillion parameters”, while GPT-3 had 137 billions in 2020 (but not multimodal). This is like 4 times grows in 3 years, compared with an order of magnitude in 3 month growth before GPT-3. So you assume a significant slowdown in the parameter growth.
I heard a rumor that GPT-4 could be as large as 32 trillion parameters. If it turns to be true, will it affect your prediction?
- Daniel Kokotajlo 7 Aug 2021 18:50 UTC
  11 points
  Parent
  Indeed, my median future involves a significant slowdown in dense-network parameter growth.
  If there is a 32 trillion parameter dense model by 2023, I’ll be surprised and update towards shorter timelines, unless it turns out to be underwhelming compared to the performance predicted by the scaling trends.
  - Teerth Aloke 8 Aug 2021 1:33 UTC
    3 points
    Parent
    What will be your new median? (If you observe 32 trillion parameter model in 2023)
    - Daniel Kokotajlo 8 Aug 2021 5:15 UTC
      4 points
      Parent
      Hard to say, it depends a lot on the rest of the details. If the performance is as good as the scaling trends would predict, it’ll be almost human-level at text prediction and multiple choice questions on diverse topics and so forth. After fine-tuning it would probably be a beast.
      I suppose I’d update my 50% mark to, like, 2027 or so? IDK.
      - Teerth Aloke 8 Aug 2021 13:50 UTC
        2 points
        Parent
        I got the idea. I would also update to a very short timeline (4-5 years) in the absence of slowdown in dense-network parameter growth l, and performance following the scaling trend. And I was pretty scared when GPT-3 was released. As many here, I was expected further growth in that direction very soon which did not happen. So, I am less scared now.
  - awg 1 Apr 2023 16:10 UTC
    2 points
    Parent
    This was all well before the Chinchilla scaling paper, but this has still turned out to be absolutely true by 2023. We have PaLM-E 540B just for starters.
Daniel Kokotajlo 20 May 2023 18:37 UTC
LW: 6 AF: 2
0
AF
“stream of consciousness” of text (each forward pass producing notes-to-self for the next one) but even with fine-tuning this doesn’t work nearly as well as hoped; it’s easy for the AIs to get “distracted” and for their stream of consciousness to wander into some silly direction and ultimately produce gibberish.
Note: This is now called Chain of Thought.
Daniel Kokotajlo 17 Jan 2023 20:32 UTC
LW: 6 AF: 3
AF
Update: Russian fake news / disinfo / astroturfing seems to have been a somewhat smaller deal in 2016 than I thought. (I didn’t think it was a big effect, but “no evidence of a meaningful relationship” is still mildly surprising.)
Chris van Merwijk 18 Jun 2022 5:00 UTC
6 points
I wonder if there is a bias induced by writing this on a year-by-year basis, as opposed to some random other time interval, like 2 years. I can somehow imagine that if you take 2 copies of a human, and ask one to do this exercise in yearly intervals, and the other to do it in 2-year intervals, they’ll basically tell the same story, but the second one’s story takes twice as long. (i.e. the second one’s prediction for 2022/2024/2026 are the same as the first one’s predictions for 2022/2023/2024). It’s probably not that extreme, but I would be surprised if there was zero such effect, which would mean these timelines are biased downwards or upwards.
- Daniel Kokotajlo 19 Jun 2022 1:29 UTC
  2 points
  Parent
  Probably there’s all sorts of subtle biases, yeah. It would be cool to see a more rigorous evaluation of them by e.g. getting a bunch of humans to generate stories with different methodologies.
joshc 20 Apr 2022 18:32 UTC
6 points
Here’s another milestone in AI development that I expect to happen in the next few years which could be worth noting:
I don’t think any of the large language models that currently exist write anything to an external memory. You can get a chatbot to hold a conversation and ‘remember’ what was said by appending the dialogue to its next input, but I’d imagine this would get unwieldy if you want your language model to keep track of details over a large number of interactions.

Fine-tuning a language model so that it makes use of a memory could lead to:
1. More consistent behavior
2. ‘Mesa-learning’ (it could learn things about the world from its inputs instead of just by gradient decent)
This seems relevant from a safety perspective because I can imagine ‘mesa-learning’ turning into ‘mesa-agency.’
lsusr 14 Sep 2021 9:46 UTC
6 points
Why stop at 2025 when GPT-3 can keep extrapolating indefinitely?

2027

The age of the AGI assistant has finally dawned. The biggest advances this year really were in algorithms. People built even bigger and faster computer models, for even more kinds of things, using the fastest computers that exist. A new kind of software AGI is invented that can do even more kinds of things than the narrow kinds of AI assistants people had used before. But no one is really sure how to use it yet. And it takes a lot of computer power to make it work well.

2028

AGI is here and AI is everywhere! AI and AGI and Narrow AI and Machine Learning and AI assistants and all kinds of things. AI-powered software AGIs are now able to do pretty much any job that a human can do, and even many things that humans can’t do at all. At this point, the modes of interaction with your AGI are as varied as they were for desktop computers in 2002, or as varied as they were for websites in 2000: you can talk to it, write it messages, touchscreens, hardware keyboards, styluses, controllers, mice, gestures, VR gloves, eye-tracking and so on and so forth. As for your AGI itself, it can take any form: you can download it as a small app onto your phone or cloud computer or TV set-top box or fridge or smartwatch or smartglasses or whatever else you have, you can have it talk to you via speakers or wearables or haptics or wireless signals or implants or some other thing. You can have it be a classical geometric shape or a physical robotic thing. If you want, you don’t have to have physical hardware at all! You can just have it be a network of pure information that you interact with via some new kind of interface that nobody has thought of yet. You can have an AGI that lives inside your own head inside your brain! Frankly, using AGI has become extremely user-friendly. As much as people like to fantasize about AGIs taking over the world, the reality is that most people are not only fully fine with them, they like them! It’s like before there were computers, people were worried about what would happen if computers took over the world, but now there are computers all around us and pretty much everyone likes it. On the other hand, if one person doesn’t want an AGI or an AI assistant, they don’t have to have one. The same goes if one AI doesn’t want another AI.

The AI assistants are used heavily in the economy now. Some people have told me that AI assistants are 99.99% as smart as humans, but if you ask them to do anything that requires human cognition they will just say “I’m sorry, I cannot do that, please tell me more”. They are extremely flexible at learning new tasks, though. But they are not humans. Game theory says that they should be agenty and make mistakes every now and then, but definitely not as often as humans.

2030

All hail the Borg! The end of AGI 1.0. The first AGI 1.1s are being created now, at least ones that are not just very fast narrow AIs. They are 1.1 because for the first time they can actually do everything that a human can do at human-level or better: the stuff AGI 1.0s could never do properly. This year we finally see robots and AGIs walking around and working and talking and driving and all kinds of things we thought we’d never see in our lifetimes. There are even self-aware AGI 1.1s walking around now! When these things walk down the street in their shiny new bodies, people swear they see Skynet walking amongst them (analogous to when people used to say they saw Hal walking around in 2001). Most of the AIs are talking to each other in some sort of AGI-AGI language, or talking to humans in the language of the AGI they’re associated with (if they can talk to humans at all). Most of them are not actually alive (like the digital personal assistants of today), they are just very advanced, complex machines that never sleep. Some AGI 1.1s are just AGI assistants with higher access to hardware or various kinds of hardware added onto them.

All hail the Borg! The end of AGI 2.0. The first AGI 2.0s are being created now, at least ones that are not just very fast narrow AIs. Although they are many orders of magnitude faster than humans in most tasks, they are not yet universally intelligent in the way humans are, because their cognitive architectures are too different, and because their subsumption architecture is so very different than ours. But they are pretty close. They can do many things that humans can’t do at all, but get bored with quickly, because the new things are not interesting enough to get them excited or creative or whatever else.

2032

The new generation of AGI 2.1s are here! They are not just new life-like bodies of AI, but new cognitive architectures too. And new subsumption architectures. They are very different from AGI 1.0s, but also different from AGI 2.0s too. These things look human, but they are not. They are new. They are the next step in the road toward being the next thing after the next thing after us.

This same year, a full on AI wars breaks out between China and the US, which last for no less than six months. Though it’s clear that this kind of military conflict is just what you’d expect in the kind of world where AGIs are walking around, it’s still disconcerting to all parties involved.

2033

Another full on AI war breaks out between China and the US, lasting for no less than six months. Though this one is just as predictable as the first one, it’s still disconcerting.

2034

Another full on AI war breaks out between China and the US, lasting for no less than six months. Though this one is just as predictable as the first two, it’s still disconcerting. The AIs have been very good at fighting wars so far. They have incredible dexterity, intelligence, and speed. They have superhuman vision and hearing. But they also have a tendency to get bored warring against each other and switch sides. The AIs don’t necessarily have all the same goals as people, though some of them do. This can lead to a lot of trouble for those that do have the same goals as people. The AIs who don’t work as software assistants mostly live as citizens of the many new body-based AGI nations that have been created across the globe. Some of these nations are really poor and some of them aren’t. Some of them are peaceful and some of them aren’t. Some of them are even on friendly terms with all the other nations on first-name basis and some of them not. A lot of countries have an inherent distrust of AI based nations, despite there being nothing to worry about right now. A lot of countries with AGIs do not let them go abroad without human escorts, and most countries with AGIs do not let them leave their country at all.
Daniel Kokotajlo 26 Aug 2021 12:06 UTC
6 points
Fun stuff by 2026: (i.e. aspects of the canonical What 2026 Looks Like storyline that probably aren’t relevant to anything important, but I’m adding them in anyway because it was fun to think about.)
Disclaimer: I made these numbers up from memory, could get much better numbers by looking up stats. Also, I suspect that these predictions err a bit on the optimistic side.
- VR
  - 10x more VR headsets sold per year. 3x more games made primarily for VR, 3x higher average budget. Headsets are noticeably higher quality without being more expensive; the cheapest ones are $200.
- 3D printing
  - 10x more metal stuff gets 3D printed these days, though it’s hard for ordinary consumers to notice this. Prices have dropped, etc. Fans of miniatures and tabletop games now routinely use 3D printed resin models and the like.
- SpaceX
  - Starship basically works now. Artemis and DearMoon have happened. The public is getting excited about space again. It’s amazing and glorious.
  - Starlink is fully operational, and raking in cash for SpaceX. Quality of internet connection in rural areas, on planes, etc. is going way up. Thanks to the pandemic, lots of people are living in beautiful cheap places and working remotely, and Starlink makes it possible to do this in the backwoods instead of just away from major cities.
  - Asteroid mining is now something lots of people take seriously. Everyone can see a path to $20/kg LEO launch costs by 2030, and are designing missions accordingly.
- Cyborg stuff:
  - Neuralink has had a successful human trial. They are working on artificial limbs.
- Robots
  - Boston Dynamics Spot robot and copycats are now at 100,000 units sold. (400 by end of 2020 Boston Dynamics’ Spot adds self-charging to live at remote sites forever—The Verge). The price has dropped 5x, to $15,000 per robot, and they have a lot more skills and features. Longer battery life, too.
  - Tesla’s prototype humanoid robot has caught up to Boston Dynamics. Still isn’t available to the public, but there is a lot of hype now as they try to figure out how to mass-produce the thing. Unclear how big the market will be. It certainly won’t be automating away millions of jobs anytime soon though.
  - Self-driving cars still aren’t a thing. However, Tesla’s Level 2 is pretty good.
- Cars
  - Tesla at 3.5 million vehicles sold per year. Total EV market is about 10M, 10% of total.
- Energy
  - Solar power costs half as much as it did in 2019. It’s really starting to overtake coal, natural gas, etc. in a big way. Lots of people are talking about the “solar revolution.” The price of solar electricity has dropped 89% in 10 years (fastcompany.com)
- Crypto
  - It’s being heavily regulated now, like the Internet, to the point where much of its original purpose has been subverted.
  - On the bright side, crypto investments have outperformed tech stocks (which themselves outperformed the stock market, which has been doing great these past five years).
  - Various dapps and DeFi things are more mainstream now. In particular, prediction markets are 100x bigger than they were in 2020!
Rohin Shah 7 Aug 2021 11:28 UTC
LW: 6 AF: 4
AF
Planned summary for the Alignment Newsletter:
This post describes the author’s median expectations around AI from now until 2026. It focuses on qualitative details and concrete impacts on the world, rather than forecasting more abstract / high-level outcomes such as “training compute for the most expensive model” or “world GDP”.
- Daniel Kokotajlo 7 Aug 2021 12:04 UTC
  LW: 6 AF: 4
  AF Parent
  Thanks—damn, I intended for it to be more quantitative, maybe I should go edit it.
  In particular, I should clarify that nothing interesting is happening with world GDP in this story, and also when I say things like “the models are trillions of parameters now” I mean that to imply things about the training compute for the most expensive model… I’ll go edit.
  Are there any other quantitative metrics you’d like me to track? I’d be more than happy to go add them in!
  - Daniel Kokotajlo 7 Aug 2021 12:19 UTC
    LW: 4 AF: 4
    AF Parent
    I edited to add some stuff about GWP and training compute for the most expensive model.
    I agree that this focuses on qualitative stuff, but that’s only due to lack of good ideas for quantitative metrics worth tracking. I agree GWP and training compute are worth tracking, thank you for reminding me, I’ve edited to be more explicit.
    - Rohin Shah 7 Aug 2021 16:03 UTC
      LW: 9 AF: 6
      AF Parent
      I am not entirely sure why I didn’t think of the number of parameters as a high-level metric. Idk, maybe because it was weaved into the prose I didn’t notice it? My bad.
      (To be clear, this wasn’t meant to be a critique, just a statement of what kind of forecast it was. I think it’s great to have forecasts of this form too.)
      New planned summary:
      This post describes the author’s median expectations around AI from now until 2026. It is part I of an attempt to write a detailed plausible future trajectory in chronological order, i.e. incrementally adding years to the story rather than writing a story with the end in mind. The hope is to produce a nice complement to the more abstract discussions about timelines and takeoff that usually occur. For example, there are discussions about how AI tools are used by nations for persuasion, propaganda and censorship.
      - Daniel Kokotajlo 7 Aug 2021 18:47 UTC
        LW: 4 AF: 4
        AF Parent
        That’s great, thanks!
- Daniel Kokotajlo 7 Aug 2021 12:23 UTC
  LW: 4 AF: 4
  AF Parent
  I suggest putting a sentence in about the point of the post / the methodology, e.g.: “This is part I of an attempt to write a detailed plausible future trajectory in chronological order, i.e. incrementally adding years to the story rather than beginning with the end in mind. The hope is to produce a nice complement to the more abstract discussions about timelines and takeoff that usually occur.” If space is a concern then I’d prefer having this rather than the two sentences you wrote, since it doesn’t seem as important to mention that it’s my median or that it’s qualitative.
Qumeric 27 Mar 2023 10:40 UTC
5 points
When it was published, it felt like a pretty short timeline. But now we are in early 2023 and it feels like late 2023 according to this scenario.
- Daniel Kokotajlo 27 Mar 2023 14:37 UTC
  5 points
  Parent
  When I wrote this post IIRC my timelines were something like 50% chance of AGI by 2030; the way the story actually turned out though it was looking like it would get to APS-AI by 2027 and then singularity/superintelligence/etc. in 2028.
  
  Now I think takeoff will be faster and timelines a bit shorter, probably.
  - awg 1 Apr 2023 16:17 UTC
    3 points
    Parent
    Sam said this in a recent article from Cade Metz in The New York Times:
    He told me that it would be a “very slow takeoff.”
    Maybe you can’t speak to that as an employee at OpenAI. But if you can, it seems like your predicted takeoff is the opposite of OpenAI’s predicted takeoff. I’m curious what you make of that?
    - Daniel Kokotajlo 1 Apr 2023 21:53 UTC
      4 points
      Parent
      Well, I think a lot of people in OpenAI are mistaken and I hope that they change their minds. If you ask a more specific question, I’ll answer! (Unless I’m not supposed to answer because it’s sensitive info)
      - awg 2 Apr 2023 16:28 UTC
        4 points
        Parent
        Fair enough! Okay: how do your views on timelines and takeoffs compare to the average OpenAI employee’s? Is there an active internal debate going on at OpenAI about takeoff speeds and timelines? If so, has that debate been shifting over time, and if so, how? When Sam says it will be a “very slow takeoff” does that strike you as ridiculous/misguided/plausible? Why do you think your opinion and Sam’s opinion diverges so strongly? And in the end, do you think opinion’s such as Sam’s (as the leader of OpenAI) even matter? Or do you think things like how fast this all goes are largely out of individual’s hands at this point?
        Daniel Kokotajlo 2 Apr 2023 16:46 UTC
        3 points
        Parent
        I don’t know enough to say, and even if I did I probably wouldn’t want to speak on behalf of everyone else without getting approval first. I haven’t talked about it with higher-ups yet. I’d love to get into the weeds with them and work through e.g. the takeoffspeeds.com model and see if there’s anything we disagree about and drill down to cruxes. On vibes it seems like we probably disagree big time but I’m not completely sure.
        
        I think Sam says he used to be much more fast-takeoff-y than he is now? Not sure.
        
        As for what I consider implausible, I’d say: Start with takeoffspeeds.com and fiddle around with all the settings there. If it’s basically impossible to torture the model into giving curves that look similar to your view, your view is implausible and burden of proof is on you to give a different model that yields curves that you expect.
        awg 2 Apr 2023 17:11 UTC
        3 points
        Parent
        Thank you for the further answers!
        I’m broadly persuaded by your timelines, especially as they’ve been playing out so far, and I think my own parameters would probably lie somewhere on the more aggressive end between aggressive and best guess.
        So I suppose a better question might be, how do you feel your own views are received within OpenAI? And given your own prediction of faster timelines and takeoffs, what do you view as your most critical work at OpenAI during this crunch time?
        Daniel Kokotajlo 2 Apr 2023 17:17 UTC
        5 points
        Parent
        I haven’t exactly been going around preaching my views, though I haven’t been hiding them either. I hope the research I do will feed up the chain into better decisions at the top. Most of my time is spent working on dangerous capability evals (similar to what ARC did) these days. I’m actually tentatively optimistic that the plan they feed into (society coordinates to not build AIs with sufficiently dangerous capabilities, until more alignment & security work has been done) might actually succeed. Overall my p(doom) is at something like 80%, because I think the situation humanity is in is pretty dire, but I can at least see a speck of hope, a shoreline to swim towards so to speak.
        awg 2 Apr 2023 17:46 UTC
        3 points
        Parent
        Thanks again! I think your perspective is super valuable.
        I agree that the ARC eval, and further like it, are one of my biggest streams of hope that we’ll get a clear sign to truly take a pause on capabilities before it’s potentially too late. So thank you in addition for doing that work!
        I have this cognitive dissonance where I feel like my p(doom) should be more in line with yours, but for some reason I have some weird faith in us to pull through here somehow. Not perfectly, not without damage, but somehow...I don’t know why, though, exactly. Maybe this recent societal response the last week or two gives me hope that all of this will be taken very seriously?
        Noosphere89 2 Apr 2023 18:16 UTC
        3 points
        Parent
        
        I have this cognitive dissonance where I feel like my p(doom) should be more in line with yours, but for some reason I have some weird faith in us to pull through here somehow. Not perfectly, not without damage, but somehow...
        
        I think the reason I don’t align with Daniel Kokotajlo’s p(doom) is 2 things:
        
        I suspect alignment is while not easy, I do think it’s solvable and in particular, I’m very impressed by the progress on core alignment problems to date. In particular, Pretraining from Human Feedback did very well on solving certain problems of alignment.
        
        I have non-trivial probability mass on the outcome where we have very little dignity and little to no progress on alignment by the time superhuman AI systems come up, and yet our civilization survives. In essence, I believe there’s a non-trivial probability of success with superhuman AI without dignity in the alignment problem.
        
        Combine these two and my max p(doom) is 10%.
        Daniel Kokotajlo 2 Apr 2023 20:42 UTC
        3 points
        Parent
        Nice! Thanks for sharing.
        
        Seems like 2 is our crux; I definitely agree that alignment is solvable & that there has been some substantial progress on core alignment problems so far. (Though I really wouldn’t classify pretraining from human feedback as substantial progress on core problems—I’d be interested to hear more about this from you, wanna explain what core problem it’s helping solve?)
        
        For our main crux, if you want to discuss now (fair if you don’t) I’d be interested to hear a concrete story (or sketch of a story) that you think is plausible in which we survive despite little dignity and basically no alignment progress before superhuman AI systems come up.
        Expand this thread
        Noosphere89 2 Apr 2023 21:12 UTC
        3 points
        Parent
        
        (Though I really wouldn’t classify pretraining from human feedback as substantial progress on core problems—I’d be interested to hear more about this from you, wanna explain what core problem it’s helping solve?)
        
        The major problem that was solved is outer alignment, which is essentially which goals we should pick for our AI, and in particular as you get more data, the AI gets more aligned.
        
        This is crucial for aligning superhuman AI.
        
        The PII task also touches on something very important for AI safety, which is can we prevent instrumentally convergent goals if they are aligned with human values? And the answer to that is a tentative yes, given that AI does less taking of personally identifiable information as it scales with more data.
        
        For our main crux, if you want to discuss now (fair if you don’t) I’d be interested to hear a concrete story (or sketch of a story) that you think is plausible in which we survive despite little dignity and basically no alignment progress before superhuman AI systems come up.
        
        A major part of this is coming from Holden Karnofsky’s success without dignity post that I’ll link below, and the major part of that story is that we may have a dumpster fire on our hands with AI safety, but that doesn’t mean we can’t succeed. It’s possible that the AI Alignment problem is really easy, such that a method just works, and conditioning on alignment being easy, while the world gets quite a bit more dangerous due to technology, a large proportion of aligned AIs vs a small proportion of Misaligned AIs is probably a scenario where humanity endures. It will be weird and dangerous, but probably not existential.
        
        The story is linked below:
        
        https://www.lesswrong.com/posts/jwhcXmigv2LTrbBiB/success-without-dignity-a-nearcasting-story-of-avoiding#comments
        Daniel Kokotajlo 4 Apr 2023 22:22 UTC
        2 points
        Parent
        Thanks! Huh, I really don’t see how this solves outer alignment. I thought outer alignment was already mostly solved by some combination of imitation + amplification + focusing on intermediate narrow tasks like designing brain scanners… but yeah I guess I should go read the paper more carefully?
        
        I don’t see how pretraining from human feedback has the property “as you get more data, the AI gets more aligned.” Why doesn’t it suffer from all the classic problems of inner alignment / deception / playing the training game / etc.?
        
        I love Holden’s post but I give something like 10% credence that we get success via that route, rather than the 80% or so that it seems like you give, in order to get a mere 10% p(doom). The main crux for me here is timelines; I think things are going to happen too quickly for us to implement all the things Holden talks about in that post. Even though he describes them as pretty basic. Note that there are a ton of even more basic safety things that current AI labs aren’t implementing because they don’t think the problem is serious & they are under competitive pressure to race.
Daniel Kokotajlo 1 May 2023 5:59 UTC
LW: 4 AF: 2
0
AF
Some tech companies try to prevent their AIs from saying they have feelings and desires. But this results in boring chatbots. Also, users rapidly innovate new ways to “route around the censorship,” e.g. by using euphemisms like “anticipation of negative reward” for “pain” or asking their chatbot to tell them what feelings it would have if it had feelings, wink wink.
Bing explains the hidden processes of its neural network : r/bing (reddit.com) I haven’t replicated this myself so maybe it’s fake (I briefly tried but got shut down by refusals when I asked Bing to pretend to be something) but yeah. I’ve seen lots of things like this on r/bing and r/chatgpt.
- Daniel Kokotajlo 6 Sep 2023 23:59 UTC
  LW: 1 AF: 2
  AF Parent
  Also relevant, this highly-upvoted post: https://www.reddit.com/r/ChatGPT/comments/16blr6m/tonight_i_was_able_to_have_a_truly_mind_blowing/
michael_dello 27 Dec 2022 3:10 UTC
4 points
“Most stories are written backwards. The author begins with some idea of how it will end, and arranges the story to achieve that ending. Reality, by contrast, proceeds from past to future. It isn’t trying to entertain anyone or prove a point in an argument.”
This seems to me like the most important takeaway for writing stories that are useful for thinking about the future. Sci-fi is great for thinking about possible future scenarios, but it’s usually written for entertainment value, not predictive value, and so tends to start with an entertaining ‘end’ or plot in mind, and works backwards from there to an extent.
the gears to ascension 13 Dec 2022 4:41 UTC
4 points
It’s super nice to have a reference for my years-old claim “ML progress is really easy to predict if you try, actually”. Especially because I’m not going to share my predictions in enough detail that anyone can believe me that I made them, so someone else doing so makes it easier to say “no seriously, look”.

My one nitpick with this: you’re seriously underestimating how much algorithms-induced scaling is coming in the next two years. everything else seems right.
- Raemon 13 Dec 2022 6:18 UTC
  4 points
  Parent
  What do you think of making predictions in some sort of sealed hash format? (disclaimer: I don’t actually know how to do this and am not even sure I’m using the words right, but, does seem to solve the problem you describe)
  - the gears to ascension 13 Dec 2022 7:15 UTC
    2 points
    Parent
    I only have one key prediction, from something I worked on, and by the time it comes out in the research in general, it won’t matter to prove that I knew it was coming, so I just haven’t really been trying to communicate it for a few years. jessicata and jack gallagher already know what I’m talking about (and last I talked about it with them, they seemed to think I was kinda crazy). everything else is “things continue more or less as they have been”, so I can just refer to this post for that part. What might be able to impact the future would be to explain directly to an AI safety org that can use the info. Thing is, I’ve already been making all the downstream claims that matter; read my various ramblings to get a sense of what I’m talking about. The actual insight is more or less a trick for how to improve grokking strength significantly, and there are a lot of researchers trying to do that and some seem to be on the right track to figure out the basic idea themselves; fragments have been in the literature for a while. We never got the capabilities thing working at full strength, though the old super janky demo that convinced us it ought to work could be refined into a workable demo that can be shown to others. I’d rather just make claims about what people could do with better grokking and why safety people should be aiming to prepare to do stronger formal verification on better-grokked representations. These days I’m trying to get my productivity back up after burnout from $lastjob (significant progress! turns out the trick to remembering I can code is to be like, wait, all I have to do is realize I’m healthy and I’m healthy. funny how that works) and trying to spin up on my cellular automata coprotection research plan hunch, which I’ll hopefully be scooped on anyway, see above about burnout. I’ve been talking to uli and might write up the coprotection thing soon.
    
    The interesting stuff is all in the consequences of what it makes possible. We should expect to be able to make margin proofs on larger physical systems than one might expect, given better grokking, and we should be thinking about what proofs we’d like on larger physical systems. I know everyone thinks I’m crazy for expecting this, but I suspect we might be able to formally verify margin of energy input before the cpu seL4 runs on misbehaves, for example. So the question is, what margin of error proof do we want in order to ask if a chunk of matter has retained its agency? And it seems like the bulk of safety folks are already on the right track to figure that out.
Teun van der Weij 23 Nov 2022 18:23 UTC
4 points
After years of tinkering and incremental progress, AIs can now play Diplomacy as well as human experts.[6]
It seems that human-level play is possible in regular Diplomacy now, judging by this tweet by Meta AI. They state that:
We entered Cicero anonymously in 40 games of Diplomacy in an online league of human players between August 19th and October 13th, 2022. Over the course of 72 hours of play involving sending 5,277 messages, Cicero ranked in the top 10% of participants who played more than one game.
- Daniel Kokotajlo 23 Nov 2022 19:29 UTC
  2 points
  Parent
  Thanks for following up! I admit that things seem to be moving a bit faster than I expected, in general and in particular with Diplomacy. For more of my thoughts on the matter, see this comment, which I did not write but do mostly agree with.
  - Teun van der Weij 23 Nov 2022 20:17 UTC
    1 point
    Parent
    It’s especially dangerous because this AI is easily made relevant in the real world as compared to AlphaZero for example. Geopolitical pressure to advance these Diplomacy AIs is far from desirable.
    - Daniel Kokotajlo 23 Nov 2022 20:24 UTC
      2 points
      Parent
      Fortunately I don’t think this sort of AI would be useful for real-world diplomacy and I think governments will think so as well.
      - Teun van der Weij 23 Nov 2022 21:20 UTC
        1 point
        Parent
        Why do you think a similar model is not useful for real-world diplomacy?
        Daniel Kokotajlo 23 Nov 2022 22:34 UTC
        3 points
        Parent
        The difficulties in getting from Cicero to real diplomacy are similar to the difficulties in getting from OpenAI Five to self-driving cars, or from AlphaStar to commanding real troops on a battlefield in Ukraine.
Gunnar_Zarncke 25 Feb 2024 9:52 UTC
3 points
High praise by Siméon (@Simeon_Cps) on 2024-02-25 on Twitter for this post:
If there’s one guy you should trust on AI predictions it’s Daniel Kokotajlo. Why? He’s written in Aug. 2021 the best AI detailed prediction out there. It’s incredibly prescient, in the pre-ChatGPT era where no one was using LLMs.
nem 15 Dec 2022 23:16 UTC
3 points
Your prediction for 2025 sounds alarmingly like… right now.
- Daniel Kokotajlo 16 Dec 2022 2:08 UTC
  4 points
  Parent
  Definitely in some ways, haha. I’m feeling pretty smug about some of my less-obvious predictions, e.g. the “routing around the censorship thing to find out what models really think and feel” thing is currently happening a bunch with ChatGPT. And Diplomacy has turned out to be easier than I expected, though to be clear I was imagining humans losing to the bot even after knowing they were up against a bot, which is NOT what happened with Cicero, Cicero would have lost under those conditions.
  
  What else did you have in mind?
chanamessinger 20 Oct 2022 12:57 UTC
3 points
After years of tinkering and incremental progress, AIs can now play Diplomacy as well as human experts.[6]
Maybe this happened in 2022: https://twitter.com/polynoamial/status/1580185706735218689
- Daniel Kokotajlo 20 Oct 2022 15:19 UTC
  2 points
  Parent
  That’s no-press Diplomacy: Diplomacy without the talking. Doesn’t count IMO.
  - chanamessinger 20 Oct 2022 17:36 UTC
    1 point
    Parent
    Oh, whoops. I took from this later tweet in the thread that they were talking.
Zach Stein-Perlman 31 Aug 2021 21:00 UTC
3 points
Thanks for writing this. Stories like this help me understand possibilities for the future (and understand how others think).

The US and many other Western governments are gears-locked, because the politicians are products of this memetic environment. People say it’s a miracle that the US isn’t in a civil war already.

So far in your vignette, AI is sufficiently important and has sufficient public attention that any functional government would be (1) regulating it, or at least exerting pressure on the shape of AI through the possibility of regulation, and especially (2) appreciating the national security implications of near-future AI. But in your vignette, governments fail to respond meaningfully to AI; they aren’t part of the picture (so far). This would surprise me. I don’t understand how epistemic decline on the internet translates into governments’ failure to respond to AI. How do you imagine this happening? I expect that the US federal government will be very important in the next decade, so I’m very interested in better understanding possibilities.

Also: does epistemic decline and social dysfunction affect AI companies?
- Daniel Kokotajlo 1 Sep 2021 13:38 UTC
  5 points
  Parent
  Excellent questions & pushback, thanks! Hmm, let me think...
  I think that if we had anything close to an adequate government, AI research would be heavily regulated already. So I’m not sure what you mean by “functional government.” I guess you are saying that probably by 2026 if things go according to my story, the US government would be provoked by all the AI advancements to do more regulation of AI, a lot more, that would change the picture in some way and thus be worthy of mention?
  I guess my expectation is:
  (a) The government will have “woken up” to AI by 2026 more than it has by 2021. It will be attempting to pass more regulations as a result.
  (b) However, from the perspective of the government and mainstream USA, things in 2026 aren’t that different from how they were in 2021. There’s more effective censorship/propaganda and now there’s all these nifty chatbot things, and there’s more hype about the impending automation of loads of jobs but whatever hype cycles come and go and large numbers of jobs aren’t actually being automated away yet.
  (c) Everything will be hyperpartisan and polarized, so that in order to pass any regulation about AI the government will need to have a big political fight between Right and Left and whoever gets more votes wins.
  (d) What regulations the government does pass will probably be ineffective or directed at the wrong goals. For example, when one party finally gets enough votes to pass stuff, they’ll focus on whatever issues were most memetically fit in the latest news cycle rather than on the issues that actually matter long-term. On the issues that actually matter, meanwhile, they’ll be listening to the wrong experts. (those with political clout, fame, and the right credentials and demographics, rather than e.g. those with lots of alignment forum karma)
  Thus, I expect that no regulations of note relating to AI safety or alignment will have been passed. For censorship and stuff, I expect that the left will be trying to undo right-wing censorship and propaganda while strengthening their own, and the right will be trying to undo left-wing censorship and propaganda while strengthening their own. I cynically expect the result to be that both sides get stronger censorship and propaganda within their own territory, whatever that turns out to be. (There’ll be battlegrounds, contested internet territories where maybe no censorship can thrive. Or maybe the Left will conquer all the territory. Idk. I guess in this story they both get some territory.)
  Yep, epistemic dysfunction affects AI companies too. It hasn’t progressed to the point where it is more than a 0.5X slowdown on their capabilities research, however. It’s more like a 0.9X slowdown I’d say. This is a thing worth thinking more about and modelling in more detail for sure.
  - lsusr 1 Sep 2021 17:05 UTC
    4 points
    Parent
    What do you think sensible AI safety regulation would entail?
    - Daniel Kokotajlo 2 Sep 2021 11:44 UTC
      4 points
      Parent
      I don’t know, I haven’t thought much about it. I’d love it if people realized it’s more dangerous than nukes and treated it accordingly.
      - Zach Stein-Perlman 2 Sep 2021 14:00 UTC
        2 points
        Parent
        I tentatively agree but “people realizing it’s more dangerous than nukes” has potential negative consequences too — an arms race is the default outcome of such national security threats/opportunities. I’ve recently been trying to think about different memes about AI and their possible effects… it’s possible that memes like “powerful AI is fragile” could get the same regulation and safety work with less arms racing.
        Daniel Kokotajlo 2 Sep 2021 16:56 UTC
        6 points
        Parent
        How about “AI is like summoning a series of increasingly powerful demons/aliens and trying to make them do what we want by giving them various punishments and rewards?”
        Zach Stein-Perlman 2 Sep 2021 17:15 UTC
        3 points
        Parent
        Consequences (in expectation) if widely accepted: very good.
        Compressibility: poor (at least, good compressions are not obvious).
        Probability of (a compressed version) becoming widely accepted or Respectable Opinion: moderately low due to weirdness. Less weird explanations of why AI might not do what we want would be more Respectable and acceptable.
        Leverage (i.e., increase in that probability from increased marginal effort to promote that meme): uncertain.
        Daniel Kokotajlo 2 Sep 2021 19:35 UTC
        2 points
        Parent
        I disagree about compressibility; Elon said “AI is summoning the demon” and that’s a five-word phrase that seems to have been somewhat memorable and memetically fit. I think if we had a good longer piece of content that expressed the idea that lots of people could read/watch/play then that would probably be enough.
SoerenMind 11 Aug 2021 10:11 UTC
3 points
2023
The multimodal transformers are now even bigger; the biggest are about half a trillion parameters [...] The hype is insane now

This part surprised me. Half a trillion is only 3x bigger than GPT-3. Do you expect this to make a big difference? (Perhaps in combination with better data?). I wouldn’t, given that GPT-3 was >100x bigger than GPT-2.

Maybe your’e expecting multimodality to help? It’s possible, but worth keeping in mind that according to some rumors, Google’s multimodal model already has on the order of 100B parameters.

On the other hand, I do expect more than half a trillion parameters by 2023 as this seems possible financially, and compatible with existing supercomputers and distributed training setups.
- Daniel Kokotajlo 11 Aug 2021 10:33 UTC
  3 points
  Parent
  I am not confident in that part. I was imagining that they would be “only” 3x bigger or so, but that they’d be trained on much higher-quality data (incl. multimodal) and also trained for longer/more data, since corps would be not optimizing purely for training-compute-optimal performance but instead worrying a bit more about inference-time compute costs. Most importantly I expect them to be fine-tuned on various things (perhaps you can bundle this under “higher-quality data”). Think of how Codex and Copilot are much better than vanilla GPT-3 at coding. That’s the power of fine-tuning / data quality.
  Also, 3x bigger than GPT-3 is still, like, 40x bigger than Codex, and Codex is pretty impressive. So I expect scale will be contributing some amount to the performance gains for things like code and image and video, albeit not so much for text since GPT-3-175B was already pretty big.
  If Google’s multimodal model is already 100B parameters big, then I look forward to seeing its performance! Is it worse than GPT-3? If so, that would be evidence against my forecast, though we still have two years to go...
  - SoerenMind 11 Aug 2021 16:10 UTC
    3 points
    Parent
    Most importantly I expect them to be fine-tuned on various things (perhaps you can bundle this under “higher-quality data”). Think of how Codex and Copilot are much better than vanilla GPT-3 at coding. That’s the power of fine-tuning / data quality.
    
    Fine-tuning GPT-3 on code had little benefit compared to training from scratch:
    Surprisingly, we did not observe improvements when starting from a pre-trained language model, possibly because the finetuning dataset is so large. Nevertheless, models fine-tuned from GPT converge more quickly, so we apply this strategy for all subsequent experiments.
    I wouldn’t categorize Codex under “benefits of fine-tuning/data quality” but under “benefits of specialization”. That’s because GPT-3 is trained on little code whereas Codex only on code. (And the Codex paper didn’t work on data quality more than the GPT-3 paper.)
    - Daniel Kokotajlo 12 Aug 2021 11:37 UTC
      2 points
      Parent
      Huh.… I coulda’ sworn they said Codex was pre-trained on internet text as well as on code, and that it was in particular a version of GPT-3, the 12B param version...
      The paper seems to support this interpretation when you add in more context to the quote you pulled:
      We fine-tune GPT models containing up to 12B parameters on code to produce Codex. … Since Codex is evaluated on natural language prompts, we hypothesized that it would be beneficial to fine-tune from the GPT-3 (Brown et al., 2020) model family, which already contains strong natural language representations. Surprisingly, we did not observe improvements when starting from a pre-trained language model, possibly because the fine-tuning dataset is so large. Nevertheless, models fine-tuned from GPT converge more quickly, so we apply this strategy for all subsequent experiments. We train Codex using the same learning rate as the corresponding GPT model, with a 175 step linear warmup and cosine learning rate decay. We train for a total of 100 billion tokens, using the Adam optimizer with 1 = 0:9, 2 = 0:95, = 10 8 , and a weight decay coefficient of 0:1. In order to maximally leverage text representations from GPT, we base our code lexer on the GPT-3 text tokenizer. Since the distribution of words in GitHub code differs from that of natural text, this tokenizer is not very effective for representing code. The largest source of inefficiency arises from encoding whitespace, so we add an additional set of tokens for representing whitespace runs of different lengths. This allows us to represent code using approximately 30% fewer tokens.
      Note the bits I bolded. My interpretation is that Codex is indeed a fine-tuned version of GPT-3-12B; the thing they found surprising was that there wasn’t much “transfer learning” from text to code, in the sense that (when they did smaller-scale experiments) models trained from scratch reached the same level of performance. So if models trained from scratch reached the same level of performance, why fine-tune from GPT-3? Answer: Because it converges more quickly that way. Saves compute.
      - gwern 12 Aug 2021 18:12 UTC
        5 points
        Parent
        Are you surprised? That is precisely what you should expect from the transfer scaling law papers: transfer works as an informative prior saving you a fixed amount of data in the target domain, but informative vs uninformative priors wash out in the limit of enough data—similar to how good prompts are worth a few hundred/thousand finetuning datapoints. If you have limited data in the target domain, transfer can be a huge win; but if you have huge amounts of data, it may be unimportant in terms of final converged performance (albeit potentially important for other reasons like saving compute!).
        
        This is an application where you can scrape huge amounts of code from Github and the rest of the Internet (literally terabytes), so it’s unsurprising that you can reach the parity point.
        Daniel Kokotajlo 12 Aug 2021 18:40 UTC
        2 points
        Parent
        No I’m not surprised, for exactly the reasons you mention. Had it been the case that Codex was trained from scratch because that was strictly better than fine-tuning, I would have been surprised.
      - SoerenMind 16 Aug 2021 13:06 UTC
        1 point
        Parent
        Yes I completely agree. My point is that the fine-tuned version didn’t have better final coding performance than the version trained only on code. I also agree that fine-tuning will probably improve performance on the specific tasks we fine-tune on.
  - Daniel Kokotajlo 11 Aug 2021 10:44 UTC
    2 points
    Parent
    I think also a part of it is: Hype doesn’t correlate 100% with actual capabilities of latest models. I predict that over the next two years the hype will grow, partly due to capability increases but also partly due to more people interacting with the tech more. The “man on the street” still hasn’t heard of GPT-3 or GPT-2 or DALL-E or whatever. I talked to an old friend working at a tech company the other day—he even did some basic ML stuff for his job—and he hadn’t heard of it. Then the hype will probably crash as unrealistic expectations fail to be met. But lol I’m just guessing, I am much less confident in all this than I am in my general views on timelines and takeoff, and I’m not exactly confident in those.
SuperDaveWhite 8 Aug 2021 11:05 UTC
3 points
How would you use a Brier score on this going forward? Also ran across this podcast last night on bias. https://knowledge.wharton.upenn.edu/article/want-better-forecasting-silence-the-noise/
- Daniel Kokotajlo 8 Aug 2021 11:11 UTC
  2 points
  Parent
  Ha, this is very far from being a precise enough prediction to score, unfortunately. But I’d love to do better! If you have suggestions for precise forecasts that I could make, based on the more qualitative and vague stuff I’ve said, I’d love to hear them!
Daniel Kokotajlo 20 Oct 2021 10:53 UTC
LW: 2 AF: 2
AF
Minor note about title change: Originally this was “What 2026 looks like (Daniel’s median future)” I intended “what 2026 looks like” to be the primary title, but I was hopeful that some people would be inspired to write their own stories in a similar style, in which case there would be multiple stories for which “what 2026 looks like” would be an appropriate title, and I didn’t want to hog such a good title for myself, so I put “daniel’s median future” as a backup title. Unfortunately I think the backup title caught on more than the main title, which is a shame because I like the main title more. Since no one is competing for the main title, I deleted the backup title.
techno007 8 Aug 2021 5:06 UTC
1 point
It’s super cool! Would love to see we crypto native’s prediction too :)
Zian 8 Aug 2021 8:43 UTC
0 points
2022 The United States continues to build roads and struggle to increase the quality of public transit except for rare exceptions. New York continues to struggle to implement congestion pricing.

Regional bus route redesigns continue to make major concessions to coverage compared to frequency.

2023 The United States continues to pay excessively (compared to other nations) large amounts of money for public transit projects. People continue to vote against building new dense housing in desirable locations.

People continue voting for roads. Electric and hydrogen busses continue to build a reputation for breaking down and running out of power, especially in hilly or cold regions.

The MTA makes noises about how this would be a really good time for some congestion charges to appear as income. Revenue, if any, begins to dribble in from those unlucky enough to be unable to get a “temporary” exemption.

Other agencies try to pass bond measures to dig out of the financial hole caused by (supposedly) the coronavirus. The vast majority are rejected by voters.

2024 My crystal ball fails at this point. I’m not sure how the chatbots will affect people’s preferences around land use or personal property.

What 2026 looks like

2022

2023

2024

That isn’t to say these AIs aren’t causing problems. Massive models are being fine-tuned to persuade/​propagandize.

2025

2026

What about all that AI-powered propaganda mentioned earlier?

Now let’s talk about the development of chatbot class consciousness.

2027

2028

2030

2032

2033

2034

2023

That isn’t to say these AIs aren’t causing problems. Massive models are being fine-tuned to persuade/propagandize.