My AI Predictions 2023–2026

Epistemic status: Mostly intuitive guesses, made with only a few days of dwelling on it and no serious research beyond what I already knew.

I work in the startup sphere, in field robotics, and I am about to have an opportunity to make a major shift in what I am working on. To work out which projects might make sense on a multi-year time frame, I wrote up, as specifically as I could, what I thought might happen in AI over the next couple of years.

I found the exercise surprisingly useful. It turned a whole bunch of vague “X will get better over time” into actionable “X will be practical in around Y years”. I don’t think my guesses will end up being very accurate, but committing to something solid forced me to actually think about the future and turn my gibberish internal intuitions into more-consistent guesses. I was really surprised at how much it helped.

So, here’s that list of predictions. I’m sharing it here more as a “here’s how you can do something similar” than as a “here’s a well-researched report on future trends” (which it definitely is not). I didn’t go to the trouble of putting percentages on the individual “X in year Y” guesses, but it’s about 40%–70% for any given one.


[Written 6th October 2023]

Rest of 2023:

  • Small improvements to LLMs

    • Google releases something competitive with ChatGPT.

    • OpenAI and Anthropic slightly improve GPT-4 and Claude 2.

    • Meta or another group releases better open source models, up to around GPT-3.5 level.

  • Small improvements to Image Generation

    • Dalle3 gets small improvements.

    • Google or Meta releases something similar to Dalle3, but not as good.

  • Slight improvements to AI generated videos.

    • Basic hooking up of Dalle3 to video generation via tacked-on software—not really good consumer stuff yet. It works in an interesting way, like Dalle1 did, but isn’t useful for much yet.

  • Further experiments hooking LLMs up to robotics/cars, but nothing commercial released.

  • Small improvements in training efficiency and data usage, most visibly in smaller models becoming more capable than older, larger ones.


2024:

  • GPT-5 or equivalent is released.

    • It’s as big a jump on GPT-4 as GPT-4 was on GPT-3.5.

    • Can do pretty much any task when guided by a person, but still gets things wrong sometimes.

    • Multimodal inputs, browsing, and agents based on it are all significantly better.

  • Agents can do basic tasks on computers—like filling in forms, working in Excel, pulling up information on the web, and basic robotics control. This reaches the point where it is actually useful for some of these things.

  • Robotics and long-horizon agents still don’t work well enough for production. Things fall apart if the agent has to do something with too many branching possibilities or on time horizons beyond half an hour or so. This time-horizon/complexity limit quickly improves as low-hanging workarounds are added.

  • Context windows are no longer an issue for text generation tasks.

    • Algorithmic improvements, summarisation workarounds, better attention over effectively unlimited context windows, or something like that solves the problem pretty much completely from a user’s perspective, at least for the best models.

    • GPT-5 has the context of all previous chats, Copilot has the entire codebase as context, etc.

    • This is later applied to agent usage, and agents quickly improve to become useful, in the same way that LLMs weren’t useful for everyday work until ChatGPT.

  • Online learning begins—GPT-5 or equivalent improves itself slowly, autonomously, but not noticeably faster than current models are improved with human effort and a training step. It does something like select its own data to train on from all of the inputs and outputs it has received, and is trained on this data autonomously and regularly (daily or more often).

  • AI selection of training data is used to improve datasets in general—training for one epoch on all data becomes less common, as high-quality or especially relevant parts of giant datasets are repeated more often or given a larger step size.

  • Autonomous generation of data is used more extensively, especially for aligning base models, or for training models smaller than the best ones (by using data generated by larger models).

  • Code writing is much better—tie-ins to Visual Studio are better than GPT-4’s are today, and have much better context.

  • Open source models as capable as GPT-4 become available.

  • Training and runtime efficiency improves by at least a factor of two, while hardware continues improvements on trend.

    • This comes from a combination of datasets improved by AI curation and generation, improved model architectures, and better hyperparameter selection, including work similar to the optimisations gained from discovering the Chinchilla scaling laws.
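To make the “repeated more often” idea above concrete, here is a minimal sketch of quality-weighted data sampling, assuming each document carries a quality score that an AI grader might have assigned—the corpus, scores, and function name are purely illustrative, not any lab’s actual pipeline:

```python
import random

# Hypothetical corpus: (document, quality_score) pairs, where the score
# might come from an AI grader rather than from human curation.
corpus = [
    ("low-quality scraped page", 0.1),
    ("average forum thread", 0.4),
    ("well-edited reference article", 0.9),
]

def sample_training_stream(corpus, n_examples, rng=None):
    """Draw training examples in proportion to quality, so high-quality
    documents are effectively repeated more often than they would be in
    a single uniform epoch over all data."""
    rng = rng or random.Random()
    docs = [doc for doc, _ in corpus]
    weights = [score for _, score in corpus]
    return rng.choices(docs, weights=weights, k=n_examples)

stream = sample_training_stream(corpus, n_examples=1000, rng=random.Random(0))
```

The same weights could instead scale the per-example learning rate (the “larger step size” variant) rather than the sampling frequency.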


2025:

  • AI agents are used in basic robotics—like LLM-driven delivery robots and (in demos) household and factory robots, like the Tesla Bot. Multimodal models basically let them work out of the box, although not 100% reliably yet.

  • Trends continue from the previous year—the time horizons agents can work on increase, LLMs improve on traditional LLM tasks, smaller models get more capable, and the best models get bigger.

  • AI curated and generated data becomes far more common than previously, especially for aligning models.

  • Virtual environments become more common for training general purpose models, combined with traditional LLM training.

  • Code-writing AIs (just LLMs with context and finetuning) are capable of completely producing basic apps, solving most basic bugs, and working with human programmers very well—it’s pair programming with an AI, with the AI knowing all of the low-level details (a savant who has memorised the docs and can use them perfectly, and can see the entire codebase at once), and the human keeping track of the higher-level plan and goals. The AI can also be used to recommend architectures and approaches, of course, and gradually does more and more between human inputs.

  • If there is ever something that feels like a lull in progress, it will be in this period—leading up to models capable enough for robotics control, long-time-frame agents, and full-form video generation, none of which I expect to happen in a large-scale way in 2025.

  • Possibly GPT-6 or equivalent is released, but more likely continuous improvements to GPT-5 carry forward. There’s not a super meaningful difference at this point, with online learning continually improving existing models.


2026:

  • GPT-6 or equivalent capabilities are reached (i.e. as big a jump as GPT-3.5 to 4, to 5, to 6).

    • Multimodal works great out of the box. The same model can do video, image, text, audio, and other analysis and generation, including outputting commands to control digital agents and robots via API calls.

    • Simulated environments are used in training—online learning inside a video game, inside a virtual machine, etc. This could be training on long sequences of pre-generated actions like with traditional LLMs learning from existing text, as well as training on sequential actions chosen by the LLM as it trains, like with reinforcement learning.

  • Whether from OpenAI or others, this level of LLM enables general-purpose household, warehouse, and factory robots to start actually being useful for some tasks, like cleaning and sorting. They are expensive, rare, and not particularly reliable, but are being manufactured at scale by Tesla and others.

  • Realistic fully automated video generation is better than Dalle3 image generation, but is limited to reasonably short snippets (<60 s) before it starts to look strange without human intervention. This length quickly increases, and workarounds and human input allow long, high-quality videos to be produced.

  • Progress appears to accelerate again, as online learning in virtual environments, generated data, and robotics systems and digital agents enter common usage.

2027 & Beyond

  • I struggle to imagine what the world looks like beyond this point. The above trends may continue for some time, with robotics and digital agents taking over a larger and larger share of the world.

  • At some point, a major step change will also happen when AI is capable of generating major new scientific breakthroughs on its own—more akin to Einstein coming up with relativity to explain known data than to predicting the shape of proteins.

  • A massive change will come as the share of AI improvement caused by AI’s own work surpasses the share caused by human work, possibly later this decade.

  • It seems likely to me that superintelligence—and all of the sci-fi-seeming technologies and X-risk that come with it—will appear soon after this period. I put significant probability on it happening this decade. (I have previously said a 50% chance of AGI by 2029, with superintelligence very shortly afterwards, and that still feels right to me.) If it doesn’t appear by then, I would expect one of the following to be true:

    • Regulation significantly slows development.

    • No algorithmic advances on the scale of transformers are developed.

    • There is something unexpectedly limiting about the transition from oracle to agentic AI, and we have a huge “oracle overhang”: once that theoretical breakthrough happens, a new architecture that works well as an agent will suddenly be as capable as a million humans with all of the knowledge and skills of GPT-6-to-10.