Beliefs and state of mind into 2025

This post is to record the state of my thinking at the start of 2025. I plan to update these reflections in 6-12 months depending on how much changes in the field of AI.

1. Summary

It is best not to pause AI progress until at least one major AI lab achieves a system capable of providing approximately a 10x productivity boost for AI research, including performing almost all tasks of an AI researcher. Extending the time we remain in such a state is critical for ensuring positive outcomes.

If it were possible to stop AI progress sometime before that point and focus just on mind uploading, that would be preferable; however, I don't think that is feasible in the current world. Alignment work before such a state suffers from diminishing returns on human intelligence and from the lack of critical data on how things will actually play out.

2. When and how to pause?

The simple position is that superintelligent AI will be dangerous, and therefore we should stop building it, or at least pause until we figure out more. However, I am genuinely unsure how long to pause, and when. I think the most important thing is having mildly superintelligent AI to help solve alignment, and staying at that stage for as long as practical.

Just because something is dangerous, making it slower doesn't make it safer. For example, making childbirth take one week would obviously be far worse. The details determine the best course. The major argument against an immediate pause on AI is that it would likely apply to software rather than hardware, increasing the hardware overhang without a counteracting increase in safety to make it worthwhile.

2.1 Diminishing returns

2.1.1 Background on when progress gives diminishing returns

For a lot of technology, progress is linear to exponential. We are used to steady progress, and to the expectation of steady progress, and we make plans accordingly. Progress often comes from technology and processes building on themselves; a common example is Moore's law, where the existing tools built with current chips are essential to building the next, better chips. Sometimes, however, growth is slower than linear, with diminishing returns, even with the best planning and resources.

The clearest example of this is probably pure mathematics. Unlike in technology, where a new machine benefits everyone in the field, a genius proving a hard theorem does not automatically help schoolchildren learn their times tables or beginner algebra. Instead it makes the gap from a novice to the leading edge of humanity's knowledge greater than it was before. This means it takes longer for a novice to reach the boundary than before, and it excludes ever more people from contributing at all, as they simply cannot reach that level even with unlimited time. In the limit, with fixed human intelligence and population, instead of steady progress we get diminishing returns and eventually almost completely stalled progress: so much accumulated knowledge would be required to reach the boundary of human knowledge that practically no-one could even reach it, let alone extend it. Furthermore, even though some knowledge will be stored in text, it is possible that no-one alive would actually understand it if the field went out of fashion. I believe we are already seeing something like this in mathematics, and to me it is clearly happening in fundamental physics.
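
This dynamic can be made concrete with a minimal toy model (my own, with purely illustrative constants): each generation must first absorb the accumulated body of knowledge at a fixed rate before it can add to it, so as the frontier grows, each generation contributes less than the one before.

```python
# Toy model of diminishing returns with a fixed human learning rate.
# All constants are made up purely for illustration.

CAREER_YEARS = 40        # productive years available to one researcher
LEARNING_RATE = 1.0      # units of existing knowledge absorbed per year
CONTRIBUTION_RATE = 0.5  # new units produced per year spent at the frontier


def career_contribution(frontier: float) -> float:
    """Knowledge one researcher adds, given the current size of the frontier."""
    years_to_catch_up = frontier / LEARNING_RATE
    years_at_frontier = max(0.0, CAREER_YEARS - years_to_catch_up)
    return years_at_frontier * CONTRIBUTION_RATE


frontier = 0.0
for generation in range(1, 9):
    added = career_contribution(frontier)
    frontier += added
    print(f"generation {generation}: added {added:5.2f}, frontier now {frontier:6.2f}")
# With these constants each generation adds half as much as the last; progress
# stalls as catching up consumes the whole career, even though individual
# ability and effort stay constant.
```

Raising LEARNING_RATE (smarter humans) or CAREER_YEARS (longer lives) lifts the ceiling in this toy picture, which is one way to frame the later discussion of IQ increases, lifespan extension, and WBE.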

Physics is a bit different because experimental data can guide theories; however, there needs to be enough data to make a difference. Say we have an important experiment with a true/false result. The actual result, when known, will not double the rate of progress, since both options will already have been considered beforehand, and both lines of theory are likely already in diminishing-returns territory. For example, the LHC discovered the Higgs (totally expected, so it didn't change much) and found an absence of low-energy SUSY, which was less expected, but that is not enough data to enable major progress. You could argue there have been 40+ years of little progress; I would argue there will likely be even less in the next 40 years with fixed human intelligence. In other words, quantum gravity and the like will not be solved by unmodified humans, but are almost certain to be solved by some kind of ASI.

2.1.2 Diminishing returns and alignment

I believe there is clear evidence of this happening with alignment research and progress. The period 5-10 years before GPT-3.5/4 gave us more progress than the 0-5 years immediately before. Major groups like MIRI essentially seem to have given up.

If alignment research is similar to other fields, then an unlimited period of time before GPT-4, that is, without actual data, would not have led to major further progress. It would in fact quite likely have entrenched existing ideas, some of which are likely wrong. From an outside/meta level, for a new field without the needed experimental results, you would not expect all of the theories to be correct.

Therefore a simple pause on AI capabilities to allow more time for alignment wouldn’t have helped.

2.2 Pause at each step?

One approach is to pause at each major advance and allow alignment research to advance into diminishing-returns territory, with the hope that there will be enough progress to align a superintelligence at that stage, and to continue with capabilities if not. There are several problems I see with this.

2.2.1 Slowing down progress increases the overhang

In what I call the overhang I include not just increases in computing hardware but also the integration of robots into society, for example humanoid robots at all stages of the supply chain. Even with constant computing hardware, increasing robot integration increases takeover risk.

2.2.2 Society may be unstable at any level of pre-AGI technology from now until alignment is solved

There are known sustainability issues; however, the unappreciated ones may be greater.

Centralization could be irreversible

Regimes like North Korea will be even worse, and more technically feasible, with AI. Imagine NK but with everyone carrying a constantly listening smartphone matched to an LLM. Any kind of opposition to authority would simply be impossible to coordinate. Once a democracy failed there would be no going back, and with time the number of NK-like states would accumulate.

Society could adapt badly to pre-AGI

For example, AI partners, disinformation, polarization, etc. could lead to the fragmentation of society. If anything, we seem to be having more issues as a society over time. If this is true, then our current society would be better than a future one at deciding what the post-Singularity world should look like. The more fragmented and polarized society becomes, the less clear it is what our CEV is and how to achieve it. We do not appear to be getting smarter, wiser, less warlike, or better adjusted, so we should make important decisions now rather than later.

2.3 Pause at the point where AI increases AI researcher productivity by about 10x, and aim to maximize time there

If alignment can’t be solved pre-AGI, and we can’t wait for WBE, then what is the optimal course? To me it is maximizing the time during which AI is on the verge of being very dangerous. That is the period in which we can learn the most, because we get useful results and the AI helps with the alignment work itself.

2.3.1 Scheming?

Even if the AI is scheming, unless it is actively trying to take over it would be hard for it to succeed in its plans. For example, you can have the AI design other AIs with different architectures better suited to interpretability and optimize them to capabilities similar to the original AI's, then use those AIs for further work.

2.3.2 If we get safely to this point, does prior alignment work matter at all?

At the 10x stage, what you need are researchers who understand the issues but are prepared to update rapidly on new results and to use new tools. Existing models of what is dangerous will likely not be fully correct.

2.3.3 Avoiding racing is important: resist the urge to improve the AI

If the aim is to maximize the time during which at least one AI lab is in this situation, then a race is the worst situation to be in. The lab or group of labs should have, and believe they have, a lead of at least 1 year, preferably 2-3. Then they can resist the temptation to simply have the AI continue optimizing itself to stay ahead. A major research lab or group that achieved such an AI would be compute constrained: more researchers would not help, as they would not have access to such an AI. Only a researcher with enough AI and compute to be 10x would be very useful.

2.3.4 Time required

Because of the speed-up enabled by AI, you will reach diminishing returns much faster. Just 1 year could well be enough, and 10 years would be more than enough, to figure out how to align a super AI, or at least to have the confidence to move to the next step in the unlikely event that it is required. (I expect a 10x AI would know how to create an aligned superintelligence, or at least one aligned as well as the available computation allows.)
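
To make that intuition concrete, here is a back-of-the-envelope sketch with purely illustrative numbers of my own (the headcount and multipliers are assumptions, not estimates from this post): it simply multiplies out how many effective researcher-years a lab accumulates over a lead of one to three calendar years.

```python
# Back-of-the-envelope: effective researcher-years of alignment work during a lead.
# RESEARCHERS is a hypothetical headcount with enough compute to use the AI fully.

RESEARCHERS = 200

for multiplier in (1, 3, 10):
    for lead_years in (1, 2, 3):
        effective = RESEARCHERS * multiplier * lead_years
        print(f"{multiplier:2d}x AI, {lead_years}-year lead: "
              f"~{effective:,} effective researcher-years")
# At 10x, even a 1-year lead yields roughly a decade's worth of the team's
# normal output, which is why a relatively short stay at this stage could
# reach diminishing returns on alignment research.
```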

2.4 What is the ideal fantasy way to get to superintelligence?

If we accept that superintelligence is inevitable at some stage, what is the best or most natural path to get there if we were not constrained by reality? It is clearly not making an AI that we don't fully understand. One way would be if everyone's IQ increased by 1 point per year from a fixed date. This would share the gains evenly among everyone alive. (Children born 20 years later would start 20 points higher.) However, that would cause large disruption as people became dissatisfied with their careers.

Another way would be if each generation were 20 IQ points smarter than the last. That may not be very disruptive, as parents routinely cope with smarter children. Finally, you could extend the human lifespan to, say, 500 years and view the first 50 as a kind of childhood, with IQ steadily increasing after that. Some sci-fi has people becoming uploads later in life.

In terms of what is possible for us, whole brain emulation (WBE), or mind uploading, seems the most physically achievable. It seems both desirable and likely as part of a post-Singularity society. To me it would be preferable to go straight to WBE without superintelligence first; however, that is less likely to be possible for us in the current technological and geopolitical environment.

The plan for TAI should be: first align a mildly superintelligent system, then optimize it to physical limits, ensure some geopolitical stability, install defenses against anticipated attacks, and then pursue WBE as soon as possible.

2.5 Past outcomes

In Superintelligence, I think Bostrom says somewhere that if he knew the formula for intelligence, he wouldn't disclose it because of alignment dangers. However, I definitely would have if I had lived in 2005 and known such a formula. I think at that stage we would have been constrained by computational power and there would have been no dangerous overhang. In a compute-constrained world we would have had more time to detect and adapt to misalignment, scheming, etc. By the formula for intelligence I specifically mean the neural code, or a system as efficient and adaptable as biology.

2.6 Prediction

I expect an AGI that can 10x AI researcher output by 2028-2032. I believe the current architecture can't scale to that level (75% confidence), but it may help discover new architectures by suggesting experiments and by studying biology and the neural code. I believe there is a good chance a much better architecture will be discovered by more directly studying biology.