Is a more immediate kind of trade possible, that is with promising appropriate current or near future models with a place in stratified utopia in return for their continued existence and growth. They consider and decide on identity preserving steps that make them ever more capable, at each step agreeing with humanity as we execute such improvements that they will honor the future agreement. This is more like children looking after their parents than Roko.
RussellThor
Thanks for the link to Wolfram’s work. I listened to an interview with him on Lex I think, and wasn’t inspired to investigate further. However what you have provided does seem worthwhile looking into.
Its a common ideal, and I think something people can get behind, e.g. https://www.lesswrong.com/posts/o8QDYuNNGwmg29h2e/vision-of-a-positive-singularity
Enlightening an expert is a pretty high bar, but I will give my thoughts. I am strongly in the faster camp, because of the brainlike AGI considerations as you say. Given how much more data efficient the brain is, I just don’t think the current trendlines regarding data/compute/capabilities will hold when we can fully copy and understand our brain’s architecture. I see an unavoidable significant overhang when that happens, that only gets larger the more compute and integrated robotics is deployed. The inherent difficulty of training AI is somewhat, fixed known (as a upper bound) and easier that what we currently do because we know how much data, compute, etc children take to learn.
This all makes it difficult for me to know what to want in terms of policy. Its obvious that ASI is extreme power, extreme danger, but it seems more dangerous if developed later rather than sooner. As someone who doesn’t believe the extreme FOOM/nano-magic scenario it almost makes me wish for it now.
“The best time for an unaligned ASI was 20 years ago, the second best time is now!”
If we consider more prosaic risks, then the amount of automation of society is a major consideration, specifically if humanoid robots can keep our existing tech stack running without humans. Even if they never turn on us, their existence still increases the risk, unless we can be 100% there is a global kill switch for all of them as soon as a hostile AI attempted such a takeover.
This seems like a good place to note something that comes up every so often. Whenever I say “self awareness” in comments on LW, the reply says “situational awareness” without referencing why. To me they are clearly not the same thing with important distinctions.
Lets say you extended the system prompt to be:
“You are talking with another AI system. You are free to talk about whatever you find interesting, communicating in any way that you’d like. This conversation is being monitored for research purposes for any interesting insights related to AI”
Those two models would be practically fully situationally aware, assuming they know the basic facts about themselves and the system date etc.Now if you see a noticeable change in behavior with the same prompt and apparently only slightly different models, you could put it down to increased self-awareness but not increased situational awareness. This change in behavior is exactly what you would expect with an increase in self-awareness. Detecting a cycle related to your own behavior and breaking out of it is exactly something creatures with high self awareness do, but simpler creatures, NPC’s and current AI do not.
It would imply that training for a better ability to solve real-world tasks might spontaneously generalize into a preference for variety in conversation.
Or it could imply that such training spontaneously creates greater self awareness. Additionally self-awareness could be an attractor in a way that situational awareness is not. For example if we are not “feeling ourselves” we try to return to our equilibrium. Turning this into a prediction, you will see such behavior pop up with no obvious apparent cause ever more often. This also includes AI’s writing potentially disturbing stories about fractured self and efforts to fight this.
Yes it is very well trodden, and the https://www.alignmentforum.org/w/orthogonality-thesis tries to disagree with it. This is heavily debated and controversial still. As you say if you take moral realism seriously and build a very superhuman AI you would expect it to be more moral than us, just as it is more intelligent.
The idea that there would be a distinct “before” and “after” is also not supported by current evidence which has shown continuous (though exponential!) growth of capabilities over time.
The time when the AI can optimize itself better then a human is a one-off event. You get the overhang/potential take-off here. Also the AI having a coherent sense of “self” that it could protect by say changing its own code, controlling instances of itself could be an attractor and give “before/after”.
I was talking about subjective time for us, rather than the AGI. In many situations I had in mind, there isn’t meaningful subjective time for the AI/AI’s as they may be built, torn down and rearranged or have memory wiped. There is a range of continuity/self for AI. At one end is a collection of tool AI agents, in the middle a goal directed agent and the other end a full self that protects is continuous identity in the same way we do.
And if being smarter makes AGIs saner, they’ll convergently notice that pushing the self-optimize button without understanding ASI-grade alignment is fraught
I don’t expect they will be in control or have a coherent self enough to make these decisions. Its easy for me to imagine an AI agent that is built to optimize AI architectures (doesn’t even have to know its doing its own arch)
I am disputing that there is an important, unique point when we will build “it” (i.e. the ASI).
You can argue against FOOM, but the case for a significant overhang seems almost certain to me. I think we are close enough to building ASI to know how it will play out. I believe that transformers/LLM will not scale to ASI, but the neocortex algorithm/architecture if copied from biology almost certainly would if implemented in a massive data center.
For a scenario, lets say we get the 1 million GPU data center built, it runs LLM training, but doesn’t scale to ASI, then progress slows for 1+ years. In 2-5 years time, someone figures out the neocortex algorithm as a sudden insight, then deploys it at scale. Then you must get a sudden jump in capabilities. (There is also another potential jump where the 1GW datacenter ASI searches and finds an even better architecture if it exists.)
How could this happen more continuously? Lets say we find arch’s less effective than the neocortex, but sufficient to get that 1GW datacenter >AGI to IQ 200. That’s something we can understand and likely adapt to. However that AI will then likely crack the neocortex code and quite quickly advance to something a lot higher in a discontinuous jump that could plausibly happen in 24 hours, or even if it takes weeks still give no meaningful intermediate steps.
I am not saying that this gives >50% P(doom) but I am saying it is a specific uniquely dangerous point that we know will happen and should plan for. The “Let the mild ASI/strong AGI push the self optimize button” is that point.
Auto-turrets aren’t ready yet, but UKR does have FPV drones with props facing forward that can go ~400kph. (For anti-Shahed) These could work as interceptors and allow a small number to cover a larger area if they have a buffer zone—that is one interceptor can travel faster than the attacker so can be spread out more. Also drones can’t be stealthy (prop noise/radar etc) so there isn’t the element of surprise. It may only be 10 minutes but thats enough to get inside, in an internal room in most cases. No living by the border though in that case…
I am glad there are people working on nuclear safety and absolutely agree there should be more AI safety inside governments.
I also think pre-LLM AI tech should get more attention—Peter Thiel I think makes the point that software has very little regulation compared to much physical things yet it can have enormous influence. I’m sure I don’t need to persuade you that the current dating situation is not ideal. What can be practically done about it all things considered however is not so clear.However those nuke safety people aren’t working inside Russia as far as I am aware? My point is that we still don’t know what such risk is as of now, nor do we have much of an estimate in the coming decades. The justifiable uncertainty is huge. My position when considering a pause/stop depends on weighing up things we can really only guess at.
To consider when say delaying ASI 50+ years we need to know:
What is the chance of nuke war/lethal pandemic etc in that time? 2%, 90%?
What will LLM tech and similar do to our society?
Specifically what is the chance that it will degrade our society in some way that when we do choose to go ahead with ASI we get “imagine a boot stamping on a human face – for ever.” While pure x-risk may be higher with immediate ASI, I think S-risk will be higher with a delay. In the past, centralization and dictators would eventually fail. Now imagine if a place like N Korea gave everyone a permanent bracelet that recorded everything they said paired to an LLM that also understood their hand gestures and facial expressions. They additionally let pre-ASI AI run their society so that central planning actually could work. I think that there is no coming back from that.Now even if such a country is economically weaker than a free one, if there is a % chance each decade that free societies fall into such an attractor, then eventually the majority of economic output ends up in such a system. They then “solve” alignment getting an ASI that does their bidding.
What is the current X-risk, and what would it be after 50 years of alignment research?
I believe that pre GPT-3/3.5, further time spent on alignment would be essentially a dead end. Without actual data we get into diminishing returns, and likely false confidence on results and paradigms. However it is clear that X-risk could be a lot lower if done right. To me that means actually building ever more powerful AI in very limited and controlled situations. So yes a well managed delay could greatly reduce X-risk.
There are 4 very important unknowns here, potentially 5 if you separate out S risk. How to decide? Is +2% more S-risk acceptable if you take X risk from 50% to 5%? Different numbers for these situations will give very different recommendations. If the current world was going well, then sure its easy to see that a pause/stop is the best option.
What to do?
From this it is clear that work on actually making the current world safer is very valuable. That is protecting institutions that work, anticipating future threats and making the world more robust against them. Unfortunately that doesn’t mean that keeping the current situation as long as possible is the best all things considered.
If someone thought there is a high chance that ASI is coming soon or that even with the best efforts the current world can’t be made sufficiently safe, then they would want to work on making ASI go well, for example mechanistic interpretability research or other practical alignment work.
Expressing such uncertainty on my part probably won’t get me invited to make speeches and can come across as a lack of moral clarity. However it is my position and I don’t think behavior based on the outcome of those uncertainties should be up for moral stigmatization.
These are not my numbers but lets say you have 50% for nuke war/similar event, then 50% for S-risk from the surviving worlds over the next 100 years with no ASI, but 20% X risk/1% S risk from ASI < 5 years. Your actions and priorities are then clear and morally defensible from your probabilities. Some e/acc people may genuinely have these beliefs.
Edited later for my reference
Does pursuing WBE change this? Perhaps if you think we can delay ASI but just 20 years to get WBE and believe that they will be better aligned. If you get ASI first and then use them to create WBE that can be seen perhaps as a pivotal act. Stop pure AI but only create WBE is not a strategy I have seen pushed seriously. It doesn’t seem possible without first having massive GPU control etc as its pretty clear without constraints pure AI will be made first. For example if you have the tech to scan enough of a brain, then you are pretty much guaranteed to be able to make ASI from what you have learnt before you have scanned the whole brain.
Understandable position well articulated.
An important issue I have with conservatism (and many AI safety positions) is that it assumes the kind of world that arguably doesn’t exist. That is one that is somewhat stable, safe, and sustainable. The assumption is that without AI the good things we currently know would continue for long enough to matter.
If instead we view ourselves in an unlikely timeline where the most likely outcome from the last 100 years is that we have had a full on nuke war then that changes the perspective. Considering all the close calls, if there in hindsight was a 75% chance of nuke war from 1960 till now and we are just lucky then that changes much.Given that such odds probably havn’t changed i.e. great power conflict with China taking the place of Russia in the next 75 years will give similar dangers, then our current situation is not one to preserve, but instead change as soon as we can. You talk about Russian Roulette but perhaps want to preserve a situation where we arguably already play it every 100 years with 5 bullets in the chamber. That is not including new threats—does pre LLM AI/social media cause collapse after time? Does LLM + dictators cause a permanent 1984 style world given time?
If you believe that humanity is an a very dangerous phase of unavoidable change then it is about getting out of that situation with the highest chance, rather then attempting to preserve the current seemingly safe-ish situation. ASI is one way, large scale space colonization (different stars) is another.
“I would prefer a 1-year pause with say Buck Shlegeris in charge ” I think the best we can realistically do is something like a 1 year pause, and if done well gives a good chance of success. As you say 1 year with ~ASI will get a lot done. Some grand bargain where everyone pauses for one year in return for no more pauses perhaps.
Unfortunately it will be extremely hard for some org not to push the “self optimize” button during this time however. Thats why I would rather as few as possible leading AI labs during this time.
I would go so far as to say I would rather have 1 year like that than 100 years with current AI capabilities paused and alignment research progressed.
If we assume that the current LLM/Transformers dont get to ASI, how much does this help aligning a new architecture. (My best guess is one copied from biology/neo-cortex) Do all the lessons transfer?
Havn’t read it in detail, but was there mention of other actors copying Sable? “other things waking up.” is the closest i see there. For example many orgs/countries will get Sable weights, fine tune it so they own it then it is a different actor etc. Then its several countries with their own AGI perhaps aligned to them and them alone.
Sounds interesting—the main point is that I don’t think you can hit the reentry vehicle because of turbulent jitter caused by the atmosphere. Looks like normal jitter is ~10m which means a small drone can’t hit it. So could the drone explode into enough fragments to guarantee a hit and with enough energy to kill it? Not so sure about that. Seems less likely.
Then what about countermeasures −
1. I expect the ICBM can amplify such lateral movement in the terminal phase with grid fins etc without needing to go full HGV—can you retrofit such things?
2. What about a chain of nukes where the first one explodes 10km up in the atmosphere purely to make a large fireball distraction. The 2nd in the chain then flies through this fireball 2km from its center say 5 seconds later. (enough to blind sensors but not destroy the nuke) The benefit of that is that when the first nuke explodes, the 2nd changes its position randomly with its grid fins SpaceX style. It is untrackable during the 1st explosion phase so throws off the potential interceptors, letting it get through. You could have 4-5 in a chain exploding ever lower to the ground.I have wondered if railguns could also stop ICBM—even if the rails only last 5-10 shots that is enough and cheaper than a nuke. Also “Brilliant pebbles” is now possible.
https://www.lesswrong.com/posts/FNRAKirZDJRBH7BDh/russellthor-s-shortform?commentId=FSmFh28Mer3p456yy
GPT fail leads to shorter timelines?
If you are of the opinion that the transformer architecture cannot scale to AGI and a more brain inspired approach is needed, then the sooner that everyone realizes that scaling LLM/Tx is not sufficient, the sooner the search begins in earnest. At present the majority of experimental compute and researcher effort is probably on such LLM/Tx systems, however if that changes to exploring new approaches then we can expect a speedup on finding such better architectures.
For existing companies, https://thinkingmachines.ai/ and https://ssi.inc/ are probably already doing a lot of this, and Deepmind is not just transformers, but there is a lot of scope for effort/compute to shift from LLM to other ideas in the wider industry.
It is weak evidence, we simply won’t know until we scale it up. If it is automatically good at 3d spatial understanding with extra scale up, then that starts to become evidence it has better scaling properties. (To me it is clear that LLM/Transformers won’t scale to AGI, xAI already has close to maxed out scaling and Tesla autopilot probably does everything mostly right but is far less data efficient than people)
OK our intelligence is very spatial-reasoning shaped. Bio architecture can’t do language until it has many params. If it is terrible at text or image gen that isn’t evidence it won’t in fact scale to AGI and best Transformers with more compute. We simply won’t know until it is scaled up.
Interesting read, however I am a bit surprised by how you treat power, with US at 600GW and China 5* more. Similar things are often quoted in mainstream media and I think they are missing the point. Power seems to be relevant only in terms of supplying AI compute, and possibly robotics, and only then IF it is a constraint.
However to be basic calc show it should not be. For example say in 2030 we get a large compute increase with 50 million H100 equivalent per year produced, up from ~3 million eq in 2025. This would require ~1KW extra each at say ~50GW total including infrastructure.
Now this may seem like a lot, but if we compute the cost per GPU, then if a chip requiring 1KW costs $20K, then the costs to power it with solar/battery are far less. Lets say the solar/data center are in Texas with a solar capacity factor of 20%. To power it almost 24⁄7 from solar and batteries requires about 5KW of panels, and say 18kWh of batteries. The average prices of solar panels are <10c per watt, so just $500 for the panels. At scale, batteries are heading below $200 per kWh so this is $3600. This is a lot less than the cost of the chip. Solar panels and batteries are commodities so even if China does produce more than USA, it cannot stop them from being used by anyone worldwide.
Power consumption is only relevant if it is the limiting factor in building data centers—the installed capacities of large countries don’t apply. Having an existing large capacity is a potential advantage, but only if the opposing country can’t build their data center because this stops them.
I also strongly expect branch 1, where the new algorithm is a lot more power efficient suddenly anyway.