I am glad there are people working on nuclear safety, and I absolutely agree there should be more AI safety expertise inside governments. I also think pre-LLM AI tech should get more attention. I think Peter Thiel makes the point that software has very little regulation compared to most physical things, yet it can have enormous influence. I'm sure I don't need to persuade you that the current dating situation is not ideal. However, what can practically be done about all of this is not so clear.
However, those nuclear safety people aren't working inside Russia, as far as I am aware? My point is that we still don't know what that risk is right now, nor do we have much of an estimate for the coming decades. The justifiable uncertainty is huge. My position on a pause/stop depends on weighing up things we can really only guess at.
To evaluate, say, delaying ASI by 50+ years, we need to know:
What is the chance of nuclear war, a lethal pandemic, or similar in that time? 2%? 90%?
What will LLM tech and similar do to our society?
Specifically, what is the chance that it degrades our society in such a way that, when we do choose to go ahead with ASI, we get "imagine a boot stamping on a human face – for ever"? While pure X-risk may be higher with immediate ASI, I think S-risk will be higher with a delay. In the past, centralization and dictators would eventually fail. Now imagine a place like North Korea giving everyone a permanent bracelet that records everything they say, paired with an LLM that also understands their hand gestures and facial expressions. They additionally let pre-ASI AI run their society so that central planning actually works. I think there is no coming back from that.
Now, even if such a country is economically weaker than a free one, if there is some percentage chance each decade that free societies fall into such an attractor, then eventually the majority of economic output ends up in such systems. They then "solve" alignment and get an ASI that does their bidding.
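To put toy numbers on that attractor dynamic (a sketch only; the per-decade probabilities below are made up for illustration): if lock-in is irreversible, the share of societies still free after N decades is (1 - p)^N, and it only falls.

```python
# Toy model of the lock-in attractor: each decade a free society has some
# probability p of falling into an irreversible surveillance state.
# The probabilities are made up, purely for illustration.
def fraction_still_free(p_per_decade: float, decades: int) -> float:
    return (1 - p_per_decade) ** decades

for p in (0.01, 0.02, 0.05):
    row = ", ".join(
        f"{decades} decades: {fraction_still_free(p, decades):.0%}"
        for decades in (10, 30, 50)
    )
    print(f"p = {p:.0%}/decade -> {row}")
```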
What is the current X-risk, and what would it be after 50 years of alignment research?
I believe that, pre GPT-3/3.5, further time spent on alignment would have been essentially a dead end. Without actual data you get diminishing returns, and likely false confidence in results and paradigms. However, it is clear that X-risk could be a lot lower if a delay is done right. To me that means actually building ever more powerful AI in very limited and controlled situations. So yes, a well-managed delay could greatly reduce X-risk.
There are four very important unknowns here, potentially five if you separate out S-risk. How to decide? Is +2% S-risk acceptable if it takes X-risk from 50% to 5%? Different numbers for these unknowns give very different recommendations. If the current world were going well, then sure, it's easy to see that a pause/stop is the best option.
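One way to frame that "+2% S-risk for a 45-point drop in X-risk" question (a sketch; the disvalue weights are hypothetical and do all the work): the trade is worth taking exactly when the expected disvalue added on the S-risk side is smaller than the expected disvalue removed on the X-risk side.

```python
# Sketch: is +2% S-risk worth taking X-risk from 50% to 5%? The answer
# depends entirely on the relative disvalue weights, which are hypothetical.
def accept_tradeoff(delta_s: float, delta_x: float, w_s: float, w_x: float) -> bool:
    # Accept if added expected S-disvalue < removed expected X-disvalue.
    return delta_s * w_s < delta_x * w_x

# X-risk falls by 45 points (0.50 -> 0.05), S-risk rises by 2 points.
print(accept_tradeoff(0.02, 0.45, w_s=10, w_x=1))  # True: S-risk "only" 10x worse
print(accept_tradeoff(0.02, 0.45, w_s=50, w_x=1))  # False: S-risk 50x worse
```

With these deltas, the break-even point is treating an S-risk outcome as about 22.5 times as bad as an X-risk outcome.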
What to do?
From this it is clear that work on actually making the current world safer is very valuable: protecting institutions that work, anticipating future threats, and making the world more robust against them. Unfortunately, that doesn't mean that keeping the current situation going for as long as possible is the best option all things considered.
If someone thought there was a high chance that ASI is coming soon, or that even with the best efforts the current world can't be made sufficiently safe, then they would want to work on making ASI go well, for example through mechanistic interpretability research or other practical alignment work.
Expressing such uncertainty on my part probably won't get me invited to make speeches, and it can come across as a lack of moral clarity. However, it is my position, and I don't think behavior based on the outcome of weighing those uncertainties should be up for moral stigmatization.
These are not my numbers, but let's say you put 50% on nuclear war or a similar event, then 50% on S-risk among the surviving worlds over the next 100 years with no ASI, but 20% X-risk and 1% S-risk from ASI within 5 years. Your actions and priorities are then clear, and morally defensible given your probabilities. Some e/acc people may genuinely have these beliefs.
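Spelling that comparison out (a sketch using those same illustrative numbers, reading the 50% S-risk as conditional on surviving the first branch):

```python
# Sketch of the two stylized paths with the illustrative numbers above.
# Delay path: 50% chance of nuclear war or similar over the period; among
# the surviving worlds, 50% chance of an S-risk outcome within 100 years.
p_catastrophe = 0.50
p_s_given_survival = 0.50
delay_s_risk = (1 - p_catastrophe) * p_s_given_survival  # 25% unconditional

# ASI-within-5-years path: 20% X-risk, 1% S-risk.
asi_x_risk, asi_s_risk = 0.20, 0.01

print(f"Delay:   catastrophe {p_catastrophe:.0%}, S-risk {delay_s_risk:.0%}")
print(f"ASI <5y: X-risk {asi_x_risk:.0%}, S-risk {asi_s_risk:.0%}")
```

On those numbers the ASI-soon path looks better on both counts, which is why someone who genuinely held them could reasonably act on that.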
Edited later, for my reference:
Does pursuing WBE change this? Perhaps, if you think we can delay ASI by just 20 years to get WBE and believe that emulations will be better aligned. If you get ASI first and then use it to create WBE, that can perhaps be seen as a pivotal act. "Stop pure AI but allow only WBE" is not a strategy I have seen pushed seriously. It doesn't seem possible without first having massive GPU control etc., as it's pretty clear that without constraints pure AI will be made first. For example, if you have the tech to scan enough of a brain, then you are pretty much guaranteed to be able to make ASI from what you have learnt before you have scanned the whole brain.