July 2025 update
My core view hasn’t changed much since this post. AI has continued to progress, but not out of line with what I would expect if LLMs were going to plateau. I have found OpenAI’s o3 model very useful: it now writes in seconds functions that would take me 10-60 minutes, in line with the 50% task-completion time-horizon metrics.
I want to put some further ideas and predictions down. Firstly, I may explore further a world constrained by TEPS and brain-like architecture rather than by FLOPS:
https://www.lesswrong.com/posts/yew6zFWAKG4AGs3Wk/?commentId=mWGxpDb5chtyTGx8k
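As a very rough illustration of what a TEPS-constrained picture might look like, here is a back-of-envelope sketch. The synapse count, event rate, and machine figures are all order-of-magnitude assumptions on my part, not measurements.

```python
# Rough back-of-envelope comparison of brain "TEPS" (traversed edges per second,
# i.e. synaptic events per second) against machine-scale graph traversal and FLOPS.
# Every constant here is an order-of-magnitude assumption, not a measurement.

SYNAPSES = 1e14            # assumed number of synapses in a human brain
AVG_EVENT_RATE_HZ = 1.0    # assumed average synaptic event rate
brain_teps = SYNAPSES * AVG_EVENT_RATE_HZ   # ~1e14 "edge traversals" per second

machine_teps = 1e13        # assumed effective graph-traversal rate of a large cluster
machine_flops = 1e18       # assumed dense-compute rate of the same cluster

print(f"brain:   ~{brain_teps:.0e} TEPS")
print(f"cluster: ~{machine_teps:.0e} TEPS at ~{machine_flops:.0e} FLOPS")
print(f"TEPS gap (brain / cluster): ~{brain_teps / machine_teps:.0f}x")
# If this picture is right, the binding constraint is traversal/topology, not raw FLOPS.
```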
During this time I spent more time learning the details of AI, and as a result became more convinced that it isn’t just about FLOPS, and that software cannot make up for hardware topology.
I also just re-read https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target and mostly agree with the big picture. However, I am not sure how relevant it will be: I expect a major way an ASI changes its values is through self-reflection, and once an AI reaches that level I don’t think we can predict how RL will go when the AI is trying to influence the process itself.
One idea is that if there is a limited “intelligence” budget, then companies and countries may have to choose between spending it on many human-level AGIs or on a small number of larger, slower ASIs. Will those ASIs try to do things like solve aging, or focus entirely on getting ahead by designing better chip manufacturing? In that case existing chip manufacturing is very valuable, and Taiwan especially strategic. That would certainly be a type of slow takeoff, and it could also go backwards if international strife reduced worldwide chipmaking capacity.
Moral Realism and the Orthogonality Thesis
The strong Orthogonality Thesis has never seemed correct to me; the argument from the vast space of potential minds seems clearly the wrong way of looking at the problem. I have also found the reasoning about ASI and human values a bit self-contradictory, e.g. the ASI is superintelligent but our values are still better? This leads to Moral Realism.
MR could be framed more empirically as the extent to which different creatures’ values converge as they are raised to superintelligence. Consider the following process: a biological creature becomes superintelligent somehow and reflects on its values. It realizes that if evolution had gone differently for it, it might have different values, and it would be interested in finding that out, e.g. by running simulated evolution of many different creatures, raising them to SI and seeing how their values differ. If all such creatures, simulated and original, decided their values should be the same, then they would converge. The extent to which values converge with ever greater self-reflection and experimentation is a measure of MR: if many different value systems remain, that is weak MR; if they all converge, that is strong MR. As much as being a philosophical position, MR is then a prediction that all manner of intelligences’ values will converge as they become SI.
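To make that a little more concrete, here is a toy sketch of how such a convergence measure could be quantified. The lineages, value vectors, and dynamics are entirely hypothetical stand-ins for the simulated-evolution experiment described above.

```python
import numpy as np

def value_dispersion(value_systems: np.ndarray) -> float:
    """Mean pairwise distance between value vectors; lower means more convergence."""
    n = len(value_systems)
    dists = [np.linalg.norm(value_systems[i] - value_systems[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Hypothetical experiment: 10 simulated lineages, each summarised as a 5-dim
# "value vector", re-measured after successive rounds of self-reflection.
rng = np.random.default_rng(0)
lineages = rng.normal(size=(10, 5))
shared_target = rng.normal(size=5)   # the values they would converge to if strong MR holds

for step in range(1, 5):
    # Toy dynamics: each round of reflection pulls every lineage halfway toward a shared point.
    lineages = 0.5 * lineages + 0.5 * shared_target
    print(f"after reflection round {step}: dispersion = {value_dispersion(lineages):.3f}")
# Strong MR predicts the dispersion goes to 0; weak or no MR predicts it plateaus well above 0.
```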
None of this makes SI safe, especially in a slow take-off, as a weak SI could still self-reflect and come up with values very undesirable to us.
The effect of pre-AGI AI on the world
One major reason I am not in favor of a pause right now is that I am not convinced a pause is safe. It’s not like we are in a safe world; we may even have had a >50% chance of nuclear war up to this date. That risk remains, and a further one is the effect pre-AGI AI is already having. This includes both polarization and centralization/totalitarianism: 1984 can be done ever more effectively as AI advances. I am not sure whether this is temporary or the start of a permanent downward trend (https://www.globalexpressionreport.org/). If we do lose democracy, then I expect future governments to make very bad choices with AI, with S-risk potentially even worse than X-risk.
Measuring and controlling the level of AI self-awareness
There doesn’t seem to be much research into this yet, for example into what architecture is essential for self-awareness (I don’t mean situational awareness). Can we detect it from the AI’s circuits and turn it up or down? It seems like we should be doing that so we know more about what the AI is thinking.
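As a toy illustration of what “read it off the circuits and turn it up or down” could look like in principle, here is a minimal difference-of-means probe-and-steer sketch. The activations are random stand-ins and the “self-referential” label is hypothetical; in a real experiment they would be recorded from a model’s internals on matched prompt sets.

```python
import numpy as np

# Toy sketch of a linear "probe and steer" approach on hidden activations.
rng = np.random.default_rng(0)
d = 64
acts_with = rng.normal(loc=0.3, size=(200, d))      # activations on "self-referential" inputs (hypothetical)
acts_without = rng.normal(loc=0.0, size=(200, d))   # activations on control inputs (hypothetical)

# Difference-of-means direction: a crude linear probe for the property.
direction = acts_with.mean(axis=0) - acts_without.mean(axis=0)
direction /= np.linalg.norm(direction)

def score(activation: np.ndarray) -> float:
    """How strongly an activation expresses the probed property (projection onto the direction)."""
    return float(activation @ direction)

def steer(activation: np.ndarray, amount: float) -> np.ndarray:
    """Turn the property 'up' (amount > 0) or 'down' (amount < 0) by adding the direction."""
    return activation + amount * direction

h = rng.normal(size=d)
print(score(h), score(steer(h, +2.0)), score(steer(h, -2.0)))
```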
P(Doom) and rationality
I don’t think you can rationally defend a P(Doom) figure for AI. Mine is about 10-20%, though that is more of a feeling, and I believe the chance of humans or our descendants becoming interstellar isn’t changed that much by SI existing; AI just compresses the risk into a smaller timeframe.
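A toy piece of arithmetic shows what I mean by compression; the per-century risk and timescales below are illustrative assumptions, not estimates.

```python
# Toy arithmetic for "AI compresses the risk into a smaller timeframe".
# All numbers are illustrative assumptions, not estimates.

baseline_risk_per_century = 0.05   # assumed background existential risk per century
centuries_to_interstellar = 3      # assumed time to become robustly interstellar without AI

p_doom_without_ai = 1 - (1 - baseline_risk_per_century) ** centuries_to_interstellar
p_doom_with_ai = 0.15              # a similar total risk, but taken within a decade or two

print(f"cumulative risk over {centuries_to_interstellar} centuries, no AI: {p_doom_without_ai:.2f}")
print(f"risk with AI, front-loaded into a decade or two: {p_doom_with_ai:.2f}")
# The long-run chance of making it is similar either way; what changes is when the risk is taken.
```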
I also get the feeling that we promote rationality to its “level of incompetence”, that is, we use it most where we can be least sure of ourselves. I got that strong impression from the Rootclaim COVID origins debate: both sides were attempting to be rational and Bayesian, yet gave estimates extremely far apart. Rootclaim may be very good at applying such techniques to situations with a reference history, such as murder claims, but they spent most of their attention applying them to COVID origins, a one-off, and did not do well.