It does not seem clear to me at all that mathematical ability (and more generally discrete token manipulation) translates into ability in real world tasks that involve messy unpredictable continuous systems.
Of course AI that is massively superhuman in maths, coding, etc. would still be transformative in many ways. But it might not be the kind of ASI that can meaningfully pursue its own goals in the world the way most X-risk scenarios worry about.
julius vidal
How reality turns to slop
An Introduction to Neo-Fatalism
If you read it very charitably CCRU sort of predicted it back in the 90s:
“Al-schizophrenia could be sold to webheads as an artificial drug… Net-schizzing is contagious… Within no time there is illicit traffick in modular chunks of cyberspace-insanity… (and Sarkon is baptized Satan of Cyberspace by the popular media).”
You are right that I am being a bit reductive. Maybe it would be better to say it assumes some kind of ideal combination of innovation, markets and technocratic governance would be enough to prevent catastrophe?
And to be clear, I do think it’s much better for people to be working on defensive technologies than not to. And it’s not impossible that the right combination of defensive entrepreneurs and technocratic government incentives could genuinely solve a problem.
But I think this kind of faith in business as usual but a bit better can lead to a kind of complacency where you conflate working on good things with actually making a difference.
You are probably right. For someone arguing the benefits of AI I certainly can’t accuse this writer of being misleadingly optimistic.
But personally I’ve recently found it quite disconcerting how bleak an image of the future people who work in AI (on both sides of the capabilities/safety divide) seem willing to work towards building.
Overcoming this kind of reflexive defeatism seems to me much harder than simply trying to convince people that we are going in a bad direction as a matter of fact.
You’re absolutely right to focus on the moment the model fails. Updating your model to account for its failures is effectively what learning is. Again if we look at you from the outside we can give an account of the form: The model failed because it did not correspond to reality, so the agent updated it to one which corresponded better to reality (AKA was more true).
But again from the inside there is no access to reality, only the model. Perception and prediction are both mediated by the model itself, and when they contradict each other the model must be adjusted. But that the perceptions come from the ‘real’ external world is itself just a feature of the model.
You have the extraordinary ability to change your own model in response to its contradictions. Let’s consider the case of agents that can’t do that.
If a roomba is flipped on its back and its wheels keep spinning (I imagine in real life roombas probably have some kind of sensor to deal with these situations but let’s assume this one doesn’t), from the outside we can say that the roomba’s model, which says that spinning your wheels makes you move, is no longer in correspondence with reality. But from the point of view of the roomba, all that can be said is that the world has become incomprehensible.
On the other hand, there’s another concern I’ve been wary of in the context of AI safety startups (which is what I’m currently exploring) and research in general: following the short-term success gradient. In startups, you can start with a noble vision and then become increasingly pressured away from the initial vision simply because you are pursuing the customer gradient and “building what people want.” If your goal is large-scale (venture) success, then it only makes sense. You need customers and traction for your Series A after all. Even in research, there’s only so much fucking around you can do until people want something legible from you.
This is my biggest concern with d/acc style techno-optimism: it seems to assume that genuinely defensive technologies can compete economically with offensive ones (all it takes is the right founders, seed funding etc.).
Whereas my impression is that any kind of ethical/ideological commitment immediately puts a startup at a massive structural disadvantage against those who choose simply to give the market what it wants (acceleration).
This quote from Anthropic’s report on the large-scale Claude Code cyberattack seems utterly comical to me:
This raises an important question: if AI models can be misused for cyberattacks at this scale, why continue to develop and release them? The answer is that the very abilities that allow Claude to be used in these attacks also make it crucial for cyber defense.
Instead of trying to present any kind of utopian vision of the benefits of AI, someone at Anthropic decided to sell us the image of an internet dominated by endless cyberwar, trapped in a perverse feedback loop of escalating speed and incomprehensibility.
One additional consideration wrt the notebook:
Unlike you I am still very ambivalent about note taking.
I got through most of my education relying on my (very much imperfect) memory, forgetting lots but generally remembering enough to get by, and always felt that for example taking notes during a lecture was too distracting from actually listening.
Then at some point a couple of years ago I got fed up with having to relearn the same things multiple times and started using Obsidian to try and systematically take notes on everything I read.
But recently I have been feeling that the transfer from mental representations to text is far too lossy, and text remains static while remembered information can be morphed and readapted dynamically with new information and new contexts. What’s worse, using the notebook really does externalise memory, in the sense that once I convert my ideas into text my mind seems to let go of the richer mental representations and either retain nothing or just the compressed textual ones.
So using the notebook feels a bit like I am deferring agency to another OIS that has much better memory (storage) than me, but is also probably stupider.
(will probably try to respond to some of the rest later)
julius vidal’s Shortform
Alice asks Bob for advice about a tricky problem
Bob gives good advice
Bob gives bad advice
Bob is a skilled manipulator and deliberately says things that will make Alice do…
what is in his interest.
what he thinks is in her interest.
what his values say she should do.
what he thinks her values say she should do.
Bob wants and advises Alice to do what he thinks she should do (based on his own values).
Bob is highly convincing and Alice does what he suggests.
They have the same values
They have different values
Alice is not convinced by Bob, but responding to his advice helps her clarify what she thinks she should do.
Bob’s advice changes Alice’s values
Bob tries to figure out Alice’s values and then advises her based on that.
He gets it wrong.
He gets it right…
because he knows her well and asks lots of relevant questions.
by pure luck.
Bob believes that only she knows her own values so he…
tells her he cannot help her.
tells her he cannot give advice, but he can tell her some facts he knows that may help her make the decision for herself.
Equipped with this new information, Alice is able to make a decision that better reflects her own values.
Bob carefully selects facts that push her towards a specific choice, while censoring ones that won’t.
Bob tells her everything he knows but for contingent reasons of selection (such as what kind of facts Bob is interested in) these only include facts that push her towards a specific choice, and exclude those that won’t.
The new knowledge contradicts some of Alice’s pre-existing beliefs about the problem…
and she can now make a better informed decision.
and she is now even more confused about what to do than before.
Bob is an omniscient god and tells Alice every fact about the universe.
Equipped with this new information, Alice is able to make a decision that better reflects her own values.
Equipped with this new information, Alice realises she holds contradictory values that point to different courses of action.
Now that she has ascended to omniscience, Alice no longer cares about the problem.
Bob tells Alice to ask Charlie
Bob tells Alice to ask ChatGPT
Bob asks ChatGPT and then passes the response off as his own
Bob is a rubber duck and says nothing
I think that when seen from outside of the agent, your account is correct. But from the perspective of the agent, the world and the world model are indistinguishable, so the relationship between prediction and time is more complex.
I don’t think thermostat consciousness would require homunculi any more than human consciousness does but I think it was a mistake on my part to use the word consciousness as it inevitably complicates things rather than simplifying them (although FWIW I do agree that consciousness exists and is not an epiphenomenon).
For the thermostat (assuming the bimetallic strip type), the reference is the position of a pair of contacts either side of the strip, the temperature causes the curvature of the strip, which makes or breaks the contacts, which turns the heating on or off. This is all physically well understood. There is nothing problematic here.
For me acting as the thermostat, I perceive the delta, and act accordingly. I don’t see anything problematic here either. The sage is not above causation, nor subject to causation, but one with causation. As are we all, whether we are sages or not.
The thermostat too is one with causation. The thermostat acts in exactly the same way as you do. It is possibly even already conscious (I had completely forgotten this was an established debate and it’s absolutely not a crux for me). You are much more complex than a thermostat.
I think there is something a bit misleading about your example of a person regulating temperature in their house manually. The fact that you can consciously implement the control algorithm does not tell us anything about your cognition or even your decision making process since you can also implement pretty much any other algorithm (you are more or less Turing complete subject to finiteness etc.). PCT is a theory of cognition, not simply of decision making.
I like this ontology.
Although I wonder if having such a general definition, one that applies to so many things and so many different kinds of things, causes it to start losing meaning, or at least demands some further subdividing.
Also it seems like maybe there is a point at which a sharp line cannot be drawn between two OISs that overlap too much. E.g. While I am willing to recognise that the me OIS and the me + notebook and pen OIS are in some sense meaningfully distinct, it seems like they have some very strong relation, possibly some hierarchy, and that the second may not be worth recognising as distinct in practice.
What makes cyber egregores unique is they can be parasitic to one substrate while mutualistic to another.
I wonder if this is really unique?
It seems like a normal egregore could probably also have this feature. For example could it make sense to say that a religion was parasitic to its humans, but mutualistic to its material culture (because the humans spend all their energy building churches/printing bibles)?
Or that some horse-worshipping nomadic Mongol empire was parasitic to its horses but mutualistic to its humans (or vice versa)?
For all that people talk of agents and agentiness, their conceptions are often curiously devoid of agency, with “agents” merely predicting outcomes and (to them) magically finding themselves converging there, unaware that they are taking any actions to steer the future where they want it to go. But what brings the perception towards the goal is not the goal, but the way that the actions depend on the difference.
So does the delta between goal and perception cause the action directly?
Or does it require “you” to become aware of that delta and then choose the corresponding action?
If I understand correctly you are arguing for the latter, in which case this seems like the homunculus fallacy. How does “you” decide what actions to pick?
If we are to imagine the thermostat conscious, then we surely cannot limit that consciousness to only the perception and the reference, but also allow it to see, intend, and perform its own actions. It is not inexorably being pulled, but itself pushing (by turning the heat on and off) towards its goal.
Only if we want to commit ourselves to a homunculus theory of consciousness and a libertarian theory of free will.
I am claiming (weakly) that the actual process looks less like:
enumerate possible actions
predict their respective outcomes
choose the best
and more like
come up with possible actions and outcomes in an ad hoc way
back and forward chain from them until a pair meet in the middle
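To make the second procedure a bit more concrete, here is a rough Python sketch of chaining forward from the current situation and backward from a desired outcome until the two meet. Everything in it is my own illustrative assumption: the function name `meet_in_the_middle`, the idea of writing the "ad hoc" candidate actions and outcomes as two small lookup tables, and the toy states borrowed from the ball example. It is not meant as a claim about how cognition actually implements this.

```python
from collections import deque


def meet_in_the_middle(start, goal, forward, backward):
    """Alternate forward chaining from `start` and backward chaining from
    `goal` until both searches have reached a common state.

    `forward` maps a state to states that can follow it; `backward` maps a
    state to states that could precede it. Both tables are hypothetical.
    Returns the joined chain as a list, or None if the searches never meet.
    """
    if start == goal:
        return [start]
    parents_f = {start: None}   # state -> where we came from (forward side)
    parents_b = {goal: None}    # state -> where it leads to (backward side)
    frontier_f = deque([start])
    frontier_b = deque([goal])

    def expand(frontier, parents, other_side, neighbours):
        # Expand one whole layer; return a state seen by both sides, if any.
        for _ in range(len(frontier)):
            state = frontier.popleft()
            for nxt in neighbours.get(state, []):
                if nxt in parents:
                    continue
                parents[nxt] = state
                if nxt in other_side:
                    return nxt
                frontier.append(nxt)
        return None

    while frontier_f and frontier_b:
        meeting = expand(frontier_f, parents_f, parents_b, forward)
        if meeting is None:
            meeting = expand(frontier_b, parents_b, parents_f, backward)
        if meeting is not None:
            # Walk back to the start, then onward to the goal.
            chain = []
            state = meeting
            while state is not None:
                chain.append(state)
                state = parents_f[state]
            chain.reverse()
            state = parents_b[meeting]
            while state is not None:
                chain.append(state)
                state = parents_b[state]
            return chain
    return None


# Toy tables: a few actions I can think of, and a few ways of not being hit.
forward = {
    "see ball": ["raise hand", "jump"],
    "raise hand": ["hand in path"],
    "jump": ["airborne"],
}
backward = {
    "not hit": ["ball caught", "ball misses me"],
    "ball caught": ["hand in path"],
    "ball misses me": ["ducked"],
}
plan = meet_in_the_middle("see ball", "not hit", forward, backward)
```

Note that nothing here enumerates every action or scores every outcome: the search stops at the first chain that connects, which is the point of the contrast with the first procedure.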
I was going to write a whole argument about how the kind of decision theoretical procedure you are describing is something you can choose to do at the conscious level, but not something you actually cognitively do by default, but then I saw you basically already wrote the same argument here.
Consider the ball scenario from the perspective of perceptual control theory (or active inference):
When you first see the ball your baseline is probably just something like not getting hit.
But on its own this does not really give any signal for how to act, so you need to refine your baseline to something more specific. What baseline will you pick? Out of the space of possible futures in which you don’t get hit by the ball there are many choices available to you:
You could pick one in which the wind blows the ball to the side, but you can’t control that so it won’t help much.
You could pick a future that you don’t actually have the motor skills to do, such as leaping into the air and kung-fu kicking the ball away. You start jumping but then you realise you don’t know kung-fu, and the ball hits you in the balls!
You could pick a future in which you catch the ball, and do so (or you could still fail).
All of this does not happen discretely but over time. The ball is approaching. You are starting to move based on your current baseline or some average of the ones you are still considering. As this goes on the space of possible futures is being changed by your actions and by the ball’s approach. Maybe it’s too late to raise your hand? Maybe it’s too late to duck? Maybe there’s still time to flinch?
All of this is to say that to successfully do something deliberately, your goal must have the property that when used as a reference your perceptions will actually converge there (stability).
Let’s go back to your example of the thermostat:
From your perspective as an outsider there is a clear forward causal series of events. But how should the thermostat itself (to which I am magically granting the gift of consciousness) think about the future?
From the point of view of the thermostat, the set temperature is its destiny to which it is inexorably being pulled. In other words it is the only goal it can possibly hope to pursue.
Of course as outsiders we know we can open the window and deny the thermostat this future. But the thermostat itself knows nothing of windows, they are outside of its world model and outside of its control.
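The outsider's view of the window can be put in a toy simulation. This is only a sketch under assumptions of my own: the function name `simulate_thermostat`, the simple heat-loss model, and every number in it are made up for illustration. The controller only ever compares the temperature with the set point; the window enters solely through a loss rate that the controller never sees.

```python
def simulate_thermostat(setpoint, outside, loss_rate,
                        heater_power=2.0, start=15.0, steps=500, dt=0.1):
    """Simulate an on/off thermostat in a room that leaks heat.

    All parameters are illustrative assumptions. `loss_rate` stands in for
    the window (a larger value means it is open) and is invisible to the
    controller. Returns the room temperature after `steps` time steps.
    """
    temp = start
    for _ in range(steps):
        # The thermostat's whole world: is the perception below the reference?
        heating = temp < setpoint
        heat_in = heater_power if heating else 0.0
        # Heat leaks out in proportion to the indoor/outdoor difference.
        heat_out = loss_rate * (temp - outside)
        temp += dt * (heat_in - heat_out)
    return temp


# Window closed: the heater can outpace the leak, so the room hovers
# around the set point.
closed = simulate_thermostat(setpoint=20.0, outside=5.0, loss_rate=0.1)

# Window open: with the heater permanently on, heat in equals heat out at
# outside + heater_power / loss_rate = 9 degrees, well short of the set point.
opened = simulate_thermostat(setpoint=20.0, outside=5.0, loss_rate=0.5)
```

In both runs the thermostat follows exactly the same rule. Only from outside can we say that in the second run its "destiny" has been taken away.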
AI commodifying cultural production leads to much more thorough “probing” (by sheer volume if nothing else) of the space of possible outputs. This creates a kind of “memetic fitness inflation” where the level of palatability a meme must have to survive is being pushed up. You can say this is just an acceleration of existing dynamics but it is a step change in that acceleration (analogous to something like the shift from YouTube to TikTok).
There is also the effect of feeding back into the models. Individual creators can be people whose preferences are robust relative to broader cultural trends, so can inject variation back into the culture. But if all production is passing through the same few models, trained on similar corpuses, then you get something like the lock-in hypothesis, except instead of stagnation you have drift in a particular direction.