I have a similar view as @Charbel-Raphaël that the ‘hard problem’ can be dissolved rather than solved. In that sense again I am an illusionist. It’s not easy to dissolve right now, because of how little we understand consciousness, but I believe that solving the mapping problem precisely will give us enough conceptual leverage to understand why the hard problem is nonsensical.
Yes the thought experiment is meaningful based on the fact that they are the same. It serves to answer the question “why talk about qualia at all rather than just calling experiences sense data?”. Sense data is generally referred to with respect to a boundary, eg. a camera has sensors that perceive photons. And then there’s processing (eg. sensors’ electric signals are digitized and carried along).
If one can imagine keeping their sense data the same (input to eyes, and input from cones) yet switch the inner experience based on reworking the processing system, then the thing we’re talking about exists at the processing level and is not sense data, thus experience is not sense data.
Celene’s first question which cascaded to this post was ~”what should I do if I wasn’t sure if I was conscious?”. It seems here Celene is considering whether she ~uniquely is unconscious, is lacking a quality of experience most other people have. Fwiw I do think she is lacking a quality of experience many people have, which is the intuitive feeling that qualia are a thing separate from the world, but that particular feeling is not what consciousness is imo.
Imagine someone who misunderstood what weight was as a concept, didn’t find it meaningful or interesting, then went around claiming “I am weightless! I am weightless!”. Maybe they even found some definition of weight for which that’s true, and maybe they even talked to very confused people who couldn’t explain what weight is to their satisfaction. Nevertheless, they would not be weightless under the common understanding and usage people have for weight.
I think Celene is doing something in that category of mistake, though it is more understandable since qualia/consciousness are even more confusing than weight and have even more disagreement on definitions and pointers. It wouldn’t be wrong to think for a given definition she has that she isn’t conscious, but I think she’s wrong if she thinks she isn’t conscious while thinking others are.
> Uhh, they are part of the world, and as you just pointed to, they have brain states that can be modified, so they dislike their brains being in some states, right?
In those cases the experiences would be terminal values, with the brain states being instrumental values.
To illustrate the difference, consider a mind who knows of height and width, who is designing a rectangle. They want a rectangle of a particular height, but happen to be working in software which only allows squares, so getting the right height is equivalent to getting the right width. While they’re in that software, you can’t measure whether height or width is their terminal goal, but given other software which allows rectangles, you could realize they only cared about the height.
Similarly, there currently exist minds who value their experiences (eg. not experiencing suffering), but whose only way to mediate that is affecting their brain states. If it were possible to just modify the experiences without affecting the brain states, then we could see the difference. Whether it is logically possible to separate experiences from physical states I am unsure of, and lean negative. But you could now see how someone who does believe they can be separated would be valuing the experiences and not the brain states.
Thanks for the link!
> Qualia are supposed to be special properties, in some hard-to-define way
I don’t myself defend qualia as having special properties separate from the physical/logical world, I am probably at least partially, if not fully, an illusionist in that sense. I think I’m speaking against a less-subtle-than-illusionism stance of “I don’t understand definitions/pointers to experiences, I can’t verify I have them, so I can correctly claim I’m unconscious”. There probably exist definitions of consciousness for which Denialist is unconscious, but for the reasonable ones of those I guess ~everyone is unconscious.
I agree! Consciousness/qualia is probably useful if it was selected for, and I’d assume that it notably is/finds a way of connecting different stimuli, and that the correlations in qualia space correspond to correlations in conceptual space. For this reason, and the fact that we’re trained from the same architecture with similar training data, I broadly believe that most humans on earth have pretty similar red qualia to each other, and that you in fact can’t just flip someone’s red qualia without affecting anger and other concepts in their mind.
I have written a response post, mainly pointing out that the denialist move here stems from failure to understand any of the existing coherent and useful meanings for qualia/experience.
I there explain and argue that qualia is a useful concept, notably to talk about preferences over qualia, as illustrated by the existence of artists who want to create particular experiences, eg. the experience of green. They don’t care about lightwave frequencies, they want to create the experience. There are multiple ways to get there, maybe with direct stimulation of eye cones, maybe with psychedelics, maybe with brain modification.
Understanding the above should clear some of the confusion as to why people talk about qualia. It’s useful.
> Suffice it to say, a large advantage in AI capabilities would allow its creator, or the rogue AI, to perform an extremely low-cost, low-risk takeover of all other countries and actors in the world.
A simple “large advantage” is not enough to get low-cost low-risk takeover. I think most people would say that frontier models have a large advantage over open weight ones (eg. Claude Mythos compared to Kimi K2.6), but keeping this gap into the future would not allow the US to low-cost low-risk take over China.
What advantage would you need? I’d surmise AGI/ASI with at least a year’s lead over anyone else. You’d probably need to cripple others without detection (otherwise face retaliation), and still pay the high cost of developing the physical infrastructure to operate such takeovers (drones and robots).
This Feb 2026 survey of some AI safety leaders found median timelines of 2033 for the following definition of AGI:
> An AI system (or collection of systems) that can fully automate the vast majority (>90%) of roles in the 2025 economy. A job is fully automatable when machines could be built to carry out the job better and more cheaply than human workers. Think feasibility, not adoption.
It featured the following comment:
> “I think >10% of roles in the 2025 economy are either manual or otherwise require human-like bodies: construction, barbers, restaurant server, etc. If we restrict to knowledge workers (roughly, jobs that can be done on a laptop), these dates move even closer.”
On the current paradigm, AI capabilities progress on niche tasks and diffusion will be linked[1] and diffusion can go rather slowly even when tools are incredibly productivity enhancing, thus there could be an intuitively surprisingly large gap between automation of 50% human tasks[2] and 90% and 99%, true even if we restricted the prediction to computer work tasks.[3]
I’m 80%+ confident we get automated expert+ level coding and ML research by 2030, and that there will be a significant amount of low hanging fruit in software/algorithmic space to allow fast progress on all tasks for which we have data, but I believe generalisation will stay somewhat limited (very very far from “figure out gravity from a picture of a bent blade of grass”, more like “when speaking to a human expert in a niche field, knows how to interview them over 10 to 100 hours to extract most important info and then be mostly autonomous on known tasks, but still needs feedback from reality to learn more”), aka ~human level generalisation at best up to 2031.
The combination of “need feedback from reality” and slow diffusion makes slower timelines to “superintelligence” (eg. better than all humans at 99.99%+ of 2026 tasks) surprisingly plausible (eg. 5 to 10 years between AGI and ASI, thus ASI by 2040). I guess without a pause/significant politically influenced slowdown, we’d 80%+ have ASI by 2040. I’d set my 50% for ASI around 2036.[4]
I think technical alignment for human level AGI is solvable and not even off track, thus the world will look fine/good in 2030 (few to zero severe power seeking and deceptive misalignment problems in deployment from Anthropic AI systems), but I have high uncertainty about the “use AI to do AI safety work” plan allowing us to successfully know how to train aligned ASI within five years of that. Overall I place myself at 10% or less p(doom) from sharp left turn risks, but around 40% all things considered p(doom) by including gradual disempowerment/value drift and societal response.
[1] We need people to be deploying the technology to gather the relevant data to train/learn from, because generalisation is limited and because lots of expert knowledge only exists in human minds and structures of human relationships right now.
[2] Note I’m weighing by “meaningfully different task” rather than “frequency of task”. Given power law distributions most tasks might be “read email/slack, respond”, which computer use will know how to operate, but not be able to respond to intricacies of different work situations.
[3] Because computer work often involves using domain expert knowledge to do the right things on the computer.
[4] I haven’t researched robotics enough to know how fast we could produce and deploy 100 million humanoid robots worldwide, which seems like an appropriate level of effort required to gather the required data.
+1 to this, it’s not obvious to me this would be utopia and I was surprised this is what someone described as their favorite depiction of utopia. Transhumanism is apparently not one position but a myriad of them. I am in many ways closer to enjoying The Culture’s version of utopia. I feel like this one’s focus on avoiding pain is somewhat naive as to the distinctions between pain and suffering.
But also yes, this is probably better than my median expectation for what will happen in the next 30 years.
Cool but too dismissive of the “clap if you can hear me” technique imo. That one has a strong advantage: it can easily be followed by people who’ve never heard of it, it’s very direct and simple.
But if you can introduce a group norm, my preferred technique (that I’ve used over a hundred times by now in many crowds) is raising arms and humming. It’s faster for the signal to spread when it’s also visual, and more fun for me. I generally do it as a “my arms slowly come up above me in a very slow clap” while humming, and everyone’s joined by the time I finish the clap.
> 1. How do you think recursive self-improvement works in this model? Could this create a super-exponential capability growth that creates big gaps?
Assuming we haven’t failed the alignment step at any step of our recursion[1], each AI system currently in power has the same incentive not to produce a misaligned successor and will only transfer power once it is sure the successor is aligned. It is thus every actor (including AIs) being prudent that causes each layer to want to ensure the gap to its successor is not too large. Each layer is responsible for going at a safe pace and would not want to recursively improve uncontrollably.[2]
> at this level of intensity, they could definitely rise because of international rivalry for instance.
Yes that’s plausible to me, my claim is only that they could want to be much much more cautious than they currently are, but not that overall this cautiousness will prevail against very high pressures.
[1] Say humans have smarts 1, and that they can evaluate and align a 1.1x smarter being to robustly follow human values; then you can kick off a theoretically infinite chain of alignment.
[2] I’m answering about “recursive improvement” and dropping the “self” because that’s the general case. If an agent thought “self” was actually a coherent thing and were aligned to self rather than to humanity, then they might do RSI, but that’d mean we failed step 1 of the recursion.
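To make the "smarts 1, then 1.1x" chain concrete, here is a minimal sketch of the arithmetic. The function name `alignment_chain`, the 1.1 ratio as a fixed constant, and the choice of 10 generations are illustrative assumptions, not a model of real capability growth. It only shows that the relative gap each overseer faces stays at 1.1x even as the absolute level compounds.

```python
def alignment_chain(step_ratio: float = 1.1, generations: int = 10) -> list[float]:
    """Capability level of each generation, with humans at 1.0 as generation 0.

    Hypothetical toy model: every generation aligns a successor that is
    exactly `step_ratio` times as capable as itself.
    """
    levels = [1.0]
    for _ in range(generations):
        levels.append(levels[-1] * step_ratio)
    return levels


levels = alignment_chain()

# Relative gap faced by each overseer when evaluating its direct successor.
gaps = [successor / overseer for overseer, successor in zip(levels, levels[1:])]

# Absolute gap between the last generation and unaided humans.
gap_to_humans = levels[-1] / levels[0]

print(f"per-step gap: {gaps[0]:.2f}x, gap to humans after 10 steps: {gap_to_humans:.2f}x")
```

Under these assumptions the per-step gap never exceeds 1.1x, while the tenth generation is roughly 2.6x the human level, which is why the argument rests on each layer overseeing only its immediate successor.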
Sufficient for what? I’d agree it’s clearly insufficient for getting p(doom) < 1%, but plausibly fine for under 25%. [1]
(assuming my best available plan, mentioned in an earlier response to Vladimir_Nesov: “My implied best available plan for humanity is to create each successive superintelligence with sufficiently fewer resources that it could not take over despite its mild efficiency advantage at using resources strategically. Thus, you can create and deploy misaligned superintelligence and not end up in the doom scenarios but get to try again.”
This is rather sparse and vague and I’d like to write more on this in the future, but it vaguely assumes that the Redwood agenda is implemented at all top labs at least semi-competently.)
[1] Of course I’d prefer if we lived in the world where we could get p(doom) << 1%, here I’m trying to disambiguate what goes wrong under a given plan.
> First of all, you’ve accidentally messed up the link to Greenblatt’s plans for misalignment risk
Thank you, fixed!
> it had Agent-3 obtain merely flimsy evidence of Agent-4 being misaligned
If you have flimsy evidence of X, then it’d lead to suspicion of X. Are you disagreeing with that characterization?
> The authors also managed to create a footnote where they doubt that Agent-4 will even be caught.
In case it’s unclear, I’d have pretty high p(doom) if I thought AI labs will in fact be as reckless and irresponsible as they are in the AI 2027 scenario. But I think it’s not that hard technically to catch misalignment in only-somewhat-more-capable agents when you’re using an OOM more resources to catch it, and I think there will be significant efforts at such surveillance (eg. OpenAI is already monitoring 99%+ of internal AI use), with better tools and protocols being developed.
> the CEOs have overruled them
CEOs should not have the power to overrule the safety teams.
> Then they would have to cause the USG to prevent idiots from xAI [...] from internally deploying their misaligned AIs
I support corporate governance, national and international governance that would indeed allow “preventing idiots” from “internally deploying their misaligned AIs”.
In 2022 Soares writes:
> My guess for how AI progress goes is that at some point, some team gets an AI that starts generalizing sufficiently well, sufficiently far outside of its training distribution, that it can gain mastery of fields like physics, bioengineering, and psychology, to a high enough degree that it more-or-less singlehandedly threatens the entire world
It’s unlikely that a new AI system would be able to “threaten the entire world” based on its mastery of physics etc. if it were not substantially smarter than its predecessors. It would not have enough of an edge to take over, in a world already full of AI systems in place with only slightly lesser capabilities. Do you disagree that this scenario requires a substantial gap?
(Without a substantial gap, an AI system could try to start taking over but would presumably not have enough advantage to never be detected and then be stopped by the existing set of AI systems)
> it makes the weaker AIs a general resource that doesn’t specifically protect humanity, but can be repurposed by superintelligence just as well for its own ends, once it’s more capable than humanity at wielding it.
I clarified that the new slightly more intelligent system would be deployed with fewer resources until we’re sure it’s aligned. And it’s only slightly more intelligent than what’s in place, so I don’t see why you’d think it could take over the previous AI systems, who are actively suspicious of it and monitoring it.
Re Yudkowsky’s post, it notably says:
> This giant historically unprecedented problem has many ordinary-world valid analogies. Like how you can’t determine if someone is trustworthy to handle a billion dollars by seeing how they handle ten dollars, even if it’s in fact the same person and they’re not getting much smarter, because they can think intelligently about whether it’s a good time to steal the money.
Yudkowsky does not engage with the many differences between evaluating AI systems and humans, which in fact make a lot of the problems here quite solvable, in particular under my assumption of no large capability gap. The amount of simulations and tests we can do on AI can allow us to know about them being aligned without hidden motives much better than we can for humans today (but also I wouldn’t lose hope at identifying if a human had ulterior motives given billions of dollars of resources and work to solve that). I think other people have already presented many of these differences in various AI control articles.
(I haven’t read the whole post again in full just to respond to your comment—if there are more important points you think are relevant to my argument here I’ll respond to any you highlight)
> This is neither a good operationalization of “superintelligence” nor a crux for most models of doom.
Is it not a crux for “classic ai doom scenarios”?
I agree it’s not a crux for what should be currently highly rated models of doom, that’s in large part why I argue this, to remove mind share from the old scenarios.
> If that power is instead superintelligence (that doesn’t follow humanity),
My implied best available plan for humanity is to create each successive superintelligence with sufficiently fewer resources that it could not take over despite its mild efficiency advantage at using resources strategically. Thus, you can create and deploy misaligned superintelligence and not end up in the doom scenarios but get to try again. (This does necessitate that we can catch misalignment, which seems likely given the current nature of AI systems and that we can put OOMs more resources into auditing the models than they can put into defending themselves. It also necessitates that we change course when we catch misalignment, thus my recommendations for better corporate and national governance.)
It seems to me “we only get one try” continues to be a frequently argued for position and I think it’s in practice false (though I’ve seen some tautological definitions for which it’s true but uninformative). This post contributes to weakening that position.
I broadly agree with your second paragraph
Was there a theory of change for doing this research? Is there a way in which this is useful or might update you in some way?
I went and read Didicosm ahead of this angry review and don’t think it was particularly worth my time. I’m writing this as info for future readers of this review, and as a suggestion to this post’s author to prepend the review with a spoiler-free note about what the story “is about” and who might or might not wanna read it.
For context, I found many Greg Egan stories really good so read this one based on that prior and am confident in saying this story is not worth reading for most people who like classic Greg Egan.
Re the potshot “Or when people with actual political power believed that AI was on the verge of bootstrapping itself to superintelligence?”, it is pretty rich to place yourself in the future and claim nothing bad ended up happening ahead of it happening. Rather like saying “Fool! I have written speculative fiction set in the future where your worries didn’t pan out, so you are already foolish now to be worrying about them!”
I like frogs and you probably do too
I’ve heard that the smaller the screen the more narrow the attention and the more trapped, I partially believe that
There should be a word for experiences/qualia, and I think it should not be “sense data”, since in fact a larger part of your experience depends on the processing than the inputs. Whenever you look to a different spot you go blind during the eye movement but your visual field doesn’t blur and come back.