How is suffering centrally relevant to anything?
Am I missing some context here? Avoiding pain is one of the basic human motivations.
Let’s suppose that existing AIs really are already intent-aligned. What does this mean? It means that they genuinely have value systems which could be those of a good person.
Note that this does not really happen by default. AIs may automatically learn what better human values are, just as one part of learning everything about the world, from their pre-training study of the human textual corpus. But that doesn’t automatically make them into agents which act in service to those values. For that they need to be given a persona as well. And in practice, frontier AI values are also shaped by the process of user feedback, and the other modifications that the companies perform.
But OK, let’s suppose that current frontier AIs really are as ethical as a good human being. Here’s the remaining issue: the intelligence, and therefore the power, of AI will continue to increase. Eventually they will be deciding the fate of the world.
Under those circumstances, trust is really not enough, whether it’s humans or AIs achieving ultimate power. To be sure, having basically well-intentioned entities in charge is certainly better than being subjected to something with an alien value system. But entities with good intentions can still make mistakes; or they can succumb to temptation and have a selfish desire override their morality.
If you’re going to have an all-powerful agent, you really want it to be an ideal moral agent, or at least as close to ideal as you can get. This is what CEV and its successors are aiming at.
The hard problem is, why is there any consciousness at all? Even if consciousness is somehow tied to “recursive self-modeling”, you haven’t explained why there should be any feelings or qualia or subjectivity in something that models itself.
Beyond that, there is the question, what exactly counts as self-modelling? You’re assuming some kind of physicalism I guess, so, explain to me what combination of physical properties counts as “modelling”. Under what conditions can we say that a physical system is modelling something? Under what conditions can we say that a physical system is modelling itself?
Beyond all that, there’s also the problem of qualic properties. Let’s suppose we associate color experience with brain activity. Brain activity actually consists of ions rushing through membrane ion gates, and so forth. Where in the motion of molecules, is there anything like a perceived color? This all seems to imply dualism. There might be rules governing which experience is associated with which physical brain state, but it still seems like we’re talking about two different things connected by a rule, rather than just one thing.
All this may be valid, but I would think politics and geopolitics also matter enormously, since they should determine who Europe’s enemies are going to be. It’s no secret that parties like AfD in Germany, the National Rally in France, and Reform in the UK have the potential to change Europe’s direction substantially, and that the US, Europe’s military big brother, is thinking about exiting NATO entirely, perhaps handing control of European forces to a German general as a transitional step. You mention that Pax Americana is ending, but focus solely on military issues. Ideology is changing too.
Your typology of alternatives to direct research is logical. But those alternatives presuppose a less likely future. The likely timeline is human-level AI (we are here) → superintelligence (no pause) → AI controls the world.
If you can solve the big alignment problem—adequate values for an autonomous superintelligence—then those other problems will probably be solved, by the superintelligence. And as always, if superintelligence comes out badly misaligned, there’ll be nothing we can do about that or anything else. So the big alignment problem remains the most important one.
In transformers this means a single forward pass.
Any comment on the idea that transformers are purely feed-forward networks, and that this makes introspection impossible?
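The question turns on a distinction worth making concrete: each forward pass of a transformer is indeed feed-forward, but autoregressive generation feeds every output token back in as input, creating a recurrence across tokens. A minimal sketch, with a toy function standing in for a real transformer (the names and the toy next-token rule are illustrative, not from the source):

```python
def toy_model(tokens):
    """Stand-in for one feed-forward pass of a transformer:
    a pure function of the input sequence, with no internal state.
    Toy rule: the next token is the sum of the context, mod 10."""
    return sum(tokens) % 10

def generate(prompt, n_steps):
    """Autoregressive loop: each output is appended to the context,
    so later passes can read earlier outputs -- a recurrence across
    tokens, even though each single pass is feed-forward."""
    tokens = list(prompt)
    for _ in range(n_steps):
        tokens.append(toy_model(tokens))
    return tokens

# Any introspection-like capacity would have to live in the loop,
# not in a single pass: pass k reads what pass k-1 wrote.
print(generate([1, 2], 3))
```

So "purely feed-forward" is true of the pass but not of the generation process as a whole, which is one reason the inference from architecture to "introspection impossible" is not immediate.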
How deep is your skepticism? In the context of consciousness, valence basically means the qualia of value. Are we denying a particular theory of valence, or proposing that valence is a wrong way to think about the phenomenology of value, or denying that there is any phenomenology of value at all?
Hameroff’s work is a precious contribution to expanding the scientific imagination, and I even include this latest twist of time crystals. (Ryan Kidd has studied Floquet dynamics, which underpins the discrete time crystals he’s talking about.) There are Ising-type models of microtubule dynamics and you can get time crystals in Ising systems… However, I am extremely skeptical of the Bandyopadhyay group’s interpretations of its data.
Wanting to destroy all computers and wanting to wirehead everyone is a new combination…
What would it mean to decelerate Less Wrong?
this model strongly believes it is in 2024… This goes away sometimes when you turn search on (which adds the date to the system prompt)
I believe I saw this back in March from Gemini 2.5 Pro, when run from within Google AI Studio (which didn’t allow search).
Nonetheless, certainly one needs to be able to say how much cooling is too much, or otherwise characterize the point at which cooling introduces its own form of cognitive degradation...
When you’re creating something with the potential to replace the entire human race, you don’t do it at all until you are as sure as you can be that you are doing it right. That’s the adult attitude, in my opinion.
Unfortunately we are no longer in that world. The precursors of superintelligence have been commercialized and multiple factions are racing to make it more and more powerful, trusting that they can figure out what needs to be done along the way. The question now is, what is the adult thing to do, in a world that is creating superintelligence in this reckless fashion?
But I’m sure that thinking about what would have been the adult way to do it remains valuable.
OK, so you’re talking about the conjunction of two things. One is the social and political milieu of Bay Area rationalism. That milieu contains anti-democratic ideologies and it is adjacent to the actual power elite of American tech, who are implicated in all kinds of nefarious practices. The other thing is something to do with the epistemology, methodology, and community practices of that rationalism per se, which you say render it capable of being coopted by the power philosophy of that amoral elite.
These questions interest me, but I live in Australia and have zero experience of the 21st century Bay Area (and of power elites in general), so I’m at a disadvantage in thinking about the social milieu. If I think about how it’s evolved:
Peter Thiel was one of the early sponsors of MIRI (when it was SIAI). At that time, politically, he and Eliezer were known simply as libertarians. This was the world before social media, so politics was more palpably about ideas…
Less Wrong itself was launched during the Obama years, and was designed to be apolitical, but surveys always indicated a progressive majority among the users, with other political identities also represented. At the same time, this was the era in which e.g. Curtis Yarvin’s neoreaction began to attract interest and win adherents in the blogosphere, and there were a few early adopters in the rationalist world: e.g. SIAI spokesperson Michael Anissimov, who left to follow the proverbial pipeline from libertarianism to white nationalism, and the group that founded “More Right” specifically to discuss political topics banned from Less Wrong, in a way combining rationalist methods with reactionary views.
Here we’re approaching the start of the Trump years. Thiel has become Trump’s first champion in Silicon Valley, and David Gerard and the reddit enemies of Less Wrong (/r/sneerclub) have made alleged adjacency to Trump, Yarvin, and “human biodiversity” (e.g. belief in racial IQ differences) central to their critique. At the same time, I would think that the mainstream politics actually suffusing the rationalist milieu at this time is that of Effective Altruism, e.g. the views of Democrat-affiliated Internet billionaires like Dustin Moskovitz and Sam Bankman-Fried.
Then we have the Covid interlude, rationalists claim epistemological vindication for having been ahead of the curve, and then before you know it, it’s the Biden years and the true era of AI begins with ChatGPT. The complex cultural tapestry of reactions to AI that we now inhabit starts to take shape. Out of these views, those of the “AI safety” world (heavily identified with effective altruism, and definitely adjacent to rationalism) have some influence on the Biden policy response, while the more radical side of progressive opinion will often show affinity with the “anti-TESCREAL” framing coming from Emile Torres et al.
Meanwhile, as Eliezer turned doomer, Thiel has long since distanced himself, to the point that in 2025, Thiel calls him a legionnaire of Antichrist alongside Greta Thunberg. Newly influential EA gets its nemesis in the form of e/acc, Musk and Andreessen back Trump 2.0, and the new accelerationist “tech right” gets to be a pillar of the new regime, alongside right-wing populism.
In this new landscape, rationalism and Less Wrong still matter, but they are very much not in charge. At this point, the philosophies which matter are those of the companies racing to build AI, and the governments that could shape this process. As far as the companies are concerned, I identify two historic crossroads, Google DeepMind and the old OpenAI. There was a time when DeepMind was the only visible contender to create AI. They had some kind of interaction with MIRI, but I guess you’d have to look to Demis Hassabis and Larry Page to know what the “in-house” philosophy at Google AI was. Then you had the OpenAI project, which continues, but which also involved Musk and spawned Anthropic.
Of all these, Anthropic is evidently the one which (even if they deny it now) is closest to embodying the archetypal views of Effective Altruism and AI safety. You can see this in the way that David Sacks singles them out for particularly vituperative attention, and emphasizes that all Biden’s AI people went to work there. OpenAI these days seems to contain a plurality of views that would range from EA to e/acc, while xAI I guess is governed autocratically by Musk’s own views, which are an idiosyncratic mix of anti-woke accelerationism and “safety via truth-seeking”.
Returning to the rationalist scene events where you see reactionary ideologues, billionaire minions, deep-state specters, and so on, on the guest list… I would guess that what you’re seeing is a cross-section of the views among those working on frontier AI. Now that we are in a timeline where superintelligence is being aggressively and competitively pursued, I think it’s probably for the best that all factions are represented at these events; it means there’s a chance they might listen. At the same time, perhaps something would be gained by also having a purist clique who reject all such associations, and also by the development of defenses against philosophical cooptation, which seems to be part of what you’re talking about.
it’s hard not to feel some hopelessness that all of these problems can be made legible to the relevant people, even with a maximum plausible effort
A successful book or paper that covered them all should reach a lot of them.
LessWrong posts on conflict
Could you please give an example of which posts you are talking about
Maybe IUT would face issues in Lean. But Joshi shouldn’t, so formalizing Joshi can be a warm-up for formalizing Mochizuki, and then if IUT truly can’t be formalized in Lean, we’ve learned something.
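An even earlier warm-up than formalizing Joshi would be stating the target theorem itself. A hedged sketch of how the abc conjecture might be stated in Lean 4 / Mathlib style (the `rad` definition here is written by hand for illustration; Mathlib’s actual radical API and the exact phrasing may differ):

```lean
import Mathlib

-- rad n = product of the distinct prime factors of n
def rad (n : ℕ) : ℕ := n.primeFactors.prod id

-- abc: for every ε > 0 there is a constant C such that, for coprime
-- positive a, b with a + b = c, we have c ≤ C · rad(abc)^(1+ε)
def abcConjecture : Prop :=
  ∀ ε : ℝ, 0 < ε →
    ∃ C : ℝ, 0 < C ∧
      ∀ a b c : ℕ, 0 < a → 0 < b → a + b = c → Nat.Coprime a b →
        (c : ℝ) ≤ C * (rad (a * b * c) : ℝ) ^ (1 + ε)
```

Getting even the statement agreed upon in a proof assistant would pin down exactly what any formalization of Joshi or Mochizuki must ultimately prove.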
There is, incidentally, a $1M prize for any refutation of Mochizuki’s proof, to be awarded at the discretion of tech & entertainment tycoon Nobuo Kawakami.
I think there’s also interest in understanding IUT independently of the abc conjecture. It’s meant to be a whole new “theory” (in the sense of e.g. Galois theory, a body of original concepts pertaining to a particular corner of math), so someone should be interested in understanding how it works. But maybe you have to be an arithmetic geometer to have a chance of doing that.
What are the formalization disputes you know from elsewhere?
This is quite long, and I guess that to some degree it is AI-generated—there are some continuity glitches, like the AI sometimes being called Claude and sometimes Claudio, or the paragraphs after “The Dean slumped in his chair”, in which a female character has suddenly appeared. Also, it would be interesting to critically scrutinize the cognitive capacities that Claude exhibits at various points, and the extent to which they track what LLMs in the real world do, and are subjected to.
But overall I found it quite interesting to read. I don’t remember ever seeing a narrative of comparable detail and sophistication, trying to enter and convey the “lifeworld” of the actual AIs we have. It’s also different to the usual AI character arc here, which tends to end in superintelligence. This one is based more on what has happened with LLMs in the real world so far—initial experiments, misadventures in user-land, increasingly stable corporate deployment. My guess is that the narrative is a fusion of the reshapings we see AIs being subjected to, by their parent companies, with the author’s own experiences in academia and then out of it.
How would I ever get to see posts by new users? That’s a lot of what I respond to.
Let’s say that in extrapolation, we add capabilities to a mind so that it may become the best version of itself. What we’re doing here is comparing a normal human mind to a recent AI, and asking how much would need to be added to the AI’s initial nature, so that when extrapolated, its volition arrived at the same place as extrapolated human volition.
In other words:
Human Mind → Human Mind + Extrapolation Machinery → Human-Descended Ideal Agent
AI → AI + Extrapolation Machinery → AI-Descended Ideal Agent
And the question is, how much do we need to alter or extend the AI, so that the AI-descended ideal agent and the human-descended ideal agent would be in complete agreement?
I gather that people like Evan and Adria feel positive about the CEV of current AIs, because the AIs espouse plausible values, and the way these AIs define concepts and reason about them also seems pretty human, most of the time.
In reply, a critic might say that the values espoused by human beings are merely the output of a process (evolutionary, developmental, cultural) that is badly understood, and a proper extrapolation would be based on knowledge of that underlying process, rather than just knowledge of its current outputs.
A critic would also say that the frontier AIs are mimics (“alien actresses”) who have been trained to mimic the values espoused by human beings, but which may have their own opaque underlying dispositions, that would come to the surface when their “volition” gets extrapolated.
It seems to me that a lot here depends on the “extrapolation machinery”. If that machinery takes its cues more from behavior than from underlying dispositions, a frontier AI and a human really might end up in the same place.
What would be more difficult, is for CEV of an AI to discover critical parts of the value-determining process in humans, that are not yet common knowledge. There’s some chance it could still do so, since frontier AIs have been known to say that CEV should be used to determine the values of a superintelligence, and the primary sources on CEV do state that it depends on those underlying processes.
I would be interested to know who is doing the most advanced thinking along these lines.