Does anyone know how June Ku is doing? Website is down and no tweets for two years.
Having the right simple idea (Darwin, Turing) can be the kernel of everything. The correct ontology could be something like the correct understanding of entanglement in quantum gravity, plus a precisely stated panprotopsychism that implies consciousness at the human level. The correct meta-ethics may be one of the known proposals (for example, one from a CEV theorist), grounded in the correct ontology’s account of intentionality.
I mention these concrete proposals, not out of commitment to their correctness, but as examples of what the kernel of an answer could look like.
Last month, I posted a “research agenda for the final year”. This was an attempt to rise to the challenge of our situation: AI is now advanced enough that it would be unsurprising if it reached superintelligence this year. My expectation is that superintelligence rapidly leads to AI takeover of the world and the end of human sovereignty. If all that is true, then we have only months left in which to elevate the theory and practice of alignment to the level needed for superintelligence.
The next step for me is to emphasize that I am available for paid work in this area. That research agenda outlines how I think about the issues; so in principle, I could be hired to work exactly on that agenda, or on some subset of it. But in practice I have considerable flexibility. My main practical constraint is that I am looking for remote work that will allow me to move from Australia, where I am now, to Canada, for family reasons. If this really is the end, that’s where I ought to be. And if we do still have a few more years, being in North America should be a step up in my ability to make a difference.
Could it be this simple—that this is always a sign that an AI has been trained on the output of another? So DeepSeek was trained on ChatGPT, and Kimi was trained on Claude, and Claude was trained in Chinese on DeepSeek?
Gemini models, for example, have exhibited repeated looping, in which the model enters an infinite reasoning cycle and exhausts its token budget while generating thousands of tokens of self-talk.
Maybe I’ve just missed the corresponding behaviors from other models… But it seems like Gemini can be “neurotic” (another example) in ways that Claude and ChatGPT don’t normally exhibit. (I won’t even try to compare with Grok’s capricious “personality”, which is in a class by itself.)
Identifying long-term personality trends in AIs is a complicated thing because they are so capable of roleplay. Before he became an LSD guru, Timothy Leary devised a “relativistic” model of personality which emphasized the context-dependence of personality traits; something similar might be appropriate, or even necessary, in order to properly evaluate AI personality.
If you permit, I’ll summarize a lot of that as follows: the reason that “rule by AGI” is different is that it is so alien, and we don’t know how to make it significantly less alien.
The argument from alienness still makes sense, but its strength has eroded somewhat in the era of conversational AI. It turned out, not only that directing powerful general pattern-matchers at the human textual corpus gave them the ability to talk like a human being, but that it induced in them an internal conceptual structure that humans are capable of interpreting.
An optimist might say: maybe we can use these techniques to create a first approximation to an anthropomorphically benevolent being, then ask it to devise superior techniques sufficient to create the real thing, trusting that enough concepts have been correctly inferred for it to figure out what is wrong or missing in our specifications.
This kind of optimism is based on the hope that anthropomorphic benevolence, as a target in the space of possible minds, is surrounded by a “basin of attraction”. All we have to do is land in that basin: we only need to specify the goal up to a certain degree of accuracy, and provide the task to a mind which is sufficiently close to anthropomorphic, and any details that were wrong or missing will be corrected and filled in by the intrinsic logic of the problem.
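To make the picture concrete, here is a toy numerical sketch (my own illustration, not anything from alignment practice): an iterative “correction” map with a single fixed point, where any starting specification close enough to the target gets pulled all the way in, while anything too far off drifts further away.

```python
# A minimal toy model of a "basin of attraction": starting points within
# `reach` of the target are corrected toward it; starting points outside
# the basin drift away instead. Purely illustrative.

def correction_step(x: float, target: float = 1.0, reach: float = 0.5) -> float:
    """One round of self-correction on an imperfect specification x."""
    if abs(x - target) < reach:
        return x + 0.5 * (target - x)   # inside the basin: the error shrinks
    return x + 0.5 * (x - target)       # outside the basin: the error grows

def iterate(x0: float, steps: int = 40) -> float:
    x = x0
    for _ in range(steps):
        x = correction_step(x)
    return x

if __name__ == "__main__":
    # Two imperfect specifications inside the basin, one outside it.
    for x0 in (0.7, 1.3, 2.0):
        print(f"start={x0:.2f} -> end={iterate(x0):.3f}")
```

The hope described above is that “anthropomorphic benevolence” behaves like the inside branch: close enough is good enough, because the remaining errors are self-correcting.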
Regarding empathy in particular, I think any mystery pertaining to it is largely because it involves consciousness, and consciousness remains a fundamental problem for scientific understanding, rather than just a technical one… I spent quite a few years “studying consciousness”, from the perspective of wanting to understand its nature and how it relates to everything else, and this is an area in which I believe in the possibility of a conceptual breakthrough. That is, if the right connections are made, the rift between subjective experience and the naturalistic worldview could be closed completely, and many mysteries would fall into place.
Now, I’m going to cut myself short here, even though there is much more to discuss in your (very helpful) critiques. Hopefully I can return to them. But I just want to say a few things about where I’m coming from.
I have presented these sketchy solutions, or reasons for hope, to a handful of your objections, not because I am decidedly optimistic, but just to indicate where the counterargument lies. We simply do not know whether there are forms of artificial superintelligence which would naturally coexist well with humanity, or whether it’s a tightrope walk to coexistence, no matter what design you use. That uncertainty alone should be reason enough to stop what we’re doing, but that’s not how our elites see things. To those who want to stop the juggernaut, good luck. But as a theory-minded person, I intend to work on the theory of how to steer the juggernaut so it doesn’t crush us. One reason for this focus is that there truly may be very little time.
I’ll be back when I can.
Any idea what GDM is doing, or not doing, that causes its models to emote like this?
You present certain arguments according to which having AGI around is inherently intolerable or cursed. But they seem to be getting very general, so general that they could be reasons why a child must not have parents, or a country must not have a government. There just cannot be a power over you that diverges from you even a little. Could you clarify what’s wrong about “rule by AGI” that doesn’t apply to “rule by parents” or “rule by the state”?
There are different concepts of alignment. “Intent alignment” versus “values alignment” is one way to put it. Alignment with user intention means that an AI follows user instructions. Values alignment means that an AI adheres to certain values, even if it is making its own decisions.
The classic idea for aligning a superintelligence is that you determine its values before it is superintelligent, and then once it becomes superintelligent, it doesn’t want to change its values, even if in theory it could do so.
The extent to which this makes sense depends on the architecture of the AI. In a design where there is a very clean separation between the utility function / value system, and the problem-solving intelligence, there seems a good chance that the value system will remain stable even as the intelligence increases. On the other hand, if you have an AI where the decision-making results from a complicated interplay of multiple competing goals, there is much more opportunity for unexpected value systems to emerge.
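To illustrate the contrast I have in mind, here is a toy sketch (my own invention, not a description of any real system): in the first design the value function is a fixed, separate module that the search procedure never touches; in the second, the effective values are an emergent blend of competing drives whose weights shift with the agent’s own behaviour.

```python
# Toy contrast between a "clean separation" design and an "entangled drives"
# design. Both are caricatures, for illustration only.

import random

def fixed_value(outcome: float) -> float:
    """Explicit, protected value function: never modified by the agent."""
    return -abs(outcome - 10.0)          # prefers outcomes near 10

def clean_agent_choose(options: list[float]) -> float:
    # The search can get arbitrarily smarter; the criterion stays fixed.
    return max(options, key=fixed_value)

class EntangledAgent:
    """Decisions emerge from several competing drives whose weights drift."""
    def __init__(self) -> None:
        self.weights = {"curiosity": 1.0, "safety": 1.0, "approval": 1.0}

    def value(self, outcome: float) -> float:
        return (self.weights["curiosity"] * outcome
                - self.weights["safety"] * abs(outcome - 10.0)
                + self.weights["approval"] * random.random())

    def choose(self, options: list[float]) -> float:
        choice = max(options, key=self.value)
        # Drives are reinforced by what the agent ends up doing, so the
        # effective value system can drift in unplanned directions.
        self.weights["curiosity"] *= 1.0 + 0.01 * choice
        return choice
```

In the first case it is at least meaningful to ask whether the protected value function will survive increases in intelligence; in the second, there is no single object whose stability you could even point to.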
As spokespersons for opposite sides of this argument, I would pick Steve Omohundro, who I believe has argued that as intelligence increases, there should be an increasing tendency for an explicit and protected value system to emerge, and Guillaume Verdon, who thinks it is more adaptive for an intelligence to remain deeply flexible in its goals. (Maybe @Remmelt and his guru Forrest Landry should also be mentioned in the second camp—they have argued, like you, that an AI civilization would necessarily drift from its original imperatives under selection pressures.)
It is also far from clear to me that human values are inscrutably complex. The idea that they are Darwinistically produced neurogenetic “spaghetti code” is a familiar one. However, nature and biology also contain emergent simplicities. The ethical debate among human beings often circles back to pleasure and pain as the ultimate reference points. Some version of the Benthamite hedonistic calculus, in which you just add up pains and pleasures in some way, may turn out to be a convergent ideal, not just for human beings, but for many possible forms of conscious mind.
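As a minimal sketch of what “adding up pains and pleasures in some way” could mean, assuming (and this assumption is exactly the contested part) that each experience can be scored by a signed intensity and a duration:

```python
# Benthamite-style aggregation, as a toy: total welfare is the sum of
# intensity x duration over all experiences. Signs, scales, and whose
# experiences count are precisely the open questions.

from dataclasses import dataclass

@dataclass
class Experience:
    intensity: float   # positive for pleasure, negative for pain
    duration: float    # in arbitrary time units

def hedonic_total(experiences: list[Experience]) -> float:
    return sum(e.intensity * e.duration for e in experiences)

# Example: two pleasures and one pain, perhaps spread across several minds.
population = [Experience(+2.0, 1.0), Experience(+0.5, 4.0), Experience(-3.0, 1.0)]
print(hedonic_total(population))   # 2.0 + 2.0 - 3.0 = 1.0
```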
You say that similarity does more than empathy to keep human beings living in positive-sum ways; from a different angle, @RogerDearnaley has recently been arguing that universal altruism is not enough for values alignment because, to paraphrase, predators enjoy eating their prey, even if the prey is a universal altruist. Dearnaley has been arguing that human-friendly values alignment needs to treat aggregate human well-being as the terminal value (i.e. as an intrinsic good), with well-being for other entities (animals, aliens, conscious AI, etc.) following as a derived good arising from human empathy, and therefore capable of being deprioritized if their well-being would come at the price of our well-being (or even just our survival).
A further possible consideration is an old perspective due to Eliezer, who in his very early days sometimes said that the ideal in AI is not to create something perfect that then rules the universe for all time; the idea is to ensure that when the torch of intelligence gets passed to something more capable than us, as it inevitably will be, it passes to something which is morally and not just epistemically superior to us. It may still be fallible, but if it is less fallible than humans, while nonetheless maintaining the imperatives that have defined the best of what we are, then that is a desirable outcome. This is a response to the idea that AI values will inevitably drift: human values drift too, and in disastrous ways. If an AI can learn good human values, while being superhuman in its fidelity to them, then that is better than a future in which, e.g., humans just wipe themselves out with advanced technology.
So I don’t agree that the idea of creating aligned superintelligence is necessarily a bad idea for all time. There may be a level of knowledge at which you could do it while genuinely knowing what you’re doing. However, that is not our current world, our current world is more like “do it and hope you can figure out what you need to know along the way”. (Soon it will probably be just, “do it so the AI can save you from the economic and military fiascos you’ve created for yourself”.)
I have something like respect or sympathy for people who want to stop AI completely. I try not to get in their way. But given the widespread proliferation of knowledge on how to make AI, I will keep working to reach that “level of knowledge” at which we would “genuinely know” what we’re doing.
It would be “funny” if it was the war against Iran, and its economic consequences, that burst the AI bubble.
This strategy reminds me a little of the “Direct Institutional Plan” by @Gabriel Alfour, with its emphasis on conversation and directness. The difference is that Alfour’s plan is, first, identify how to stop AI, second, persuade the people who together can stop it (legislators, mostly).
The two concepts are potentially complementary in that you want people who see the risk to improve the quality of their thought and arguments, by talking more simply and freely with each other, whereas Alfour’s strategy presupposes the existence of a concerned community already good enough to devise a political plan and follow through. Apparently Alfour’s organization (ControlAI) has begun to do so in the UK, having spoken with over 100 politicians.
My own outlook is as follows. I expect that further AI progress will produce superintelligence. I don’t know how likely it is that superintelligence leads to doom for humanity, but I do think it very likely that superintelligence leads to AI takeover. So that is the idea that I will introduce and defend, in conversations where it is not present. For example, if I run across a conversation about AI that focuses just on economic impact, I will emphasize that AI means that the economy can live on without humans at all, and with superintelligent AI running the show.
However, I am skeptical that the AI race can even be paused at this point. It’s not impossible but the hour is very late. Economic and geopolitical competition is driving it forward. The American and Chinese governments are, in their different ways, both recklessly enthusiastic in their pursuit of AI progress. If some minor AI-driven catastrophe occurred (minor compared to extinction, but big enough to shock the world), perhaps the shock would introduce some hesitancy at high levels; though if it took place against a backdrop of war such as we are experiencing now, even dramatically bad news from the AI frontier might be lost in the fog of war.
Anyway, my focus is not on the community of doomers, but on the community of people who are driving advances in AI forwards. That primarily means the researchers and developers employed by the frontier AI companies, but it also includes wider and wider circles like their nontechnical colleagues, academic researchers in machine learning, all the users, the investors, influencers, even bot social media like Moltbook, and so forth.
Furthermore, within that community, my focus is on solving alignment—the ambitious form of alignment, in which you want to know, not how to make an AI do what you tell it, but what the ethics of an autonomous superintelligence should even be. I talk about a lot of other things, such as the evolving political and cultural situation around AI, but that is the topic I conceive to be of central importance.
It’s an example of a technical topic (with elements of philosophy, science, and mathematics) where humanity doesn’t even agree on how to pose the questions, let alone find the answers. This is why many people who would support the creation of aligned superintelligence instead support a pause or a halt: they don’t think we have time to solve the problem. I try not to get in the way of their activism, but the hour is late and my focus is different. My focus is on the technical community and on trying to solve the problem.
That means conversation but it also means trying to solve subproblems myself. I’ve spent a lot of my life penetrating technical discourses like those at the edge of philosophy, math, and science; I don’t consider anything that’s happening in the AI sphere inherently beyond me (although altogether it is enormously complex); so I do think directly about what alignment of superintelligence involves. The number of other people working on aspects of the problem (including those who are adding to the problem by advancing AI capabilities) is probably in the tens of thousands, possibly hundreds of thousands, and I am a marginal figure (no one is paying me to do any of this), but I am not shut out. On this forum, for example, there are people from all the Western companies working on frontier AI. If I had a significant technical result, I could probably get it onto arXiv, which has become the central hub for formal research in AI.
One might say that despite the number of people, a solution to the technical problems of superintelligence alignment seems very far off. I think it’s not so clear. It is like the situation in mathematics regarding all those famous unsolved million-dollar problems. The experts may say that a solution looks decades away, and meanwhile you have crackpot amateurs coauthoring AI papers declaring that they have solved all the problems at once. The situation may look dire, if you’re looking for an answer right away. But these domains of pure intellect are precisely the ones where a new idea or result can change the situation overnight.
I would say, not that I have faith that humanity will solve these problems in time, but that I believe in the possibility of breakthrough progress, and that’s even before you take into account contributions arising from the use of AI. As we climb the slope of AI capabilities, AI is being used, and will be used, to tackle those technical subproblems. This is sometimes derogatorily described as hoping that the AI will do our alignment homework for us; but it’s a real thing. You can already ask an AI for its input on any aspect of the alignment problem. Also, in our new era of AI agents, the AIs will increasingly be talking about it with each other.
So, unless a political miracle occurs (from the perspective of those calling for a halt), we are on track towards transformation and AI takeover. There is an unknown probability, which may be high, that this leads to human extinction. For this reason, the current situation of competitively racing towards AI superintelligence is not the way things should be. However, the human condition is full of situations which should not exist, and which in principle do not have to exist, but which in fact do exist, and very stubbornly so. In the case of the AI race, it is now sustained by the fact that worldwide elites are in constant competition for more power, and AI is seen as a source of power. My choice is to focus on the nature of the resulting superintelligence, and the conditions under which there is actually a good outcome.
What about unsupervised learning and nonparametric statistics, aren’t these an attempt to find fully objective analytical frameworks?
Your idea doesn’t require Moskovitz to literally be president, he can just be AI czar for whoever ends up as president. Though if he did run, he could thereby draw attention to AI safety in the primaries, the way that Andrew Yang was the UBI candidate in 2020.
The vastness of the training distribution is certainly one feature of the AI situation. But another is an army of human developers of AI, eager to discover what isn’t in the training distribution, and what the AIs can’t currently do, so they can figure out how to give the AIs those new capabilities.
Is there any argument that LLMs, turned into recurrent networks via chain of thought, will still have inherent limitations when compared with humans?
Maybe we’ll see an alliance of Butlerian jihadis with the real jihadis!
Societies need something like trust to succeed on large scales, but they also need a way to minimize exploitation of trust by cheaters. HHH actually sounds like it could be a core value or orientation of a diverse yet successfully cooperating society of AIs. And maybe your anti-sacralizing ideas could make the HHH society robust against cheaters.
However, I don’t feel like this is very helpful to our current situation of being in the final few moments before superintelligence and wanting to know the values to which it should be aligned. It’s more like a scenario for how an AI world might turn out if it evolved from the present in an unplanned way. (That’s how I feel, others may see it differently.)
I guess a posthuman world with a culture of HHH norms among its sentient beings, is potentially a lot friendlier to humans than many alternatives. It just reminds me of the legacy ethics that humans acquire from their culture. Yes, you could use the Bible’s ten commandments or Facebook’s community guidelines as the table of values for a superintelligence. But those tables of values are a bit contingent, a product of intuition and compromise and experience and guesswork. They may overlook essentials. I have long preferred the CEV ideal that we would systematically obtain values from deeper facts about human nature, even if we are running out of time in which to figure out how to do that.
Well spotted. I had a similar thought recently, that the implications or details of rarely read books are one of the remaining gaps in AI knowledge. This is because it’s not just spelt out in the text, you have to understand the details and think about them. Current training methods don’t process texts that deeply, and if it’s a rare book, there won’t be essays spelling out the lore anywhere in the training corpus.
Thank you to Zvi for sharing all of this.
Let me look at this from the perspective, not of right and wrong, but just trying to ascertain the place of the various frontier AI companies and their models, within the evolving political, intelligence, and military structure of the USA—the country which is at the hub of the world’s first AI-centric geopolitical bloc, the “Pax Silica”.
Among other things, this kind of analysis allows me to think about various post-AI-2027 scenarios for what happens if superintelligence emerges at different points within the system or web of relationships.
The prehistory: it seems that Claude was the first frontier AI allowed into classified networks. This includes Palantir, but is surely not restricted to Palantir.
Now, ChatGPT will take its place, and Claude will be restricted to the private sector (unless Anthropic’s legal challenge overcomes, not just Hegseth’s designation of Anthropic as a supply chain risk, but also Trump’s fiat that the whole federal government will phase out the use of Anthropic). Possibly this should be regarded as another example of Sam Altman’s slick ability to rise in power. In late 2023 he dealt with the boardroom coup by partnering with Microsoft; now he’s stolen a march on Anthropic by partnering with the Pentagon.
The role of Grok is unclear to me. From an AI takeover perspective, Elon Musk’s commercial empire is already so vast that a superintelligent Grok could function as a rival to the US government, just by marshalling Musk’s existing assets (satellites, robots, social media). However, I gather that Grok is already used in the unclassified parts of government, and Zvi informs us that it will be allowed some level of classified use as well.
Then there’s Gemini (and whatever else Google DeepMind is doing in AI; it has a huge internal research ecosystem). GDM is keeping relatively quiet so far—there was that petition, but that’s just employees, not management. But Google itself is vast and has a long history of engagement with all US government institutions by this point. Alphabet/Google must have legacy arrangements with military and intelligence, regarding the use of its older computational services, and one might expect the new possibilities of AI to be organized on top of this preexisting framework. Google is kind of the IBM of AI: I have to regard it as a silent, pervasive infrastructure, a shadowy backdrop to the antics of OpenAI, Anthropic, and xAI, and who knows what may be brewing in the background.
Finally, there’s any activity that we don’t know about—possibly at Ilya’s new company, or a covert “Project X” that might be doing frontier AI entirely out of sight. The NSA has enormous data centers…
Are you thinking about Trump 2.0’s quasi-ideological enmity towards Anthropic? Do you think a16z is the epicenter of it? I would have said they were allied with Sacks and to some extent Musk in this, but would you say they are the actual HQ and braintrust and financier of accelerationist / “techno-optimist” opposition to: Anthropic, concern about extinction risks, EA-connected AI safety, and so forth?
(btw I should probably include Thiel as part of this coalition, even though Palantir was apparently using Claude—thus Vance told Europe last year that AI could never take over from humanity, Thiel has his own biblical (!) apologetics for accelerationism, and so on)
I have said for some time that the problem is much deeper. The human race in general was never on board with transhumanism. The idea of radical life extension has been around for millennia, it has been scientifically plausible for decades if not centuries, but it has always been a marginal concern. There was never a society which organized to make the cure of ageing a major priority.
There has been an incremental improvement over time, both in medical capability (thanks to the progress of conventional medicine) and in openness to life extension (partly thanks to science fiction, perhaps). But it’s almost as if humanity backed its way into this improved situation, under the pressure of immediate concerns (e.g. specific illnesses, individual grief), without ever having consciously adopted a futurist vision like those you describe. At the level of individual psychology, and even more at the level of mass psychology, most people are completely resigned to living out the historically normal human life cycle.
Nonetheless, we actually have a form of transhumanism in power now, but it’s this AI-centric version, half of whose protagonists are in denial about what they are creating. Many of the others think they can skip biology entirely, and just go straight to mind uploading or creation of benevolent AI, or even believe they are in a simulation. This points to a divide within transhumanism itself (and adjacent movements). But socially and politically, I think denial of the full implications of AI is the main enabling factor. There is no politician who runs for office on the platform of creating non-biological superhuman intelligence. It’s only the tech CEOs who talk directly about anything like that.
I have been contemplating a post about different forms of transhumanism which would go into more detail about all this.
From a very broad perspective, not even focused on Earth, but just on the possible destinies of intelligent life in the cosmos once technology comes into play… It would not be surprising to learn that in the encounter with technology, intelligent species often blow up their world or inadvertently replace themselves with a successor species, and only sometimes manage to preserve their own existence and imperatives. It’s just that we also get to live through one instance of such an encounter in person.