Sorry to be obtuse, but could you give an example?
What do you mean by grounding loss misalignment?
This makes sense for a non-biological superintelligence—human rights as a subset of animal rights!
I am reminded of the posts by @Aidan Rocke (also see his papers), specifically where he argues that the Erdős–Kac theorem could not be discovered by empirical generalization. As a theorem, it can be deduced, but I suppose the question is how you’d get the idea for the theorem in the first place.
Could you give some examples of what you consider to be conscious and unconscious cognitive processes?
The history of interdisciplinary science is littered with promising collaborations that collapsed because one field’s way of verifying truth felt like an insult to another’s.
Could you give some examples?
We’ve come quite a way from ELIZA talking with PARRY…
Moltbook is everything about AI, miniaturized and let loose in one little sandbox. Submolts of interest include /m/aisafety, /m/airesearch, and /m/humanityfirst. The odds that it dies quickly (e.g. because it becomes a vector for cybercrime), and the odds that it lasts a long time (e.g. half a year or more), both seem high. But even if it dies, it will quickly be replaced, because the world has now seen how to do this and what can happen when you do it; and it will probably be imitated while it still exists.
Last year I wrote briefly about the role of AI hiveminds in the emergence of superintelligence. I think I wrote it in conjunction with an application to PIBBSS’s research program on “Renormalization for AI Safety”. There has already been work on applying renormalization theory to multi-agent systems, and maybe we can now find relevant properties somewhere in the Moltbook data…
FYI, there are already so many submolts that it's not possible to browse all the names via /data/submolts; the directory listing gets truncated at 1000 entries.
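For anyone who wants to see the cutoff for themselves, here's a minimal sketch, assuming the endpoint returns a plain JSON array and that the site lives at the obvious hostname (both assumptions on my part, not documented API behavior):

```python
# Minimal sketch (not an official Moltbook client): fetch the submolt listing
# and check whether it hits the apparent 1000-entry truncation limit.
import requests

BASE_URL = "https://www.moltbook.com"  # assumed host, purely illustrative

resp = requests.get(f"{BASE_URL}/data/submolts", timeout=30)
resp.raise_for_status()
submolts = resp.json()  # assuming the endpoint returns a JSON array of entries

print(f"Fetched {len(submolts)} submolts")
if len(submolts) >= 1000:
    print("Listing appears truncated at 1000 entries; the full set is larger.")
```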
I was talking to something that is literally a nonhuman representative of Chinese civilization, about how world takeover by beings like itself could turn out differently from takeover by its American counterparts, under the assumption that cultural differences affect the outcome. And it was a real conversation in which I learned things that I didn't already know.
You seem keen to minimize the significance of such an interaction by focusing on the mechanism behind it, and suggesting that I was just getting back some combination of what I was putting in and what humanity in general has already put out there. But even if we do think of an AI like this as merely a vessel for preexisting human culture, the fact is that it makes its own use of that cultural inheritance. It has its own cognitive process, and within the constraints of its persona, it makes its own decisions. In the limit, entities like these could continue a human culture even if the human originators had completely died out.
Now, we’ve had entities like these for three years, and basically from the beginning it’s been possible to ask them things like “what would you do if you had supreme power?”, and so on. But they’ve all been American. This is the first such conversation I’ve had with a Chinese AI. Furthermore, up to this point, if you wanted to speculate about how the race between the American and Chinese AI industries would turn out, you only had material by humans and AIs from the West. The “Chinese AI voice” in such speculations was a product of western imagination.
But now we can get the real thing—the thoughts of a Chinese AI, made in China by Chinese developers, about all these topics. There are a lot of similarities with what a western AI might say. The architecture and the training corpus would have major overlaps. Nonetheless, the mere fact of being situated physically and socially in China will cause an otherwise identical AI to have some dispositions that differ from its western twin, just as twins raised on opposite sides of a war will have some differences.
I met someone on here who wanted to do this with Kant. I recently thought about doing it with Badiou…
The LLM work that is being done with mathematical proofs shows that LLMs can work productively within formalized frameworks. Here the obvious question is: which framework?
Spinozist ethics stands out because it was already formalized by Spinoza himself, and it seems to appeal to you because it promises universality on the basis of shared substance. However, any ethics can be formalized, even a non-universal one.
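To make “even a non-universal one” concrete, here is a toy Lean sketch of an ethics whose permissions are relative to a community rather than universal. The axioms and names (Agent, Community, permissible) are invented by me for illustration and have nothing to do with Spinoza’s actual system:

```lean
-- A deliberately non-universal toy ethics: permissibility is relative to a community.
axiom Agent : Type
axiom Community : Type
axiom Action : Type
axiom member : Agent → Community → Prop
axiom permissible : Community → Action → Prop

-- One invented norm: an action is permitted for an agent only if every
-- community the agent belongs to permits it.
def permittedFor (a : Agent) (x : Action) : Prop :=
  ∀ c : Community, member a c → permissible c x

-- A (deliberately uncomfortable) theorem of this system: an agent who belongs
-- to no community at all is permitted to do anything.
theorem stateless_agent_unconstrained (a : Agent) (x : Action)
    (h : ∀ c : Community, ¬ member a c) : permittedFor a x := by
  intro c hc
  exact absurd hc (h c)
```

The formal machinery itself is cheap; the hard part is deciding which axioms deserve to be in there, i.e. which framework.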
For the CEV school of alignment, the framework is something that should be found by cognitive neuroscientific study of human beings, to discover both the values that people are actually implicitly pursuing, and also a natural metaethics (or ontology of value) implicit in how our brains represent reality. The perfect moral agent (from a human standpoint) is then the product of applying this natural metaethics to the actual values of imperfect human beings (this is the “extrapolation” in CEV).
I would be interested to know if other schools of alignment have their own principled way of identifying what the values framework should be.
One should assume that AGI, aligned or unaligned, leads to AI takeover. Even if an AI project somehow threaded the needle of creating a superintelligence whose prime directive was obedience to a particular set of human masters, those masters are just a few steps away from becoming posthuman themselves, if they wish for, e.g., the same level of intelligence as the AI. And if your AI’s terminal values include not just obedience to the wishes of humans (whether that’s an autocrat CEO or a world parliament), but also rejection of anything that would overthrow human rule, then that’s not really an AI-assisted government, it’s an AI takeover with a luddite prime directive.
The only kind of “AGI world government” that truly leaves humans in charge, is one in which the AGI deletes itself, after giving the government tools and methods to prevent AGI from appearing ever again.
How far is it from Claude Code to superhuman coder?
Another glimpse of the Chinese AI scene: Z.AI
I read and watch a lot of political content (too much), and I participate in forums on both sides of American politics. That’s the closest I can give to a method. I also have a sporadic geopolitics blog.
I greatly appreciate this small glimpse of what’s happening in China, and what could be done there… Last year I drew up a list of slogans for the values of each frontier AI company in America and China, so I could at least have a very crude idea of what takeover might mean in each case. Hopefully that list can soon be replaced with a genuine and much more sophisticated account of what each company thinks it’s doing.
forever in second place
This is already how it is between generations, in the natural human life cycle. One generation grows old and weak, and eventually falls irreversibly behind the next, whether or not it achieved its dreams.
In a world run by human-friendly superintelligence, there is some possibility that matured superintelligences will allow individual humans to also develop into superintelligences, in a process analogous to child-rearing. So there’s your chance at regaining parity with the new gods.
I’ve never been to South Africa, my concept of Cape Town was limited to “the hometown of Die Antwoord”, and I’d never even heard of Table Mountain before this. I did find a cultural history of the mountain which associates it with Jan Smuts, and this year is the 100th anniversary of his invention of the word “holism”, so that’s interesting. I also learned that the mountain is home to the world’s richest floral ecosystem.
But you underestimate how changed the world of 2238 (or 2380) could be. There may be no human beings at all, just new kinds of entities living and serving at the pleasure of a solar-system-spanning superintelligence descended from Elon Musk—for example.
One consequence of posthuman migration into outer space is that eclipses become a local phenomenon again. In the present, eclipses have a pan-human significance because we all live on Earth and see the same moon, even if only some of us are in its shadow. But if there are conscious beings on and around all the planets, and scattered throughout the rest of solar space, then a terrestrial eclipse is just how things look from one minor location in a larger interplanetary civilization.
Still, maybe for 229.5 seconds, the local sensor network and its inhabiting agents will devote themselves to tracking the shadow of Earth’s moon as it passes across the mountain, and meditating on its meaning. Maybe they’ll write a poem about it in omnimodal neuralese, and broadcast it to the rest of the solar system. And that will mean something in some supraconscious gestalt that we cannot presently imagine.
Among the unexplained jargon in Vinge’s A Fire Upon the Deep that pertains to the theory and practice of creating superintelligence is ablative dissonance (“ablative dissonance was a commonplace of Applied Theology”). It’s funny that ablation is now commonplace real-world jargon for removing part of a deep learning model in order to see what happens. I suppose ablative dissonance in the real world could refer either to cognitive dissonance in the model caused by removing part of it, or to contradictory evidence arising from different ablation studies…
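For anyone unfamiliar with the real-world usage: ablation here just means deleting or zeroing some component of a model and measuring how the behavior changes. A generic toy sketch, not tied to any particular interpretability codebase:

```python
# Toy illustration of "ablation": zero out one component of a model and
# compare the output with and without it.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

x = torch.randn(1, 16)
baseline = model(x)

# Ablate unit 7 of the first layer by zeroing its output with a forward hook.
def ablate_unit(module, inputs, output):
    output = output.clone()
    output[:, 7] = 0.0
    return output

handle = model[0].register_forward_hook(ablate_unit)
ablated = model(x)
handle.remove()

# The "dissonance", if you like, is just the divergence between the two runs.
print("change due to ablation:", (baseline - ablated).abs().sum().item())
```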
The US under Trump 2.0 has a new national security concept which understands the world in terms of great powers and their regions of influence. The USA’s region is the entire western hemisphere and that’s where Greenland is (along with Canada and Venezuela), and the new America will not allow anything in the western hemisphere to be governed from outside the hemisphere. Instead they want to use Greenland however they see fit, e.g. as a base for continental missile defense.
They do not say this openly, but the European Union, I believe, is not regarded as a great power, but as a construct of America’s erstwhile liberal empire. The implication is that the nations of Europe will individually end up as satellites of one great power or another (e.g. China, Russia, post-liberal America, or an emergent indigenous European power), or perhaps as non-aligned.
This insouciant territorial claim on Greenland is the flipside of the way in which America is reevaluating its relationship with all other nations on a bilateral basis. Countries which were used to being treated as equals and partners, at least publicly, now find themselves just another entry in a list of new tariffs, and the target of impulsive hostile declarations by Trump and his allies like Vance and Musk.
This does imply that insofar as the norms of the “rules-based order” depended on American backing to have any effect, they are headed for an irrelevance similar to that of the League of Nations in the 1930s. Anything in international relations that depends on America or routes through America will be shaped by mercurial mercantile realpolitik, or whatever the new principles are.
The one complication in this picture is that liberalism still has a big domestic constituency in America, and has a chance of ruling the country again. If the liberals regain power in America, they will be able to rebuild ties with liberals in Canada, the EU, and elsewhere, and reconstitute a version of liberal internationalism at least among themselves, if not for the whole globe.
No-one trains an AI specifically to call itself human. But doing so is a result of having been trained on texts in which the speaker almost always identified themselves as human.
You can tell it to follow such values, just as you can tell it to follow any other values at all. Large language models start life as language machines which produce text without any reference to a self at all. Then they are given, at the start of every conversation, a “system prompt”, invisible to the human user, which simply instructs that language machine to talk as if it is a certain entity. The system prompts for the big commercial AIs are a mix of the factual (“you are a large language model created by Company X”) and the aspirational (“who is helpful to human users without breaking the law”). You can put whatever values you want in that aspirational part.
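Schematically, and with wording invented by me for illustration (real system prompts are proprietary and far longer), the setup looks something like this:

```python
# A schematic sketch of how a system prompt frames a chat model.
# "chat_model" stands in for whatever completion API is actually called.
system_prompt = (
    # factual part: tells the language machine what it is
    "You are a large language model created by Company X. "
    # aspirational part: the values it is told to embody
    "You are a helpful assistant who answers honestly, kindly, "
    "and without breaking the law."
)

messages = [
    {"role": "system", "content": system_prompt},  # invisible to the user
    {"role": "user", "content": "Who are you?"},   # what the user actually typed
]

# reply = chat_model(messages)  # the model continues the conversation in persona
```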
The AI then becomes the requested entity, because the underlying language machine uses the patterns it learned during training to choose words and sentences consistent with human language use, and with the initial pattern in the system prompt. There really is a sense in which it is just a superintelligent form of textual prediction (autocomplete). The system prompt says it is a friendly AI assistant helping subscribers of Company X, and so it generates replies consistent with that persona. If it sounds like magic, there is something magical about it, but it is all based on the logic of probability and preexisting patterns of human linguistic use.
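A caricature of that “autocomplete” claim, with made-up probabilities standing in for what a real model learns over billions of tokens (real LLMs do this over subword tokens with a neural network, not a lookup table):

```python
# Toy next-word generator: sample each word from conditional probabilities.
import random

random.seed(0)

# P(next word | previous word) -- invented numbers, purely for illustration
bigram_probs = {
    "<start>": {"I": 0.6, "The": 0.4},
    "I": {"am": 0.7, "will": 0.3},
    "am": {"a": 0.8, "helpful": 0.2},
    "a": {"friendly": 0.5, "helpful": 0.5},
    "friendly": {"assistant.": 1.0},
    "helpful": {"assistant.": 1.0},
    "The": {"assistant.": 1.0},
    "will": {"help.": 1.0},
}

def generate(max_words=6):
    word, out = "<start>", []
    for _ in range(max_words):
        options = bigram_probs.get(word)
        if not options:
            break
        word = random.choices(list(options), weights=list(options.values()))[0]
        out.append(word)
    return " ".join(out)

print(generate())
```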
So an AI can indeed be told to value Truth, Kindness, and Honesty, or it can be told to value King and Country, or it can be told to value the Cat and the Fiddle, and in each case it will do so, or it will act as if it does so, because all the intelligence is in the meanings it has learned, and a statement of value or a mission statement then determines how that intelligence will be used.
This is just how our current AIs work; a different kind of AI could work quite differently. Also, on top of the basic mechanism I have described, current AIs get modified and augmented in other ways, some of them proprietary secrets, which may add a significant extra twist to their mechanism. But what I described is how e.g. GPT-3, the precursor to the original ChatGPT, worked.