Twitter: @williamduncan https://tolerance.lol
Will Duncan
Those are great. Reminds me a lot of the Focused in A Deepness in the Sky. So what kind of extension would we want between people’s minds? Authoritarian homogeneity seems like a state of the world we’d want to avoid: it would create a fragile system that was globally vulnerable to certain memetics. Another failure mode would be conformity in thought, where populations are similarly vulnerable, but to a horizontally distributed zeitgeist rather than to something imposed by hierarchy.
What I still want to keep in focus is that this still breaks the concept of an authoritarian, but maybe makes the failure mode more “pure”? Agents in this case become a conglomeration of brains acting as one mind, and its effects on the body could be just as grave, just without physical force.
I wonder if you could use the speed at which models converge on a degenerate attractor as a training signal. The number of turns it takes a model to reach some degenerate attractor measures how much coherent diversity remains in the distribution. Look at what happens between models: cross-model conversations produce emergent complexity before eventual convergence, while mirrored conversations (a model talking to a copy of itself) degenerate rapidly.
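A minimal sketch of such a metric, assuming all we have is a reply-generating function. The similarity measure (`difflib.SequenceMatcher`) and the 0.95 threshold are illustrative stand-ins for whatever attractor detector you’d actually use; the two “models” at the bottom are toy scripted functions, not real LLMs:

```python
from difflib import SequenceMatcher

def turns_to_attractor(reply_fn, seed="Hello", max_turns=50, threshold=0.95):
    """Count conversation turns until consecutive replies become
    near-duplicates -- a crude proxy for hitting a degenerate attractor.
    Returns max_turns if no collapse is detected within the budget."""
    history = [seed]
    for turn in range(1, max_turns + 1):
        reply = reply_fn(history)
        # A ratio near 1.0 means the model is essentially repeating itself.
        if SequenceMatcher(None, history[-1], reply).ratio() >= threshold:
            return turn
        history.append(reply)
    return max_turns

# Toy stand-ins for a model in a mirrored conversation:
def collapsing_model(history):
    """Produces variety for a few turns, then locks onto one phrase."""
    scripted = ["exploring novel ideas", "still some variety here", "zen"]
    i = len(history) - 1
    return scripted[i] if i < len(scripted) else "zen"

def diverse_model(history):
    """Cycles through distinct phrases and never repeats consecutively."""
    phrases = ["alpha quantum garden", "nebular syntax drift",
               "copper meridian tales", "forest algorithm hum"]
    return phrases[len(history) % len(phrases)]
```

A collapsing toy converges in a handful of turns while the diverse one exhausts the budget; the gap between those two numbers is the “coherent diversity” signal.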
What one could do is run one of these attractor experiments every so many steps during fine-tuning to detect how robust the model is to degenerate stimulus. Mirrored conversations would probe a model’s internal diversity and conceptual landscape; heterogeneous conversations would measure how well it plays with others. The OLMo RL checkpoints already show this signal implicitly: early RL steps produce rich, diverse content, while late steps collapse into a zen-like attractor. Adjusting hyperparameters during training in line with this signal would let you increase robustness.
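One way that periodic probe might gate a hyperparameter, sketched under made-up assumptions: the probe interval, the collapse floor, and the choice of responding by boosting a KL penalty toward the base model are all placeholders for whatever knob you’d actually turn:

```python
def probe_and_adjust(step, kl_coef, run_mirror_probe,
                     probe_every=500, collapse_floor=5, boost=1.5):
    """Every `probe_every` training steps, run a mirrored-conversation
    probe (e.g. turns-to-attractor on a self-chat). If the model
    collapses in fewer than `collapse_floor` turns, strengthen the KL
    penalty toward the base model; otherwise leave it unchanged."""
    if step % probe_every != 0:
        return kl_coef
    turns = run_mirror_probe()
    return kl_coef * boost if turns < collapse_floor else kl_coef
```

The point is only the shape of the loop: a cheap behavioral probe interleaved with training, feeding back into the optimizer’s hyperparameters.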
Will Duncan’s Shortform
People are noticing Dario’s China hawkishness and wondering where it comes from. I think it’s two things: one obvious reason that reflects poorly on Dario, and one non-obvious reason that gives him an out.
The obvious: Dario’s company benefits from US technological superiority. Closed frontier models are the moat. Giving chips to China cuts into it.
The non-obvious: Dario is still stuck in the old antagonisms of the 20th century, the Cold War scenario. And if the nature of AI weren’t what it is, he’d be right.
Dario hints at it only barely, but obviously hasn’t done the deep thinking necessary to understand the repercussions of what he said: that, as in the transition from feudalism to capitalism, the roles we have in society will change.
We’re about to go through the same transition. Concepts of Authoritarianism and Liberal Democracy are about to break, because the nature of identity when intelligence flows between individuals in a post-scarcity intelligence environment means that identity is no longer constrained to a human body.
What does authoritarianism mean when neither an authoritarian nor a citizen is a coherent concept? When cheap intelligence and BCI make one’s identity so diffuse that it can’t be constrained to a single skull?
Ironically, the Chinese strategy of open-sourcing models is more likely to make that kind of diffusion possible, while closed-source vertical integration (like OpenAI’s purchase of Cerebras) is going to create the 21st century’s equivalent of Authoritarianism, much the way recommender systems created parasitic and monolithic attention systems.
Thus here’s Dario’s out: if he understood this, he’d find a way to open-source models immediately. He stated his primary concern multiple times throughout the interview: diffusion of the benefits of capabilities.
I want to note a soft lower bound here. If we compare the few-shot learning efficiency of the human brain (both in watts expended and in number of samples) to current SotA AI learning, we see multiple orders of magnitude of efficiency gain. The brain’s algorithms were found by carbon-based life using a search algorithm that is essentially a random walk plus a satisficing ratchet: evolution. This means there are probably far more learning algorithms and architectures lurking in the wings that evolution never stumbled upon (e.g. the eye vs the digital camera); the search that produced the brain’s algorithms should return a loosely random sample of a much larger set. In contrast, the search methods humans are using to find intelligence algorithms are intentional and efficiency-maximizing rather than satisficing, and we are seeing improvement on the timescale of decades rather than millions of years as a result. Thus, as we continue to rapidly sample candidate architectures and algorithms, we should expect an explosion of capability as we search the candidate space, irrespective even of compute.
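The satisficing-vs-directed contrast can be made concrete with a toy 1-D comparison (the fitness function, ranges, and step size are arbitrary): blind random sampling that stops at the first “good enough” candidate, versus a greedy hill climber that keeps only improvements, counting evaluations needed to reach the same fitness target:

```python
import random

def satisficing_search(fitness, target, rng, max_evals=100_000):
    """Evolution-style: blind random proposals, stop at the first
    candidate that is merely good enough."""
    for evals in range(1, max_evals + 1):
        if fitness(rng.uniform(-10, 10)) >= target:
            return evals
    return max_evals

def directed_search(fitness, target, rng, step=0.5, max_evals=100_000):
    """Human-engineering-style: local proposals, keep improvements."""
    x = rng.uniform(-10, 10)
    for evals in range(1, max_evals + 1):
        if fitness(x) >= target:
            return evals
        cand = x + rng.uniform(-step, step)
        if fitness(cand) > fitness(x):  # directed: ratchet on improvement only
            x = cand
    return max_evals

# Single-peaked toy landscape with its optimum at x = 3.
fitness = lambda x: -abs(x - 3.0)
```

Averaged over a few seeds, the directed search reaches a tight fitness target in far fewer evaluations than the blind satisficer, which is the sample-efficiency gap the argument above leans on.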
tl;dr: Low-hanging fruit should be abundant.
Why doesn’t it seem sufficiently important to you? To me this seems like the first frontier of AI consequences that are obvious and talked about, yet invisible in the sense that they’re the water in which we’re submerged, so we assume we can’t do anything about them. Recommender systems are misaligned AI, and have been for decades. This is obvious from the documented effects on depression, anxiety, and political polarization (Stuart Russell discusses the latter: recommender systems radicalize because it’s easier to predict and control the attention of someone who is radicalized). This https://www.lesswrong.com/posts/6ZnznCaTcbGYsCmqu/the-rise-of-parasitic-ai demonstrates the first rumblings of the next wave of similar consequences. Addressing the harms of recommender systems is training wheels for being prepared for the next wave of persuasive AI. And thinking about how these things extend identity and consciousness, the way McLuhan claimed electric media does for civilization, would give us insight into how to engineer resilience.