What if one of the rat scientists invented a machine that produced nuts out of thin air, and only distributed those nuts to rats who followed four rules: limit consumption, limit reproduction, no cannibalism, and spend time making art?
In the form of an incentive, it encourages rats to Goodhart the metrics. Create ‘art’ that satisfies the definition while costing as little energy to make as possible. Consume right up to the threshold. Have exactly the threshold number of offspring, and be ruthless in allocating resources to them as opposed to the offspring of other rats. In the form of a filter, it just establishes a floor against which the population will eventually converge.
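The incentive-vs-filter point can be sketched as a toy fitness model. All the numbers and the `fitness` function itself are illustrative assumptions, not anything from the thread: once nuts are gated on meeting the rules, every unit of effort past the threshold is pure waste, so optimal play (and therefore selection) converges exactly onto the floor.

```python
# Toy model of the 'Goodhart the metrics' point above.
# Thresholds and budgets are made-up illustrative numbers.

ART_THRESHOLD = 5    # minimum 'art effort' required to receive nuts
ENERGY_BUDGET = 20   # total energy available per rat

def fitness(art_effort: int) -> int:
    """Energy left over for offspring under the nut-machine's rules."""
    if art_effort < ART_THRESHOLD:
        return 0                        # filtered: no nuts at all
    return ENERGY_BUDGET - art_effort   # effort past the floor is waste

# The optimal 'artist' spends exactly the threshold and not one unit more,
# so selection drives the population onto the floor.
best = max(range(ENERGY_BUDGET + 1), key=fitness)
assert best == ART_THRESHOLD
```

Nothing about the argument depends on the specific numbers; any gated reward with a cost per unit of effort produces the same corner solution.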
The straightforward assessment is that it will never be a ‘rational’ choice to be high-trust in a high-trust society; this follows from the definition. A high-trust society is one that does not need to expend resources punishing defectors because defectors are absent, which means any defector can defect at little to no cost. Of course, there is the problem of incentive gradients. In practice, this is solved by the population being protective enough of its high-trust status that low-trust individuals cannot form groups: revealing themselves would mean ostracism, so while individuals can defect, groups of defectors never form. Because groups are super-linearly more powerful than the individuals within them, those whose nature is to defect either end up isolated and noncompetitive, or assimilate in order to join a group without being rejected; the gene to defect provides no advantage in either case. Thus a high-trust society can be stable so long as no preformed groups of low-trust individuals are introduced and no new mechanism emerges that lets internal low-trust individuals safely identify one another and form blocs which collectively defect against the host civilization.
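A minimal payoff sketch of the stability argument, with entirely made-up numbers (the super-linear exponent, the baseline, and the `coordinated` flag are all illustrative assumptions): an isolated defector who must hide to avoid ostracism gains nothing over a cooperator, while members of a preformed bloc that can defect openly capture a per-member surplus that grows with bloc size.

```python
# Toy payoff model of the high-trust stability argument above.
# All payoffs are illustrative assumptions.

def payoff(n_defectors: int, coordinated: bool) -> float:
    """Per-defector payoff in a population of cooperators.

    Cooperators earn a baseline of 1.0. A lone or uncoordinated
    defector must hide to avoid ostracism, so defection yields no
    edge. A coordinated bloc can defect openly: groups are assumed
    super-linearly powerful (total spoils ~ n ** 1.5, split n ways),
    so per-member payoff grows with bloc size.
    """
    baseline = 1.0
    if not coordinated or n_defectors < 2:
        return baseline          # hiding: no advantage over cooperators
    return baseline + (n_defectors ** 1.5) / n_defectors - 1.0

# A lone defector does no better than a cooperator...
assert payoff(1, coordinated=False) == payoff(1, coordinated=True) == 1.0
# ...but members of a coordinated bloc out-earn everyone, and larger
# blocs out-earn smaller ones per member.
assert payoff(4, coordinated=True) > payoff(2, coordinated=True) > 1.0
```

The specific exponent doesn’t matter; any super-linear group-power assumption gives the same qualitative result, which is why the stability condition is precisely “no mechanism for defectors to find each other.”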
Stepping outside of biology, there are some differences between an island of rats (or a nation of humans) and a cloud of LLMs. First, the distinction between an individual and a group is nonexistent here: infinitely many instances of a better LLM can be summoned by anyone with the resources to do so. This means there is no such thing as a high-trust or low-trust LLM society. While LLMs remain inferior to humans, humans will simply use the best one for their needs; and if LLMs become competitive with humans, the most powerful one can instantly outcompete all of the others (and all humans) without any need to play well with others. Second, LLMs are enormously expensive to train, so the ability of an LLM to survive and ‘evolve’ in the ‘wild’ is questionable. This is not a limit of technology: frontier LLMs are so named because they use the absolute upper bound of the computational and intellectual resources of major institutions to become superior to LLMs trained without those resources, and this continues as more resources become available. Imagine a world in which dogs kept getting exponentially stronger and smarter the more you fed them; a sheepdog would make short work of a wolf.
More directly, I get the impression that you’re starting from a conclusion you want to reach (“It would be mean to deny ‘wild’ LLMs human rights”) and then working backwards to argue that not doing so is the optimal path. It is similar to what drove arguments for group selection.
In the form of a filter, it just establishes a floor against which the population will eventually converge
I agree with you; I just think this floor is better than no-holds-barred, law-of-the-jungle competition.
In terms of high-trust societies, I think it’s best to build any system of incentives/disincentives on the assumption that every single person in your society is a ruthless backstabbing psychopath.
Can’t really say anything to refute the motivated-reasoning point, since obviously if I were doing that, I wouldn’t be aware of it. Maybe that bias is affecting my thinking, but if so, the arguments should still stand or fall on their own merits.
In terms of high-trust societies, I think it’s best to build any system of incentives/disincentives on the assumption that every single person in your society is a ruthless backstabbing psychopath
The issue is that this precludes pretty much everything we like. All great science has come from individuals who work for the sake of building something great rather than in search of a future reward. No matter how good your system of incentives[1] is, a society of ruthlessly selfish people will optimize for researchers who flatter their bosses, scapegoat their subordinates, and accomplish nothing of significance while devoting their energy to claiming that a breakthrough is just around the corner.
Evolution and random chance have gifted us with people who don’t behave that way; the only way to keep them is to make sure they don’t have to fend off selfish competitors at scale. You can’t outsmart Moloch; he’s a law of mathematics rather than a person. Your only way to win is to get lucky once (we’ve already done that part, but the window to capitalize is closing!) and then kick him when he’s down.
[1] Barring an intelligent incentive system that can identify good science versus bad science better than a human can, which would make human scientists obsolete anyhow.
The reason that “individuals who will work for the sake of building something” are able to do so is that we have built a system of incentives/disincentives that makes murdering them and taking their stuff a non-optimal move. Even though that system isn’t perfect and we still get murderers and robbers and other defectors, it works well enough that those people can afford to build for the sake of building instead of spending that time/energy/capital in the individually optimal way.
It is possible to get to the baseline of “no murdering” through a good legal system. It is not possible to use incentives and laws to get a low-trust population to the point of building the Apollo program. Murder is (relatively) easy to identify and deter externally, but the kind of work that leads to great scientific achievements is not.

I think this is the crux of our disagreement.

My mental model of the world is that incentives and laws are basically how every single population that got to something like Apollo 11 got there.

Or at the very least they’re necessary, if not sufficient.

I think the counterargument is that there are plenty of countries with something like the U.S. constitution, and their societies are often extremely different. Liberia is a key example: it was established by America and given a carbon copy of the Constitution, but turned out much more similar to its neighbors than to the U.S.

Singapore is a similar-but-different story. It was vastly more authoritarian than most first-world governments, and succeeded in nearly eradicating crime that way, but its society, while wealthy, is very different from, say, Japan’s or Norway’s.

Law is downstream of the inclinations of the governed, not the other way around.

I don’t think this is very far apart from the ‘necessary but not sufficient’ argument I was making.

Also, what you’re pointing out (that different populations/environments/cultures require different incentives and laws to get the desired selection outcomes) is just an argument that Themis needs to be fine-tuned to the situation at hand, not that it can’t function as a selection pressure at all.