IMO this is a prime candidate for curation/editing work, which I might be happy to do if no one else does.
I wonder if GC-MS data already exists for municipal water supplies, and can just be aggregated and compared against population obesity rates? Less precise than doing it house-by-house, but much cheaper if someone has already done it for you, and it might not vary much house-by-house anyway.
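(If that data existed, the comparison itself would be cheap. A minimal sketch of the kind of aggregation I mean, where every dataset, column name, and number is invented for illustration:)

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy stand-ins for the hypothetical datasets -- all names and values invented.
water = pd.DataFrame({
    "municipality": ["A", "B", "C", "D", "E"],
    "contaminant_ppb": [0.1, 2.3, 5.0, 7.4, 9.8],  # one GC-MS peak of interest
})
obesity = pd.DataFrame({
    "municipality": ["A", "B", "C", "D", "E"],
    "obesity_rate": [0.22, 0.26, 0.31, 0.33, 0.36],
})

# Join on municipality and check whether the contaminant tracks obesity.
merged = water.merge(obesity, on="municipality")
r, p = pearsonr(merged["contaminant_ppb"], merged["obesity_rate"])
print(f"r = {r:.2f}, p = {p:.3f}")
```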
I guess my sense is that most biological systems are going to be ‘package deals’ rather than ‘cleanly separable’ wherever possible—if you already have a system that’s doing learning, and you can tweak that system to get some of the benefits of a VoI framework (without actually calculating VoI), I expect biology to do that.
But in experiments, they’re not synchronized; the former happens faster than the latter.
This has the effect of incentivizing learning, right? (A system that you don’t yet understand is, in total, more rewarding than an equally yummy system that you do understand.) So it reminds me of exploration in bandit algorithms, which makes sense given the connection to motivation.
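(To make the bandit analogy concrete, here’s a minimal sketch of my own, assuming a standard UCB1-style uncertainty bonus; none of this is from the original post:)

```python
import math

def ucb1_choice(counts, means, t, c=2.0):
    """Pick the arm maximizing empirical mean plus an uncertainty bonus.

    An arm we don't yet understand (few pulls) gets a larger bonus, so it
    is, in total, more rewarding than an equally yummy arm we do
    understand -- the learning incentive described above.
    """
    # Pull each arm once before trusting the formula.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])
```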
The ‘generation time’ is the one that can’t be negative. Suppose Alice gets infected on day 1, infects Bob on day 2, Bob shows symptoms on day 3, and Alice shows symptoms on day 4. We end up with:
Incubation periods of 3 days (for Alice) and 1 day (for Bob)
Generation times of 1 day (Bob infected − Alice infected)
Serial intervals of −1 day (Bob symptoms − Alice symptoms)
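The arithmetic, as a sanity check (day numbers copied from the example above):

```python
# Day numbers from the Alice/Bob example.
alice_infected, alice_symptoms = 1, 4
bob_infected, bob_symptoms = 2, 3

incubation_alice = alice_symptoms - alice_infected  # 3 days
incubation_bob = bob_symptoms - bob_infected        # 1 day

# Generation time: infectee's infection minus infector's infection.
# Can't be negative -- Bob can't catch it from Alice before Alice has it.
generation_time = bob_infected - alice_infected     # 1 day

# Serial interval: infectee's symptoms minus infector's symptoms.
# Goes negative whenever the infectee's incubation is short enough.
serial_interval = bob_symptoms - alice_symptoms     # -1 day
```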
I am deeply confused how a serial interval can be negative. If I understand the words involved that means you spread it on to someone who gets their symptoms before you do?
If I understand that correctly, it means you’re breathing out infectious levels of virus days before you cough / notice that you are sick.
(I’m confused about your confusion, because I thought the negative serial interval of COVID was one of its most striking features, and the reason why many of the old ‘control system’ things failed; like, people were used to “if you feel fine you are fine” and wouldn’t accept “everyone needs to act as though they could be sick, because you won’t know whether or not you’re infectious until after the fact.”)
Yeah, I think there’s a (generally unspoken) line of argument that if you have a system that can revise its goals, it will continue revising its goals until it hits a reflectively stable goal, and then will stay there. This requires that reflective stability is possible, and some other things, but I think it is generally the right thing to expect.
I do think it’s fair to describe this as the ‘standard argument’.
I think we’re in a sort of weird part of concept-space where we’re thinking both about absolutes (“all X are Y” disproved by exhibiting an X that is not Y) and distributions (“the connection between goals and intelligence is normally accidental instead of necessary”), and I think this counterexample is against a part of the paper that’s trying to make a distributional claim instead of an absolute claim.
Roughly, their argument as I understand it is:
Large amounts of instrumental intelligence can be applied to nearly any goal.
Large amounts of frame-capable intelligence will take over civilization’s steering wheel from humans.
Frame-capable intelligence won’t be as bad as the randomly chosen intelligence implied by Bostrom, and so this argument for AI x-risk doesn’t hold water; superintelligence risk isn’t as bad as it seems.
I think I differ on the 3rd point a little (as discussed in more depth here), but roughly agree that the situation we’re in probably isn’t as bad as the “AIXI-tl with a random utility function implemented on a hypercomputer” world, for structural reasons that make this not a compelling counterexample.
Like, in my view, much of the work of “why be worried about the transition instead of blasé?” is done by stuff like Value is Fragile, which isn’t really part of the standard argument as they’re describing it here.
On this proposal, any reflection on goals, including ethics, lies outside the realm of intelligence. Some people may think that they are reflecting on goals, but they are wrong. That is why orthogonality holds for any intelligence.
I think I do believe something like this, but I would state it totally differently. Roughly, what most people think of as goals are something more like intermediate variables which are cognitive constructs designed to approximate the deeper goals (or something important in the causal history of the deeper goals). This is somewhat difficult to talk about because the true goal is not a cognitive construct, in the same way that the map is not the territory, and yet all my navigation happens in the map by necessity.
Of course, ethics and reflection on goals are about manipulating those cognitive constructs, and they happen inside of the realm of intelligence. But, like, who won WWII happened ‘in the territory’ instead of ‘in the map’, with corresponding consequences for the human study of ethics and goals.
Persuasion, in this view, is always about pointing out the flaws in someone else’s cognitive constructs rather than aligning them to a different ‘true goal.’
So, to argue that instrumental intelligence is sufficient for existential risk, we have to explain how an instrumental intelligence can navigate different frames.
This is where the other main line of argument comes into play:
I think ‘ability to navigate frames’ is distinct from ‘philosophical maturity’, roughly because of something like a distinction between soldier mindset and scout mindset.
You can imagine an entity that, whenever it reflects on its current political / moral / philosophical positions, uses its path-finding ability like a lawyer to make the best possible case for why it should believe what it already believes, or to discard incoming arguments whose conclusions are unpalatable. There’s something like another orthogonality thesis at play here, where even if you’re a wizard at maneuvering through frames, it matters whether you’re playing chess or suicide chess.
This is just a thesis; it might be the case that it is impossible to be superintelligent and in soldier mindset (the ‘curiosity’ thesis?), but the orthogonality thesis is that it is possible, and so you could end up with value lock-in, where the very intelligent entity that is morally confused uses that intelligence to prop up the confusion rather than disperse it. Here we’re using instrumental intelligence as the ‘super’ intelligence in both the orthogonality and existential risk consideration. (You consider something like this case later, but I think in a way that fails to visualize this possibility.)
[In humans, intelligence and rationality are only weakly correlated, in a way that I think supports this view pretty strongly.]
Sticking a typo over here instead of the other tree:
This thought it sometimes called the
“thought is sometimes”
So, what would prevent a generally superintelligent agent from reflecting on their goals, or from developing an ethics? One might argue that intelligent agents, human or AI, are actually unable to reflect on goals. Or that intelligent agents are able to reflect on goals, but would not do so. Or that they would never revise goals upon reflection. Or that they would reflect on and revise goals but still not act on them. All of these suggestions run against the empirical fact that humans do sometimes reflect on goals, revise goals, and act accordingly.
I think this is not really empathizing with the AI system’s position. Consider a human who is lost in an unfamiliar region, trying to figure out where they are based on uncertain clues from the environment. “Is that the same mountain as before? Should I move towards it or away from it?” Now give that human a map and GPS routefinder; much of the cognitive work that seemed so essential to them before will seem pointless now that they have much better instrumentation.
An AI system with a programmed-in utility function has the map and GPS. The question of “what direction should I move in?” will be obvious, because every direction has a number associated with it, and higher numbers are better. There’s still uncertainty about how acting influences the future, and the AI will think long and hard about that to the extent that thinking long and hard about that increases expected utility.
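(A minimal sketch of that division of labor, with an invented interface where the outcome model returns (probability, outcome) pairs; this is my illustration, not anyone’s actual agent design:)

```python
def choose_action(actions, outcome_model, utility):
    """Direction-finding with a programmed-in utility function is trivial:
    every action gets a number, and higher numbers are better.

    All the remaining hard cognition lives in outcome_model -- beliefs
    about how acting influences the future -- not in judging outcomes.
    """
    def expected_utility(action):
        # outcome_model(action) -> iterable of (probability, outcome)
        # pairs; this interface is an assumption of the sketch.
        return sum(p * utility(o) for p, o in outcome_model(action))

    return max(actions, key=expected_utility)
```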
The orthogonality thesis is thus much stronger than the denial of a (presumed) Kantian thesis that more intelligent beings would automatically be more ethical, or that an omniscient agent would maximise expected utility on anything, including selecting the best goals: It denies any relation between intelligence and the ability to reflect on goals.
I don’t think this is true, and have two different main lines of argument / intuition pumps. I’ll save the other for a later section where it fits better.
Are there several different reflectively stable moral equilibria, or only one? For example, it might be possible to have a consistent philosophically stable egoistic worldview, and also possible to have a consistent philosophically stable altruistic worldview. In this lens, the orthogonality thesis is the claim that there are at least two such stable equilibria and which equilibrium you end up in isn’t related to intelligence. [Some people might be egoists because they don’t realize that other people have inner lives, and increased intelligence unlocks their latent altruism, but some people might just not care about other people in a way that makes them egoists, and making them ‘smarter’ doesn’t have to touch that.]
For example, you might imagine an American nationalist and a Chinese nationalist, both remaining nationalistic as they become more intelligent, and never switching which nation they like more, because that choice was for historical reasons instead of logical ones. If you imagine that, no, at some intelligence threshold they have to discard their nationalism, then you need to make that case in opposition to the orthogonality thesis.
For some goals, I do think it’s the case that at some intelligence threshold you have to discard it, hence the ‘more or less’, and I think many more ‘goals’ are unstable, where the more you think about them, the more they dissolve and are replaced by one of the stable attractors. For example, you might imagine it’s the case that you can have reflectively stable nationalists who eat meat and universalists who are vegan, but any universalists who eat meat are not reflectively stable, where either they realize their arguments for eating meat imply nationalism or their arguments against nationalism imply not eating meat. [Or maybe the middle position is reflectively stable, idk.]
In this view, the existential risk argument is less “humans will be killed by robots and that’s sad” and more “our choice of superintelligence to build will decide what color the lightcone explosion is and some of those possibilities are as bad or worse than all humans dying, and differences between colors might be colossally important.” [For example, some philosophers today think that uploading human brains to silicon substrates will murder them / eliminate their moral value; it seems important for the system colonizing the galaxies to get that right! Some philosophers think that factory farming is immensely bad, and getting questions like that right before you hit copy-paste billions of times seems important.]
So, intelligent agents can have a wide variety of goals, and any goal is as good as any other.
The second half of this doesn’t seem right to me, or at least is a little unclear. [Things like instrumental convergence could be a value-agnostic way of sorting goals, and Bostrom’s ‘more or less’ qualifier is actually doing some useful work to rule out pathological goals.]
Lots of different comments on the details, which I’ll organize as comments to this comment.
(I forgot that newer comments are displayed higher, so until people start to vote this’ll be in reverse order to how the paper goes. Oops!)
Overall, I think your abstract and framing are pretty careful to narrow your attention to “is this argument logically sound?” instead of “should we be worried about AI?”, but still this bit jumps out to me:
the argument for the existential risk of AI turns out invalid.
Maybe insert “standard” in front of “argument” again?
Oxygen: Not much to say here. Your body needs oxygen. This doesn’t stop while we sleep. If possible open a window.
This is a pet peeve of mine, but: you’re not running out of oxygen as input. Instead, exhaust products are building up in the room, of which the most well-known is carbon dioxide. (Outside air contains about 500x as much O2 as CO2, and in typical stuffy rooms the ratio is down to about 100x.) For some reason, we seem to be very sensitive to those exhaust products (tho it also seems like this might be a dimension that people vary on significantly).
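The rough arithmetic behind those ratios, with concentrations that are my own approximate assumptions (O2 at ~20.9% of air, outdoor CO2 around 420 ppm, a stuffy room around 2,000 ppm):

```python
o2_ppm = 209_000        # O2 is ~20.9% of air and barely changes indoors
co2_outdoor_ppm = 420   # typical outdoor CO2 (approximate)
co2_stuffy_ppm = 2_000  # rough figure for a stuffy room (my assumption)

print(o2_ppm / co2_outdoor_ppm)  # ~500x -- outside air
print(o2_ppm / co2_stuffy_ppm)   # ~100x -- a stuffy room
```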
I think I basically disagree with this, or think that it insufficiently steelmans the other groups.
For example, the homeless vs. the landlords; when I put on my systems thinking hat, it sure looks to me like there’s a cartel, wherein a group that produces a scarce commodity is colluding to keep that commodity scarce to keep the price high. The facts on the ground are more complicated—property owners are a different group from landlords, and homelessness is caused by more factors than just housing prices—but the basic analysis that there are different classes, those classes have different interests, and those classes are fighting over government regulation as a tool in their conflict seems basically right to me. Like, it’s really not a secret that many voters are motivated by keeping property values high, and politicians know this is a factor they will be judged on.
Maybe you’re trying to condemn a narrow mistake here, where someone being an ‘enemy’ implies that they are a ‘villain’, which I agree is a mistake. But it sounds like you’re making a more generic point, which is that when people have political disagreements with the rationalists, it’s normally because they’re thinking in terms of enemy action instead of thinking in systems. But a lot of what thinking in systems reveals is the way in which enemies act using systemic forces!
Interestingly, I think this is pretty obviously stated in The Wealth of Nations; Chapter 1 identifies division of labor as the cause of capital accumulation, Chapter 2 identifies trade as the cause of division of labor, and Chapter 3 identifies the size of the market as a limiter on specialization.
It’s… actually sort of surprising that I now have two examples of economic concepts which are really better explained by Adam Smith than by modern textbooks (the other is supply and demand), and this makes me even more glad that I read The Wealth of Nations in high school, before I had come across any modern textbooks.