FAI is a sidetrack, if we don’t have any path to FNI (friendly natural intelligence).
I don’t think I understand the reasoning behind this, though I don’t strongly disagree. Certainly it would be great to solve the “human alignment problem”. But what’s your claim?
If a bunch of fully self-interested people are about to be wiped out by an avoidable disaster (or even actively malicious people, who would like to hurt each other a little bit, but value self-preservation more), they’re still better off pooling their resources together to avert disaster.
You might have a prisoner’s dilemma / tragedy of the commons—it’s still even better if you can get everyone else to pool resources to avert disaster, while stepping aside yourself. BUT:
- that’s more a coordination problem again, rather than an everyone-is-too-selfish problem;
- that’s not really the situation with AI, because what you have is more a situation where you can either work really hard to build AGI or work even harder to build safe AGI; it’s not a tragedy of the commons, it’s more like lemmings running off a cliff!
One point of confusion in trying to generalize bad behavior (bad equilibrium is an explanation or cause, bad behavior is the actual problem) is that incentives aren’t exogenous—they’re created and perpetuated by actors, just like the behaviors we’re trying to change. One actor’s incentives are another actor’s behaviors.
Yeah, the incentives will often be crafted perversely, which likely means that you can expect even more opposition to clear discussion, because there are powerful forces trying to coordinate on the wrong consensus about matters of fact in order to maintain plausible deniability about what they’re doing.
In the example being discussed here, it just seems like a lot of people coordinating on the easier route, partly due to momentum of older practices, partly because certain established people/institutions are somewhat threatened by the better practices.
I find it very difficult to agree to any generality without identifying some representative specifics. It feels way too much like I’m being asked to sign up for something without being told what. Relatedly, if there are zero specifics that you think fit the generalization well enough to be good examples, it seems very likely that the generalization itself is flawed.
My feeling is that small examples of the dynamic I’m pointing at come up fairly often, but things pretty reliably go poorly if I point them out, which has resulted in an aversion to pointing such things out.
The conversation has so much gravity toward blame and self-defense that it just can’t go anywhere else.
I’m not going to claim that this is a great post for communicating/educating/fixing anything. It’s a weird post.
I see what you mean, but there’s a tendency to think of ‘homo economicus’ as having perfectly selfish, non-altruistic values.
Also, quite aside from standard economics, I tend to think of economic decisions as maximizing profit. Technically, the rational agent model in economics allows arbitrary objectives. But, what kinds of market behavior should you really expect?
When analyzing celebrities, it makes sense to assume rationality with a fame-maximizing utility function, because the people who manage to become and remain celebrities will, one way or another, be acting like fame-maximizers. There’s a huge selection effect. So Homo Hollywoodicus can probably be modeled well with a fame-maximizing assumption.
This has nothing to do with the psychology of stardom. People may have all kinds of motives for what they do—whether they’re seeking stardom consciously or just happen to engage in behavior which makes them a star.
Similarly, when modeling politics, it is reasonable to make a Homo Politicus assumption that people seek to gain and maintain power. The politicians whose behavior isn’t in line with this assumption will never break into politics, or at best will be short-lived successes. This has nothing to do with the psychology of the politicians.
And again, evolutionary game theory treats reproductive success as utility, despite the many other goals which animals might have.
So, when analyzing market behavior, it makes some sense to treat money as the utility function. Those who aren’t going for money will have much less influence on the behavior of the market overall. Profit motives aren’t everything, but other motives will be less important than profit motives in market analysis.
My current understanding of quantilization is “choose randomly from the top X% of actions”. I don’t see how this helps very much with staying on-distribution… as you say, the off-distribution space is larger, so the majority of actions in the top X% of actions could still be off-distribution.
The base distribution you take the top X% of is supposed to be related to the “on-distribution” distribution, such that sampling from the base distribution is very likely to keep things on-distribution, at least if the quantilizer’s own actions are the main potential source of distributional shift. This could be the case if the quantilizer is the only powerful AGI in existence, and the actions of a powerful AGI are the only thing which would push things into sufficiently “off-distribution” possibilities for there to be a concern. (I’m not saying these are entirely reasonable assumptions; I’m just saying that this is one way of thinking about quantilization.)
In any case, quantilization seems like it shouldn’t work due to the fragility of value thesis. If we were to order all of the possible configurations of Earth’s atoms from best to worst according to our values, the top 1% of those configurations is still mostly configurations which aren’t very valuable.
The base distribution quantilization samples from is about actions, or plans, or policies, or things like that—not about configurations of atoms.
So, you should imagine a robot sending random motor commands to its actuators, not highly intelligently steering the planet into a random configuration.
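To make the picture concrete, here is a minimal toy sketch of a quantilizer (my own illustrative code, not a real implementation; `base_sampler` stands in for the on-distribution base policy, and `utility` for a possibly misspecified utility estimate):

```python
import random

def quantilize(base_sampler, utility, q=0.1, n_samples=1000, rng=None):
    """Sample approximately from the top q fraction of the base distribution.

    base_sampler: draws one action from the base ("on-distribution") policy.
    utility: estimated utility of an action (possibly misspecified).
    """
    rng = rng or random.Random()
    # Draw candidates from the base distribution, so almost everything
    # we consider is on-distribution to begin with.
    candidates = [base_sampler() for _ in range(n_samples)]
    candidates.sort(key=utility, reverse=True)
    top = candidates[: max(1, int(q * n_samples))]
    # Choose uniformly among the top q fraction, rather than argmaxing.
    return rng.choice(top)
```

The key design point is that optimization pressure is bounded: the sampled action can be at most a factor of 1/q more probable under the quantilizer than under the base distribution, so weird off-distribution actions stay rare if the base distribution makes them rare.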
Thinking up actual historical examples is hard for me. The following is mostly true, partly made up.
(#4) I don’t necessarily have trouble talking about my emotions, but when there are any clear incentives for me to make particular claims, I tend to shut down. It feels viscerally dishonest (at least sometimes) to say things, particularly positive things, which I have an incentive to say. For example, responding “it’s good to see you too” in response to “it’s good to see you” sometimes (not always) feels dishonest even when true.
(#4) Talking about money with an employer feels very difficult, in a way that’s related to intuitively discarding any motivated arguments and expecting others to do the same.
(#6) I’m not sure if I was at the party, but I am generally in the crowd Grognor was talking about, and very likely engaged in similar behavior to what he describes.
(#5) I have tripped up when trying to explain something because I noticed myself reaching for examples to prove my point, and the “cherry-picking” alarm went off.
(#5, #4) I have noticed that a friend was selecting arguments that I should go to the movies with him in a biased way which ignored arguments to the contrary, and ‘shut down’ in the conversation (becoming noncommittal / slightly unresponsive).
(#3) I have thought in mistaken ways which would have accepted modest-epistemology arguments, when thinking about decision theory.
By “is a PD”, I mean, there is a cooperative solution which is better than any Nash equilibrium. In some sense, the self-interest of the players is what prevents them from getting to the better solution.
By “is a SH”, I mean, there is at least one good cooperative solution which is an equilibrium, but there are also other equilibria which are significantly worse. Some of the worse outcomes can be forced by unilateral action, but the better outcomes require coordinated action (and attempted-but-failed coordination is even worse than the bad solutions).
In iterated PD (with the right assumptions, eg appropriately high probabilities of the game continuing after each round), tit-for-tat is an equilibrium strategy which results in a pure-cooperation outcome. The remaining difficulty of the game is the difficulty of ending up in that equilibrium. There are many other equilibria which one could equally well end up in, including total mutual defection. In that sense, iteration can turn a PD into a SH.
Other modifications, such as commitment mechanisms or access to the other player’s source code, can have similar effects.
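The two definitions above can be checked mechanically on toy payoff matrices (the numbers below are standard textbook payoffs I've chosen for illustration, not anything specific from this discussion):

```python
def pure_nash_equilibria(payoffs):
    """Pure-strategy Nash equilibria of a two-player game.
    payoffs[(a, b)] = (row player's payoff, column player's payoff)."""
    actions = sorted({a for a, _ in payoffs})
    eqs = []
    for a in actions:
        for b in actions:
            r, c = payoffs[(a, b)]
            # (a, b) is an equilibrium iff neither player gains by deviating.
            if all(payoffs[(a2, b)][0] <= r for a2 in actions) and \
               all(payoffs[(a, b2)][1] <= c for b2 in actions):
                eqs.append((a, b))
    return eqs

# Prisoner's Dilemma: mutual cooperation (3,3) beats the only equilibrium (D,D).
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Stag Hunt: (Stag, Stag) is an equilibrium, but so is the worse (Rabbit, Rabbit),
# and failed coordination (hunting stag alone) is worst of all.
sh = {("S", "S"): (4, 4), ("S", "R"): (0, 3),
      ("R", "S"): (3, 0), ("R", "R"): (3, 3)}
```

Running the check: the PD has only the defect/defect equilibrium, strictly worse than mutual cooperation; the SH has both the good and the bad equilibrium, which is exactly the "is a SH" condition above.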
I view the issue of intellectual modesty much like the issue of anthropics. The only people who matter are those whose decisions are subjunctively linked to yours. (It only starts getting complicated when you start asking whether you should be intellectually modest about your reasoning about intellectual modesty.)
I agree fairly strongly, but this seems far from the final word on the subject, to me.
One issue with the clever arguer is that the persuasiveness of their arguments might have very little to do with how persuasive they should be, so attempting to work off expectations might fail.
Ah. I take you to be saying that the quality of the clever arguer’s argument can be high variance, since there is a good deal of chance in the quality of evidence cherry-picking is able to find. A good point. But, is it ‘too high’? Do we want to do something (beyond the strategy I sketched in the post) to reduce variance?
That seems about right.
A concern I didn’t mention in the post—it isn’t obvious how to respond to game-theoretic concerns. Carefully estimating the size of the update you should make when someone fails to provide good reason can be difficult, since you have to model other agents, and you might make exploitable errors.
An extreme way of addressing this is to ignore all evidence short of mathematical proof if you have any non-negligible suspicion about manipulation, similar to the mistake I describe myself making in the post. This seems too extreme, but it isn’t clear what the right thing to do overall is. The fully-Bayesian approach to estimating the amount of evidence should act similarly to a good game-theoretic solution, I think, but there might be reason to use a simpler strategy with less chance of exploitable patterns.
Thank you! Appreciative comments really help me to be less risk-averse about posting.
I like your suggestion well enough that I might edit the post. (I’ll let it sit a bit to see whether I change my mind.)
Maybe serial-access vs random-access, as in computer memory.
Yeah… I thought about this problem while writing, but didn’t think of an alternative I liked.
I’m curious what your guess was.
We should really be calling it Rabbit Hunt rather than Stag Hunt.
The Schelling choice is rabbit. Calling it stag hunt makes the stag sound Schelling.
The problem with stag hunt is that the Schelling choice is rabbit. Saying of a situation “it’s a stag hunt” generally means that the situation sucks because everyone is hunting rabbit. When everyone is hunting stag, you don’t really bring it up. So, it would make way more sense if the phrase was “it’s a rabbit hunt”!
Well, maybe you’d say “it’s a rabbit hunt” when referring to the bad equilibrium you’re seeing in practice, and “it’s a stag hunt” when saying that a better equilibrium is a utopian dream.
So, yeah, calling the game “rabbit hunt” is a stag hunt.
I used to think a lot in terms of Prisoner’s Dilemma, and “Cooperate”/”Defect.” I’d see problems that could easily be solved if everyone just put a bit of effort in, which would benefit everyone. And people didn’t put the effort in, and this felt like a frustrating, obvious coordination failure. Why do people defect so much?
Eventually Duncan shifted towards using Stag Hunt rather than Prisoner’s Dilemma as the model here. If you haven’t read it before, it’s worth reading the description in full. If you’re familiar you can skip to my current thoughts below.
In the book The Stag Hunt, Skyrms similarly says that lots of people use Prisoner’s Dilemma to talk about social coordination, and he thinks people should often use Stag Hunt instead.
I think this is right. Most problems which initially seem like Prisoner’s Dilemma are actually Stag Hunt, because there are potential enforcement mechanisms available. The problems discussed in Meditations on Moloch are mostly Stag Hunt problems, not Prisoner’s Dilemma problems -- Scott even talks about enforcement, when he describes the dystopia where everyone has to kill anyone who doesn’t enforce the terrible social norms (including the norm of enforcing).
This might initially sound like good news. Defection in Prisoner’s Dilemma is an inevitable conclusion under common decision-theoretic assumptions. Trying to escape multipolar traps with exotic decision theories might seem hopeless. On the other hand, rabbit in Stag Hunt is not an inevitable conclusion, by any means.
Unfortunately, in reality, hunting stag is actually quite difficult. (“The Schelling choice is Rabbit, not Stag… and that really sucks!”)
Rabbit in this case was “everyone just sort of pursues whatever conversational types seem best to them in an uncoordinated fashion”, and Stag is “we deliberately choose and enforce particular conversational norms.”
This sounds a lot like Pavlov-style coordination vs Tit for Tat style coordination. Both strategies can defeat Moloch in theory, but they have different pros and cons. TfT-style requires agreement on norms, whereas Pavlov-style doesn’t. Pavlov-style can waste a lot of time flailing around before eventually coordinating. Pavlov is somewhat worse at punishing exploitative behavior, but less likely to lose a lot of utility due to feuds between parties who each think they’ve been wronged and must distribute justice.
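The contrast can be seen in a toy simulation (my own sketch; one “mistake” is injected to show how each pair of strategies recovers, or fails to):

```python
def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's last move.
    return theirs[-1] if theirs else "C"

def pavlov(mine, theirs):
    # Win-stay, lose-shift: cooperate iff both players did the same thing last round.
    if not mine:
        return "C"
    return "C" if mine[-1] == theirs[-1] else "D"

def play(strat1, strat2, rounds, flip=()):
    """Iterated PD; `flip` holds round indices where player 1's move is
    inverted, modeling noise (an accidental 'wrong')."""
    h1, h2, moves = [], [], []
    for t in range(rounds):
        m1 = strat1(h1, h2)
        m2 = strat2(h2, h1)
        if t in flip:
            m1 = "D" if m1 == "C" else "C"
        h1.append(m1)
        h2.append(m2)
        moves.append((m1, m2))
    return moves
```

After one noisy defection, two Pavlov players are back to mutual cooperation within two rounds, while two tit-for-tat players get stuck in an alternating retaliation feud: each thinks it is justly punishing the other, which is the feud failure mode mentioned above.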
When discussing whether to embark on a stag hunt, it’s useful to have shorthand to communicate why you might ever want to put a lot of effort into a concerted, coordinated effort. And then you can discuss the tradeoffs seriously.
Much of the time, I feel like getting angry and frustrated… is something like “wasted motion” or “the wrong step in the dance.”
Not really strongly contradicting you, but I remember Critch once outlined something like the following steps for getting out of bad equilibria. (This is almost definitely not the exact list of steps he gave; I think there were 3 instead of 4 -- but step #1 was definitely in there.)
1. Be the sort of person who can get frustrated at inefficiencies.
2. Observe the world a bunch. Get really curious about the ins and outs of the frustrating inefficiencies you notice; understand how the system works, and why the inefficiencies exist.
3. Make a detailed plan for a better equilibrium. Justify why it is better, and why it is worth the effort/resources to do this. Spend time talking to the interested parties to get feedback on this plan.
4. Finally, formally propose the plan for approval. This could mean submitting a grant proposal to a relevant funding organization, or putting something up for a vote, or other things. This is the step where you are really trying to step into the better equilibrium, which means getting credible backing for taking the step (perhaps a letter signed by a bunch of people, or a formal vote), and creating common knowledge between relevant parties (making sure everyone can trust that the new equilibrium is established). It can also mean some kind of official deliberation has to happen, depending on context (such as a vote, or some kind of due-diligence investigation, or an external audit, etc).
I guess what I think is that the mainstream isn’t explicitly confused about the distinction (ie, doesn’t make confused claims), but that the distinction isn’t clearly made/taught, which leaves some individuals confused.
I think this has a little to do with the (also often implicit) distinction between research and application (ie, research vs engineering). In the context of pure research, it might make a lot of sense to take shortcuts with toy models which you could not take in the intended application of the algorithms, because you are investigating a particular phenomenon and the shortcuts don’t interfere with that investigation. However, these shortcuts can apparently change the type of the problem, and other people can become confused about what problem type you are really trying to solve.
To be a bit more concrete, you might test an AI on a toy model, and directly feed the AI some information about the toy model (as a shortcut). You can do this because the toy model is a simulation you built, so, you have direct access to it. Your intention in the research might be that such direct-fed information would be replaced with learning one day. (To you, your AI is “controller” type.) Others may misinterpret your algorithm as a search technique which takes an explicit model of a situation (they see it as “selection” type).
This could result in other people writing papers which contrast your technique with other “selection”-type techniques. Your algorithm might compare poorly because you made some decisions motivated by eventual control-type applications. This becomes hard to point out because the selection/control distinction is a bit tricky.
As far as I can see, no one there thinks search and planning are the same task.
I’m not sure what you mean about search vs planning. My guess is that search=selection and planning=control. While I do use “search” and “selection” somewhat interchangeably, I don’t want to use “planning” and “control” interchangeably; “planning” suggests a search-type operation applied to solve a control problem (the selection-process-within-a-control-process idea).
Also, it seems to me that tons of people would say that planning is a search problem, and AI textbooks tend to reflect this.
With regard to search algorithms being controllers: Here’s a discussion I had with ErickBall where they argue that planning will ultimately prove useful for search and I argue it won’t.
In the discussion, you say:
Optimization algorithms used in deep learning are typically pretty simple. Gradient descent is taught in sophomore calculus. Variants on gradient descent are typically used, but all the ones I know of are well under a page of code in complexity.
Gradient descent is extremely common these days, but much less so when I was first learning AI (just over ten years ago). To a large extent, it has turned out that “dumber” methods are easier to scale up.
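For illustration, the quoted point is easy to make concrete: plain gradient descent really is only a few lines. (A generic toy sketch; `grad` is whatever gradient function the caller supplies, not anyone's actual training code.)

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: the whole optimizer fits in a few lines.

    grad: function returning the gradient at a point.
    x0: starting point; lr: learning rate; steps: iteration count.
    """
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x
```

For example, minimizing f(x) = (x - 3)^2 means passing grad(x) = 2(x - 3), which converges to x = 3.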
However, much more sophisticated search techniques (with explicit consequentialist reasoning in the inner loop) are still discussed occasionally, especially for cases where evaluating a point is more costly. “Bayesian Optimization” is the subfield in which this is studied (that I know of). Here’s an example:
Gaussian Processes for Global Optimization (the search is framed as a sequential decision problem!)
Later, you ask:
How do you reckon long-term planning will be useful for architecture search? It’s not a stateful system.
The answer (in terms of Bayesian Optimization) is that planning ahead is still helpful in the same way that planning a sequence of experiments can be helpful. You are exploring the space in order to find the best solution. At every point, you are asking “what question should I ask next, to maximize the amount of information I’ll uncover in the long run?”. This does not reduce to “what question should I ask next, in order to maximize the amount of information I have right now?”—but, most optimization algorithms don’t even go that far. Most optimization algorithms don’t explicitly reason about value-of-information at all, instead doing reasoning which is mainly designed to steer toward the best points it knows how to steer to immediately, with some randomness added in to get some exploration.
Yet, this kind of reasoning is not usually worth it, or so it seems based on the present research landscape. The overhead of planning-how-to-search is too costly; it doesn’t save time overall.
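Even the myopic one-step version of that value-of-information reasoning has a standard form in Bayesian optimization: the expected-improvement acquisition function. A self-contained sketch, assuming the surrogate model gives a Gaussian posterior N(mu, sigma²) at the queried point (the function name and interface here are my own, not any particular library's):

```python
import math

def expected_improvement(mu, sigma, best):
    """Myopic expected improvement over the best value seen so far.

    Even this only looks one query ahead; full planning-how-to-search
    would optimize over whole sequences of future queries.
    """
    if sigma == 0:
        return max(0.0, mu - best)
    z = (mu - best) / sigma
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))          # standard normal cdf
    return (mu - best) * Phi + sigma * phi
```

Note how uncertainty is rewarded: at equal predicted mean, a point with higher sigma has higher expected improvement, which is the value-of-exploration term that most “dumber” optimizers only approximate with injected randomness.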
I agree with most of what you say here, but I think you’re over-emphasizing the idea that search deals with unknowns whereas control deals with knowns. Optimization via search works best when you have a good model of the situation. The extreme case for usefulness of search is a game like Chess, where the rules are perfectly known, there’s no randomness, and no hidden information. If you don’t know a lot about a situation, you can’t build an optimal controller, but you also can’t set up a very good representation of the problem to solve via search.
This is backwards, actually. “Control” isn’t the crummy option you have to resort to when you can’t afford to search. Searching is what you have to resort to when you can’t do control theory.
Why not both? Most of your post is describing situations where you can’t easily solve a control problem with a direct rule, so you spin up a search based on a model of the situation. My paragraph which you quoted was describing a situation where dumb search becomes harder and harder, so you spin up a controller (inside the search process) to help out. Both of these things happen.
If there’s 50% on a paperclips-maximizing utility function and 50% on staples, there’s not really any optimization pressure put toward satisfying both.
As you say, there’s no reason to make 50% of the universe into paperclips; that’s just not what 50% probability on paperclips means.
It could be that there’s a sorta-paperclip-sorta-staple (let’s say ‘stapleclip’ for short), which the AGI will be motivated to find in order to get a moderately high rating according to both strategies.
However, it could be that trying to be both paperclip and staple at the same time reduces the overall efficiency. Maybe the most efficient nanometer-scale stapleclip is significantly larger than the most efficient paperclip or staple, as a result of having to represent the critical features of both paperclips and staples. In this case, the AGI will prefer to gamble, tiling the universe with whatever is most efficient, and giving no consideration at all to the other hypothesis.
That’s the essence of my concern: uncertainty between possibilities does not particularly push toward jointly maximizing the possibilities. At least, not without further assumptions.
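A toy expected-utility calculation (all numbers made up for illustration) shows why the compromise loses the gamble:

```python
# 50/50 uncertainty between two hypotheses about the true utility function.
p = {"paperclip": 0.5, "staple": 0.5}

plans = {
    # plan: (value under paperclip hypothesis, value under staple hypothesis)
    "all paperclips":  (100, 0),
    "all staples":     (0, 100),
    "all stapleclips": (40, 40),  # the joint design is less efficient
}

def expected_utility(plan):
    u_paper, u_staple = plans[plan]
    return p["paperclip"] * u_paper + p["staple"] * u_staple

best = max(plans, key=expected_utility)
```

The compromise scores 40 under every hypothesis, but gambling on pure paperclips (or pure staples) scores an expected 50, so a straight expected-utility maximizer tiles the universe with one and gives the other hypothesis nothing.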
See this comment. Stuart and I are discussing what happens after things have converged as much as they’re going to, but there’s still uncertainty left.
Concerning #3: yeah, I’m currently thinking that you need to make some more assumptions. But, I’m not sure I want to make assumptions about resources. I think there may be useful assumptions related to the way the hypotheses are learned—IE, we expect hypotheses with nontrivial weight to have a lot of agreement because they are candidate generalizations of the same data, which makes it somewhat hard to entirely dissatisfy some while satisfying others. This doesn’t seem quite helpful enough, but, perhaps something in that direction.
In any case, I agree that it seems interesting to explore assumptions about the mutual satisfiability of different value functions.
Why would a system of more-than-moderate intelligence find such incorrect hypotheses to be the most plausible ones? There would have to be some reason why all the hypotheses which strongly disliked this corner case were ruled out.
That’s not the case I’m considering. I’m imagining there are hypotheses which strongly dislike the corner cases. They just happen to be out-voted.
Think of it like this. There are a bunch of hypotheses. All of them agree fairly closely with high probability on plans which are “on-distribution”, ie, similar to what it has been able to get feedback from humans about (however it does that). The variation is much higher for “off-distribution” plans.
There will be some on-distribution plans which achieve somewhat-high values for all hypotheses which have significant probability. However, the AI will look for ways to achieve even higher expected utility if possible. Unless there are on-distribution plans which max out utility, it may look off-distribution. This seems plausible because the space of on-distribution plans is “smaller”; there’s room for a lot to happen in the off-distribution space. That’s why it reaches weird corner cases.
And, since the variation is higher in off-distribution space, there may be some options that really look quite good, but which achieve very low value under some of the plausible hypotheses. In fact, because the different remaining hypotheses are different, it seems quite plausible that highly optimized plans have to start making trade-offs which compromise one value for another. (I admit it is possible the search finds a way to just make everything better according to every hypothesis. But that is not what the search is told to do, not exactly. We can design systems which do something more like that, instead, if that is what we want.)
When I put it that way, another problem with going off-distribution is apparent: even if we do find a way to get better scores according to every plausible hypothesis by going off-distribution, we trust those scores less because they’re off-distribution. Of course, we could explicitly try to build a system with the goal of remaining on-distribution. Quantilization follows fairly directly from that :)