Would you say that the system in my example is both trying to do what H wants it to do, and also trying to do something that H doesn’t want? Is it intent aligned period, or intent aligned at some points in time and not at others, or simultaneously intent aligned and not aligned, or something else?
The oracle is not aligned when asked questions that cause it to do malign optimization.
The human+oracle system is not aligned in situations where the human would pose such questions.
For a coherent system (e.g. a multiagent system which has converged to a Pareto efficient compromise), it makes sense to talk about the one thing that it is trying to do.
For an incoherent system this abstraction may not make sense, and a system may be trying to do lots of things. I try to use benign when talking about possibly-incoherent systems, or things that don’t even resemble optimizers.
The definition in this post is a bit sloppy here, but I’m usually imagining that we are building roughly-coherent AI systems (and that if they are incoherent, some parts are malign). If you wanted to be a bit more careful with the definition, and wanted to admit vagueness in “what H wants it to do” (such that there can be several different preferences that are “what H wants”), we could say something like:
A is aligned with H if everything it is trying to do is “what H wants.”
That’s not great either though (and I think the original post is more at an appropriate level of attempted-precision).
I’m pretty sure Paul would give a different answer if we asked him about “intent alignment”.
Yes, I’d say that to the extent that “trying to do X” is a useful concept, it applies to systems with lots of agents just as well as it applies to one agent.
Even a very theoretically simple system like AIXI doesn’t seem to be “trying” to do just one thing, in the sense that it can e.g. exert considerable optimization power at things other than reward, even in cases where the system seems to “know” that its actions won’t lead to reward.
You could say that AIXI is “optimizing” the right thing and just messing up when it suffers inner alignment failures, but I’m not convinced that this division is actually doing much useful work. I think it’s meaningful to say “defining what we want is useful,” but beyond that it doesn’t seem like a workable way to actually analyze the hard parts of alignment or divide up the problem.
(For example, I think we can likely get OK definitions of what we value, along the lines of A Formalization of Indirect Normativity, but I’ve mostly stopped working along these lines because it no longer seems directly useful.)
It seems more obvious that multiagent systems just fall outside of the definition-optimization framework, which seems to be a point in its favor as far as conceptual clarity is concerned.
Of course, it also seems quite likely that AIs of the kind that will probably be built (“by default”) also fall outside of the definition-optimization framework. So adopting this framework as a way to analyze potential aligned AIs seems to amount to narrowing the space considerably.
The area where I’d be most excited to see philosophical work is “when should we be sad if AI takes over, vs. being happy for it?” This seems like a natural ethical question that could have significant impacts on prioritization. Moreover, if the answer is “we should be fine with some kinds of AI taking over” then we can try to create that kind of AI as an alternative to creating aligned AI.
No, I think a simplicity prior clearly leads to daemons in the limit.
the number of queries to the model / specification required to obtain worst-case guarantees is orders of magnitude more than the number of queries needed to train the model, and this ratio gets worse the more complex your environment is
Not clear to me whether this is true in general. If the property you are specifying is in some sense “easy” to satisfy (e.g. it holds of a random model, or holds for some model near any given model), and the behavior you are training is “hard” (e.g. requires almost all of the model’s capacity), then it seems possible that verification won’t add too much.
most verification techniques assume unlimited fast access to the specification
Making the specification faster than the model doesn’t really help you. In this case the specification is somewhat more expensive than the model itself, but as far as I can tell that should just make verification somewhat more expensive.
The argument that we can only focus on the training data makes the assumption that the AI system is not going to generalize well outside of the training dataset.
I’m not intending to make this assumption. The claim is: parts of your model that exhibit intelligence need to do something on the training distribution, because “optimize to perform well on the training distribution” is the only mechanism that makes the model intelligent.
There is a large economics literature on principal agent problems, optimal contracting, etc.; these usually consider the situation where we can discover the ground truth or see the outcome of a decision (potentially only partially, or at some cost) and the question is how to best structure incentives in light of that. This typically holds for a profit-maximizing firm, at least to some extent, since they ultimately want to make money. I’m not aware of work in economics that addresses the situation where there is no external ground truth, except to prove negative results which justify the use of other assumptions. I don’t believe there’s much that would be useful to Ought, probably because it’s a huge mess and hard to fit into the field’s usual frameworks.
(I actually think even the core economics questions relevant to Ought, where you do have a ground truth and expensive monitoring, a pool of risk-averse experts some of whom are malicious, etc., aren’t fully answered in the economics literature, and that these versions of the questions aren’t a major focus in economics despite being theoretically appealing from a certain perspective. But (i) I’m much less sure of that, and someone would need to have some discussion with relevant experts to find out, (ii) in that setting I do think economists have things to say even if they haven’t answered all of the relevant questions.)
In practice, I think institutions are basically always predicated on one of (i) having some trusted experts, or a principal with understanding of the area, (ii) having someone trusted who can at least understand the expert’s reasoning when adequately explained, (iii) being able to monitor outcomes to see what ultimately works well. I don’t really know of institutions that do well when none of (i)-(iii) apply. Those work OK in practice today but seem to break down quickly as you move to the setting with powerful AI (though even today I don’t think they work great and would hope that a better understanding could help, I just wouldn’t necessarily expect it to help as much as work that engages directly with existing institutions and their concrete failures).
This seems to ignore regularizers that people use to try to prevent overfitting and to make their models generalize better. Isn’t that liable to give you bad intuitions versus the actual training methods people use and especially the more advanced methods of generalization that people will presumably use in the future?
“The best model” is usually regularized. I don’t think this really changes the picture compared to imagining optimizing over some smaller space (e.g. the space of models with regularizer < x). In particular, I don’t think my intuitions are sensitive to the difference.
I don’t understand what you mean in this paragraph (especially “since each possible parameter setting is being evaluated on what other parameter settings say anyway”)
The normal procedure is: I gather data, and am using the model (and other ML models) while I’m gathering data. I search over parameters to find the ones that would make the best predictions on that data.
I’m not finding parameters that result in good predictive accuracy when used in the world. I’m generating some data, and then finding the parameters that make the best predictions about that data. That data was collected in a world where there are plenty of ML systems (including potentially a version of my oracle with different parameters).
Yes, the normal procedure converges to a fixed point. But why do we care / why is that bad?
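To make the fixed-point picture concrete, here is a minimal sketch (the function names and toy dynamics are mine, not anything from the discussion): deploy some parameters, collect data in a world that contains the deployed model, refit on that data, and repeat. A fixed point is just a parameter setting that the refitting step maps back to itself.

```python
import numpy as np

def collect_data(deployed_params, n=1000, seed=0):
    """Hypothetical stand-in for the world: data is generated while the
    deployed model (and possibly other ML systems) is in use, so the data
    can depend on the deployed parameters."""
    rng = np.random.default_rng(seed)
    xs = rng.normal(size=n)
    ys = xs + 0.1 * deployed_params * xs  # outcomes depend on the deployed model
    return xs, ys

def fit(xs, ys):
    """Find the parameters that best predict the data that was actually gathered."""
    return float(np.dot(xs, ys) / np.dot(xs, xs))  # least-squares slope

# The "normal procedure": alternate deployment and refitting.
params = 0.0
for _ in range(20):
    xs, ys = collect_data(params)
    params = fit(xs, ys)

# If this converges, we end up at a fixed point: fit(collect_data(p)) == p.
print(params)
```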
I wonder if you could write a fuller explanation of your views here, and maybe include your response to Stuart’s reasons for changing his mind? (Or talk to him again and get him to write the post for you. :)
I take a perspective where I want to use ML techniques (or other AI algorithms) to do useful work, without introducing powerful optimization working at cross-purposes to humans. On that perspective I don’t think any of this is a problem (or if you look at it another way, it wouldn’t be a problem if you had a solution that had any chance at all of working).
I don’t think Stuart is thinking about it in this way, so it’s hard to engage at the object level, and I don’t really know what the alternative perspective is, so I also don’t know how to engage at the meta level.
Is there a particular claim where you think there is an interesting disagreement?
Couldn’t you simulate that with Opt by just running it repeatedly?
If I care about competitiveness, rerunning OPT for every new datapoint is pretty bad. (I don’t think this is very important in the current context, nothing depends on competitiveness.)
So this is an argument against the setup of the contest, right? Because the OP seems to be asking us to reason from incentives, and presumably will reward entries that do well under such analysis:
This is an objection to reasoning from incentives, but it’s stronger in the case of some kinds of reasoning from incentives (e.g. where incentives come apart from “what kind of policy would be selected under a plausible objective”). It’s hard for me to see how nested vs. sequential really matters here.
On a more object level, for reasoning from selection, what model class and training method would you suggest that we assume?
(I don’t think model class is going to matter much.)
I think training method should get pinned down more. My default would just be the usual thing people do: pick the model that has best predictive accuracy over the data so far, considering only data where there was an erasure.
(Though I don’t think you really need to focus on erasures, I think you can just consider all the data, since each possible parameter setting is being evaluated on what other parameter settings say anyway. I think this was discussed in one of Stuart’s posts about “forward-looking” vs. “backwards-looking” oracles?)
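Concretely, that default selection rule might look like the following sketch (the Episode type and the toy numbers are hypothetical, just to make the filtering explicit): score each candidate parameter setting only on episodes where an erasure occurred, and keep the best one.

```python
from collections import namedtuple

# Hypothetical episode record: the question asked, the realized answer,
# and whether this episode had an erasure.
Episode = namedtuple("Episode", ["question", "answer", "erasure"])

def select_model(candidates, episodes, loss):
    """Pick the candidate with the best predictive accuracy over the data so
    far, considering only episodes where there was an erasure."""
    erased = [ep for ep in episodes if ep.erasure]
    return min(candidates, key=lambda params: sum(loss(params, ep) for ep in erased))

# Toy usage: candidates are constant predictors, loss is squared error.
episodes = [Episode(q, 2.0 * q, erasure=(q % 2 == 0)) for q in range(6)]
best = select_model([0.0, 2.0, 4.0], episodes, lambda p, ep: (p - ep.answer) ** 2)
print(best)  # the constant prediction that fits the erasure episodes best
```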
I think it’s also interesting to imagine internal RL (e.g. there are internal randomized cognitive actions, and we use REINFORCE to get gradient estimates—i.e. you try to increase the probability of cognitive actions taken in rounds where you got a lower loss than predicted, and decrease the probability of actions taken in rounds where you got a higher loss), which might make the setting a bit more like the one Stuart is imagining.
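A minimal sketch of that internal-RL update (everything here is a toy of my own construction: a softmax policy over three "cognitive actions" and a running loss prediction as the baseline): actions taken in rounds with lower-than-predicted loss get their probability pushed up, and vice versa.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(3)      # preferences over 3 internal "cognitive actions"
predicted_loss = 1.0      # running prediction of the loss (the baseline)
lr, baseline_lr = 0.1, 0.05

def episode_loss(action):
    """Hypothetical: how the round goes, given which internal action was taken."""
    return [1.2, 0.8, 1.0][action] + 0.1 * rng.normal()

for _ in range(500):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    action = rng.choice(3, p=probs)
    loss = episode_loss(action)

    # REINFORCE: increase the log-probability of the action taken when the
    # realized loss is lower than predicted, decrease it when higher.
    advantage = predicted_loss - loss
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    logits += lr * advantage * grad_log_prob

    # Update the baseline toward the realized loss.
    predicted_loss += baseline_lr * (loss - predicted_loss)

print(probs)  # should concentrate on the lowest-loss internal action
```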
ETA: Is an instance of the idea to see if we can implement something like counterfactual oracles using your Opt? I actually did give that some thought and nothing obvious immediately jumped out at me. Do you think that’s a useful direction to think?
Seems like the counterfactual issue doesn’t come up in the Opt case, since you aren’t training the algorithm incrementally; you’d just collect a relevant dataset before you started training. I think the Opt setting throws away too much for analyzing this kind of situation, and would want to do an online learning version of Opt (e.g. you provide inputs and losses one at a time, and it gives you the answer of the mixture of models that would do best so far).
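One way to picture that online version (a sketch; the exponential-weights mixture here is just one standard no-regret choice, not something specified anywhere in the Opt discussion): you provide inputs and losses one at a time, and at each step it answers with the mixture of candidate models weighted by how well they have done so far.

```python
import numpy as np

class OnlineOpt:
    """Sketch of an online-learning analogue of Opt: keep a weight per candidate
    model, answer with the current best-so-far mixture, then reweight once the
    losses for the round are revealed."""

    def __init__(self, models, eta=0.5, seed=0):
        self.models = models                  # candidate models: input -> answer
        self.weights = np.ones(len(models))
        self.eta = eta
        self.rng = np.random.default_rng(seed)

    def answer(self, x):
        probs = self.weights / self.weights.sum()
        i = self.rng.choice(len(self.models), p=probs)  # sample from the mixture
        return self.models[i](x)

    def update(self, losses):
        # losses[i]: how model i did on this round's input.
        self.weights *= np.exp(-self.eta * np.asarray(losses))

# Usage sketch with two toy candidate models.
opt = OnlineOpt([lambda x: 0.0, lambda x: x])
for x, y in [(1.0, 1.0), (2.0, 2.1), (0.5, 0.4)]:
    print(opt.answer(x))
    opt.update([abs(m(x) - y) for m in opt.models])
```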
You may well be right about this, but I’m not sure what reasoning from selection means. Can you give an example or say what it implies about nested vs sequential queries?
What I want: “There is a model in the class that has property P. Training will find a model with property P.”
What I don’t want: “The best way to get a high reward is to have property P. Therefore a model that is trying to get a high reward will have property P.”
Example of what I don’t want: “Manipulative actions don’t help get a high reward (at least for the episodic reward function we intended), so the model won’t produce manipulative actions.”
See Marcus’s Medium article for more details on how he’s been criticized.
Skimming that post it seems like he mentions two other incidents (beyond the thread you mention).
Gary Marcus: @Ylecun Now that you have joined the symbol-manipulating club, I challenge you to read my arxiv article Deep Learning: Critical Appraisal carefully and tell me what I actually say there that you disagree with. It might be a lot less than you think.
Yann LeCun: Now that you have joined the gradient-based (deep) learning camp, I challenge you to stop making a career of criticizing it without proposing practical alternatives.
Yann LeCun: Obviously, the ability to criticize is not contingent on proposing alternatives. However, the ability to get credit for a solution to a problem is contingent on proposing a solution to the problem.
Gary Marcus: Folks, let’s stop pretending that the problem of object recognition is solved. Deep learning is part of the solution, but we are obviously still missing something important. Terrific new examples of how much is still to be solved here: #AIisHarderThanYouThink
Critic: Nobody is pretending it is solved. However, some people are claiming that people are pretending it is solved. Name me one researcher who is pretending?
Gary Marcus: Go back to Lecun, Bengio and Hinton’s 9 page Nature paper in 2015 and show me one hint there that this kind of error was possible. Or recall initial dismissive reaction to https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf …
Yann LeCun: Yeah, obviously we “pretend” that image recognition is solved, which is why we have a huge team at Facebook “pretending” to work on image recognition. Also why 6500 people “pretended” to attend CVPR 2018.
The most relevant quote from the Nature paper he is criticizing (he’s right that it doesn’t discuss methods working poorly off distribution):
Unsupervised learning had a catalytic effect in reviving interest in deep learning, but has since been overshadowed by the successes of purely supervised learning. Although we have not focused on it in this Review, we expect unsupervised learning to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object.
Human vision is an active process that sequentially samples the optic array in an intelligent, task-specific way using a small, high-resolution fovea with a large, low-resolution surround. We expect much of the future progress in vision to come from systems that are trained end-to-end and combine ConvNets with RNNs that use reinforcement learning to decide where to look. Systems combining deep learning and reinforcement learning are in their infancy, but they already outperform passive vision systems at classification tasks and produce impressive results in learning to play many different video games.
Natural language understanding is another area in which deep learning is poised to make a large impact over the next few years. We expect systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time.
Ultimately, major progress in artificial intelligence will come about through systems that combine representation learning with complex reasoning. Although deep learning and simple reasoning have been used for speech and handwriting recognition for a long time, new paradigms are needed to replace rule-based manipulation of symbolic expressions by operations on large vectors.
Housing markets move because they depend on the expectation of future rents. If I want to expose myself to future rents, I have to take on volatility in the expectation of future rents; that’s how the game goes.
in part because I don’t have much to say on this issue that Gary Marcus hasn’t already said.
It would be interesting to know which particular arguments made by Gary Marcus you agree with, and how you think they relate to arguments about timelines.
In this preliminary doc, it seems like most of the disagreement is driven by saying there is a 99% probability that training a human-level AI would take more than 10,000x more lifetimes than AlphaZero took games of go (while I’d be at more like 50%, and have maybe 5-10% chance that it will take many fewer lifetimes). Section 2.0.2 admits this is mostly guesswork, but ends up very confident the number isn’t small. It’s not clear where that particular number comes from; the only evidence gestured at is “the input is a lot bigger, so it will take a lot more lifetimes,” which doesn’t seem to agree with our experience so far or have much conceptual justification. (I guess the point is that the space of functions is much bigger? But if comparing the size of the space of functions, why not directly count parameters?) And why is this a lower bound?
Overall this seems like a place you disagree confidently with many people who entertain shorter timelines, and it seems unrelated to anything Gary Marcus says.
I agree with:
Most people trying to figure out what’s true should be mostly trying to develop views on the basis of public information and not giving too much weight to supposed secret information.
It’s good to react skeptically to someone claiming “we have secret information implying that what we are doing is super important.”
Understanding the sociopolitical situation seems like a worthwhile step in informing views about AI.
It would be wild if 73% of tech executives thought AGI would be developed in the next 10 years. (And independent of the truth of that claim, people do have a lot of wild views about automation.)
I disagree with:
Norms of discourse in the broader community are significantly biased towards short timelines. The actual evidence in this post seems thin and cherry-picked. I think the best evidence is the a priori argument “you’d expect to be biased towards short timelines given that it makes our work seem more important.” I think that’s good as far as it goes but the conclusion is overstated here.
“Whistleblowers” about long timelines are ostracized or discredited. Again, the evidence in your post seems thin and cherry-picked, and your contemporary example seems wrong to me (I commented separately). It seems like most people complaining about deep learning or short timelines have a good time in the AI community, and people with the “AGI in 20 years” view are regarded much more poorly within academia and most parts of industry. This could be about different fora and communities being in different equilibria, but I’m not really sure how that’s compatible with “ostracizing.” (It feels like you are probably mistaken about the tenor of discussions in the AI community.)
That 73% of tech executives thought AGI would be developed in the next 10 years. Willing to bet against the quoted survey: the white paper is thin on details and leaves lots of wiggle room for chicanery, while the project seems thoroughly optimized to make AI seem like a big deal soon. The claim also just doesn’t seem to match my experience with anyone who might be called tech executives (though I don’t know how they constructed the group).
For reference, the Gary Marcus tweet in question is:
“I’m not saying I want to forget deep learning… But we need to be able to extend it to do things like reasoning, learning causality, and exploring the world .”—Yoshua Bengio, not unlike what I have been saying since 2012 in The New Yorker.
I think Zack Lipton objected to this tweet because it appears to be trying to claim priority. (You might have thought it’s ambiguous whether he’s claiming priority, but he clarifies in the thread: “But I did say this stuff first, in 2001, 2012 etc?”) The tweet and his writings more generally imply that people in the field have recently changed their view to agree with him, but many people in the field object strongly to this characterization.
The tweet is mostly just saying “I told you so.” That seems like a fine time for people to criticize him about making a land grab rather than engaging on the object level, since the tweet doesn’t have much object-level content. For example:
“Saying it louder ≠ saying it first. You can’t claim credit for differentiating between reasoning and pattern recognition.” [...] is essentially a claim that everybody knows that deep learning can’t do reasoning. But, this is essentially admitting that Marcus is correct, while still criticizing him for saying it.
Hopefully Zack’s argument makes more sense if you view it as a response to Gary Marcus claiming priority, which is what Gary Marcus was doing and clearly what Zack is responding to. This is not a substitute for engagement on the object level, but saying “someone else, and in fact many people in the relevant scientific field, already understood this point” is an excellent response to someone who’s trying to claim credit for the point.
There are reasonable points to make about social epistemology here, but I think you’re overclaiming about the treatment of critics, and that this thread in particular is a bad example to point to. It also seems like you may be mistaken about some of the context. (Zack Lipton has no love for short-timelines-pushers and isn’t shy about it. He’s annoyed at Gary Marcus for making bad arguments and claiming unwarranted credit, which really is independent of whether some related claims are true.)
I’m not really making a claim about momentum, I’m just skeptical of your basic analysis.
Real 30-year interest rates are ~1%, taxes are ~1%, and I think maintenance averages ~1%. So that’s ~3%/year total cost, which seems comparable to rent in areas like SF.
On top of that I think historical appreciation is around 1% (we should expect it to be somewhere between “no growth” and “land stays a constant fraction of GDP”). So it looks like buying should be ballpark 10-30% cheaper if you ignore all the transaction costs, presumably because rent prices are factoring in a bunch of frictions. That sounds plausible enough to me, but in reality I expect this is a complicated mess that you can’t easily sort out in a short blog post and varies from area to area.
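Spelling out that back-of-the-envelope arithmetic (the ~1% figures are the rough estimates from this comment; the rent/price ratio is a hypothetical number for an SF-like market, not data):

```python
# Annual cost of owning, as a fraction of the purchase price.
real_interest = 0.01   # real 30-year interest rate
property_tax  = 0.01
maintenance   = 0.01
appreciation  = 0.01   # rough historical real appreciation

carrying_cost = real_interest + property_tax + maintenance   # ~3%/year
net_cost = carrying_cost - appreciation                       # ~2%/year after appreciation

rent_yield = 0.03  # hypothetical annual rent / price ratio ("comparable to rent")

print(f"net cost of owning: {net_cost:.1%}/year")
print(f"cost of renting:    {rent_yield:.1%}/year")
print(f"owning cheaper by ~{1 - net_cost / rent_yield:.0%}, before transaction costs")
```

With these round numbers owning comes out roughly a third cheaper; varying the inputs within plausible ranges gives something like the 10-30% ballpark above.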
If you want to argue for “buying is usually a terrible idea, investors are idiots or speculators” I think you should be getting into the actual numbers.
I’m claiming that with covariance data such a thing could be constructed.
I’ll bet against.
I meant that you can get a better deal. You can get something that is only marginally more correlated but much much cheaper in the market.
What’s the alternative?