I’ll try to respond properly later this week, but I like the point that embedded agency is about boundedness. Nevertheless, I think we probably disagree about how promising it is “to start with idealized rationality and try to drag it down to Earth rather than the other way around”. If the starting point is incoherent, then this approach doesn’t seem like it’ll go far—if AIXI isn’t useful to study, then probably AIXItl isn’t either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).
I appreciate that this isn’t an argument that I’ve made in a thorough or compelling way yet—I’m working on a post which does so.
Yeah, I should have been much more careful before throwing around words like “real”. See the long comment I just posted for more clarification, and in particular this paragraph:
I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations which “aren’t real” (I agree that this was quite unclear in the original post).
I like this review and think it was very helpful in understanding your (Abram’s) perspective, as well as highlighting some flaws in the original post, and ways that I’d been unclear in communicating my intuitions. In the rest of my comment I’ll try to write a synthesis of my intentions for the original post with your comments; I’d be interested in the extent to which you agree or disagree.
We can distinguish between two ways to understand a concept X. For lack of better terminology, I’ll call them “understanding how X functions” and “understanding the nature of X”. I conflated these in the original post in a confusing way.
For example, I’d say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don’t yet exist, or even prove things about those components (e.g. there’s probably useful maths connecting graph theory with optimal nerve wiring), but it’s still very grounded in concrete examples. If we thought that we should study how intelligence functions in a similar way as we study how fitness functions, that might look like a combination of cognitive science and machine learning.
By comparison, understanding the nature of X involves performing a conceptual reduction on X by coming up with a theory which is capable of describing X in a more precise or complete way. The pre-theoretic concept of fitness (if it even existed) might have been something like “the number and quality of an organism’s offspring”. Whereas the evolutionary notion of fitness is much more specific, and uses maths to link fitness with other concepts like allele frequency.
Momentum isn’t really a good example to illustrate this distinction, so perhaps we could use another concept from physics, like electricity. We can understand how electricity functions in a lawlike way by understanding the relationship between voltage, resistance and current in a circuit, and so on, even when we don’t know what electricity is. If we thought that we should study how intelligence functions in a similar way as the discoverers of electricity studied how it functions, that might involve doing theoretical RL research. But we also want to understand the nature of electricity (which turns out to be the flow of electrons). Using that knowledge, we can extend our theory of how electricity functions to cases which seem puzzling when we think in terms of voltage, current and resistance in circuits (even if we spend almost all our time still thinking in those terms in practice). This illustrates a more general point: you can understand a lot about how something functions without having a reductionist account of its nature—but not everything. And so in the long term, to understand really well how something functions, you need to understand its nature. (Perhaps understanding how CS algorithms work in practice, versus understanding the conceptual reduction of algorithms to Turing Machines, is another useful example).
I had previously thought that MIRI was trying to understand how intelligence functions. What I take from your review is that MIRI is first trying to understand the nature of intelligence. From this perspective, your earlier objection makes much more sense.
However, I still think that there are different ways you might go about understanding the nature of intelligence, and that “something kind of like rationality realism” might be a crux here (as you mention). One way that you might try to understand the nature of intelligence is by doing mathematical analysis of what happens in the limit of increasing intelligence. I interpret work on AIXI, logical inductors, and decision theory as falling into this category. This type of work feels analogous to some of Einstein’s thought experiments about the limit of increasing speed. Would it have worked for discovering evolution? That is, would starting with a pre-theoretic concept of fitness and doing mathematical analysis of its limiting cases (e.g. by thinking about organisms that lived for arbitrarily long, or had arbitrarily large numbers of children) have helped people come up with evolution? I’m not sure. There’s an argument that Malthus did something like this, by looking at long-term population dynamics. But you could also argue that the key insights leading up to the discovery of evolution were primarily inspired by specific observations about the organisms around us. And in fact, even knowing evolutionary theory, I don’t think that the extreme cases of fitness make sense. So I would say that I am not a realist about “perfect fitness”, even though the concept of fitness itself seems fine.
So an attempted rephrasing of the point I was originally trying to make, given this new terminology, is something like “if we succeed in finding a theory that tells us the nature of intelligence, it still won’t make much sense in the limit, which is the place where MIRI seems to be primarily studying it (with some exceptions, e.g. your Partial Agency sequence). Instead, the best way to get that theory is to study how intelligence functions.”
The reason I called it “rationality realism” not “intelligence realism” is that rationality has connotations of this limit or ideal existing, whereas intelligence doesn’t. You might say that X is very intelligent, and Y is more intelligent than X, without agreeing that perfect intelligence exists. Whereas when we talk about rationality, there’s usually an assumption that “perfect rationality” exists. I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations like “perfect fitness” which “aren’t real” (I agree that this was quite unclear in the original post).
My proposed redefinition:
The “intelligence is intelligible” hypothesis is about how lawlike the best description of how intelligence functions will turn out to be.
The “realism about rationality” hypothesis is about how well-defined intelligence is in the limit (where I think of the limit of intelligence as “perfect rationality”, and “well-defined” with respect not to our current understanding, but rather with respect to the best understanding of the nature of intelligence we’ll ever discover).
Cool, thanks for those clarifications :) In case it didn’t come through from the previous comments, I wanted to make clear that this seems like exciting work and I’m looking forward to hearing how follow-ups go.
Yes, but the fact that the fragile worlds are much more likely to end in the future is a reason to condition your efforts on being in a robust world.
While I do buy Paul’s argument, I think it’d be very helpful if the various summaries of the interviews with him were edited to make it clear that he’s talking about value-conditioned probabilities rather than unconditional probabilities—since the claim as originally stated feels misleading. (Even if some decision theories only use the former, most people think in terms of the latter).
Some abstractions are heavily determined by the territory. The concept of trees is pretty heavily determined by the territory. Whereas the concept of betrayal is determined by the way that human minds function, which is determined by other people’s abstractions. So while it seems reasonably likely to me that an AI “naturally thinks” in terms of the same low-level abstractions as humans, it thinking in terms of human high-level abstractions seems much less likely, absent some type of safety intervention. Which is particularly important because most of the key human values are very high-level abstractions.
I have four concerns even given that you’re using a proper scoring rule, which relate to the link between that scoring rule and actually giving people money. I’m not particularly well-informed on this though, so could be totally wrong.
1. To implement some proper scoring rules, you need the ability to confiscate money from people who predict badly. Even when the score always has the same sign, like you have with log-scoring (or when you add a constant to a quadratic scoring system), if you don’t confiscate money for bad predictions, then you’re basically just giving money to people for signing up, which makes having an open platform tricky.
2. Even if you restrict signups, you get an analogous problem within a fixed population of people who’ve already signed up: the incentives will be skewed when it comes to choosing which questions to answer. In particular, if people expect to get positive amounts of money for answering randomly, they’ll do so even when they have no relevant information, adding a lot of noise.
3. If a scoring rule is “very capped”, as the log-scoring function is, then the expected reward from answering randomly may be very close to the expected reward from putting in a lot of effort, and so people would be incentivised to answer randomly and spend their time on other things.
4. Relatedly, people’s utilities aren’t linear in money, so the score function might not remain a proper one taking that into account. But I don’t think this would be a big effect on the scales this is likely to operate on.
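To make the scoring-rule issues above concrete, here’s a minimal Python sketch, using one common parameterisation of each rule (my choice for illustration, not necessarily the platform’s actual payout formula):

```python
import math

def log_score(p, outcome):
    """Logarithmic scoring rule: reward is the log of the probability
    assigned to the outcome that actually occurred. Always <= 0, so
    paying it out directly requires confiscating money from forecasters."""
    return math.log(p if outcome else 1 - p)

def quadratic_score(p, outcome):
    """Quadratic (Brier-style) scoring rule, shifted into [0, 1]:
    always non-negative, so a random guesser still earns money on average."""
    return 1 - (p - (1 if outcome else 0)) ** 2

# A random guesser (p = 0.5), regardless of the outcome:
print(log_score(0.5, True))        # negative: they lose money
print(quadratic_score(0.5, True))  # 0.75: they get paid for pure noise
```

Both rules are proper (honest reporting maximises expected score), but note the sign difference: the log score forces confiscation, while the shifted quadratic score pays out for signing up, which is exactly the tension in points 1 and 2.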
Apologies for the mischaracterisation. I’ve changed this to refer to Scott Alexander’s post which predicts this pressure.
Actually, the key difference between this and prediction markets seems to be that this has no downside risk: you can’t lose money for bad predictions. So you could exploit it by only making extreme predictions, which would make a lot of money sometimes without losing money in the other cases. Or by making fake accounts to drag the average down.
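A toy simulation of that exploit, under a hypothetical payout rule of my own construction (I don’t know the platform’s actual rule): you earn the amount by which you beat the crowd’s Brier score, but losses are floored at zero.

```python
import random
random.seed(0)

def brier(p, outcome):
    # Brier loss: squared distance from the realised outcome (0 or 1).
    return (p - (1 if outcome else 0)) ** 2

def no_downside_payout(p, crowd_p, outcome):
    # Hypothetical rule: paid your Brier improvement over the crowd,
    # but never charged when you do worse.
    return max(0.0, brier(crowd_p, outcome) - brier(p, outcome))

# Fair coin; the crowd is perfectly calibrated at 0.5.
trials = [random.random() < 0.5 for _ in range(100_000)]
extreme = sum(no_downside_payout(1.0, 0.5, o) for o in trials) / len(trials)
honest = sum(no_downside_payout(0.5, 0.5, o) for o in trials) / len(trials)
print(extreme, honest)  # ~0.125 vs 0.0
```

The honest, perfectly calibrated forecaster earns nothing, while the forecaster who always shouts “100%!” extracts money from pure noise, because their losing bets are forgiven.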
Another point: prediction markets allow you to bet more if you’re more confident the market is off. This doesn’t, except by betting that the market is further off, which is different. But I don’t know if that matters very much; you could probably recreate that dynamic by letting people weight their own predictions.
Okay, so in quite a few cases the forecasters spent more time on a question than Elizabeth did? That seems like an important point to mention.
My interpretation: there’s no such thing as negative value of information. If the mean of the crowdworkers’ estimates were reliably in the wrong direction (compared with Elizabeth’s prior) then that would allow you to update Elizabeth’s prior to make it more accurate.
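A tiny simulation of that point (my own toy construction, not Elizabeth’s actual setup): a signal that is reliably wrong is just as informative as one that is reliably right, once you know to flip it.

```python
import random
random.seed(1)

# The truth is a coin flip; the "crowd" agrees with it only 30% of the time,
# i.e. it is reliably wrong in a known direction.
truth = [random.random() < 0.5 for _ in range(100_000)]
crowd = [t if random.random() < 0.3 else not t for t in truth]

raw_accuracy = sum(c == t for c, t in zip(crowd, truth)) / len(truth)
flipped_accuracy = sum((not c) == t for c, t in zip(crowd, truth)) / len(truth)
print(raw_accuracy, flipped_accuracy)  # ~0.30 vs ~0.70
```

So a crowd that’s anti-correlated with the truth would still have positive value of information relative to the prior; the only genuinely useless signal is an uncorrelated one.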
So the thing I’m wondering here is what makes this “amplification” in more than a trivial sense. Let me think out loud for a bit. Warning: very rambly.
Let’s say you’re a competent researcher and you want to find out the answers to 100 questions, which you don’t have time to investigate yourself. The obvious strategy here is to hire 10 people, get them to investigate 10 questions each, and then pay them based on how valuable you think their research was. Or, perhaps you don’t even need to assign them questions—perhaps they can pick their own questions, and you can factor in how neglected each question was as part of the value-of-research calculation.
This is the standard, “freeform” approach; it’s “amplification” in the same sense that having employees is always amplification. What does the forecasting approach change?
It gives one specific mechanism for how you (the boss) evaluate the quality of research (by comparison with your own deep dive), and rules out all the others. This has the advantage of simplicity and transparency, but has the disadvantage that you can’t directly give rewards for other criteria like “how well is this explained”. You also can’t reward research on topics that you don’t do deep dives on.
This mainly seems valuable if you don’t trust your own ability to evaluate research in an unbiased way. But evaluating research is usually much easier than doing research! In particular, doing research involves evaluating a whole bunch of previous literature.
Further, if one of your subordinates thinks you’re systematically biased, then the forecasting approach doesn’t give them a mechanism to get rewarded for telling you that. Whereas in the freeform approach to evaluating the quality of research, you can take that into account in your value calculation.
It gives one specific mechanism for how you aggregate all the research you receive. But that doesn’t matter very much, since you’re not bound to that—you can do whatever you like with the research after you’ve received it. And in the freeform approach, you’re also able to ask people to produce probability distributions if you think that’ll be useful for you to aggregate their research.
It might save you time? But I don’t think that’s true in general. Sure, if you use the strategy of reading everyone’s research then grading it, that might take a long time. But since the forecasting approach is highly stochastic (people only get rewards for questions you randomly choose to do a deep dive on) you can be a little bit stochastic in other ways to save time. And presumably there are lots of other grading strategies you could use if you wanted.
Okay, let’s take another tack. What makes prediction markets work?
1. Anyone with relevant information can use that information to make money, if the market is wrong.
2. People can see the current market value.
3. They don’t have to reveal their information to make money.
4. They know that there’s no bias in the evaluation—if their information is good, it’s graded by reality, not by some gatekeeper.
5. They don’t actually have to get the whole question right—they can just predict a short-term market movement (“this stock is currently undervalued”) and then make money off that.
This forecasting setup also features 1 and 2. Whether or not it features 3 depends on whether you (the boss) manage to find that information by yourself in the deep dive. And 4 also depends on that. I don’t know whether 5 holds, but I also don’t know whether it’s important.
So, for the sort of questions we want to ask, is there significant private or hard-to-communicate information?
If yes, then people will worry that you won’t find it during your deep dive.
If no, then you likely don’t have any advantage over others who are betting.
If it’s in the sweet spot where it’s private but the investigator would find it during their deep dive, then people with that private information have the right incentives.
If either of the first two options holds, then the forecasting approach might still have an advantage over a freeform approach, because people can see the current best guess when they make their own predictions. Is that visibility important, for the wisdom of crowds to work—or does it work even if everyone submits their probability distributions independently? I don’t know—that seems like a crucial question.
Anyway, to summarise, I think it’s worth comparing this more explicitly to the most straightforward alternative, which is “ask people to send you information and probability distributions, then use your intuition or expertise or whatever other criteria you like to calculate how valuable their submission is, then send them a proportional amount of money.”
Perhaps I missed this, but how long were the forecasters expected to spend per claim?
I broadly agree with the sentiment of this post, that GPT-2 and BERT tell us new things about language. I don’t think this claim relies on the fact that they’re transformers though—and am skeptical when you say that “the transformer architecture was a real representational advance”, and that “You need the right architecture”. In your post on transformers, you noted that transformers are supersets of CNNs, but with fewer inductive biases. But I don’t think of removing inductive biases as representational advances—or else getting MLPs to work well would be an even bigger representational advance than transformers! Rather, what we’re doing is confessing as much ignorance about the correct inductive biases as we can get away with (without running out of compute).
Concretely, I’d predict with ~80% confidence that within 3 years, we’ll be able to achieve comparable performance to our current best language models without using transformers—say, by only using something built of CNNs and LSTMs, plus better optimisation and regularisation techniques. Would you agree or disagree with this prediction?
Note that Val’s confusion seems to have been because he misunderstood Oli’s point.
+1, I would have written my own review, but I think I basically just agree with everything in this one (and to the extent I wanted to further elaborate on the post, I’ve already done so here).
This post provides a useful conceptual handle for zooming in on what’s actually happening when I get distracted, or procrastinate. Noticing this feeling has been a helpful step in preventing it.
This post directly addresses what I think is the biggest conceptual hole in our current understanding of AGI: what type of goals will it have, and why? I think it’s been important in pushing people away from unhelpful EU-maximisation framings, and towards more nuanced and useful ways of thinking about goals.