Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention.
This sentence makes two claims: first, that Rohin reports 90% credence in safe AI by default; second, that 90% is unusually large compared with the relevant reference class (which I interpret to be people working full-time on AI safety).
However, as far as I can tell, there’s no evidence provided for the second claim. I find this particularly concerning because it’s the sort of claim that seems likely to cause (and may already have caused) information cascades, along the lines of “all these high status people think AI x-risk is very likely, so I should too”.
It may well be true that Rohin is an outlier in this regard. But it may also be false: a 10% chance of catastrophe is plenty high enough to motivate people to go into the field. Since I don’t know of many public statements from safety researchers stating their credence in AI x-risk, I’m curious about whether you have strong private evidence.
This doesn’t make much sense in two of your examples: factory farming and concern for future generations. In those cases it seems that you instead have to convince the “powerful” that they are wrong.
I think it’s quite a mistake-theoretic view to think that factory farming persists because powerful people are wrong about it. Instead, the (conflict-theoretic) view which I’d defend here is something like “It doesn’t matter what politicians think about the morality of factory farming, very few politicians are moral enough to take the career hit of standing up for what’s right when it’s unpopular, and many are being bought off by the evil meat/farming lobbies. So we need to muster enough mass popular support that politicians see which way the wind is blowing and switch sides en masse (like they did with gay marriage).”
Then the relevance to “the struggle to rally people without power to keep the powerful in check will be a Red Queen’s race that we simply need to keep running for as long as we want prosperity to last” is simply that there’s no long-term way to change politicians from being weak-willed and immoral—you just need to keep fighting through all these individual issues as they come up.
I think besides “power corrupts”, my main problem with “conflict theorists” is that optimizing for gaining power often requires [ideology], i.e., implicitly or explicitly ignoring certain facts that are inconvenient for building a social movement or gaining power. And then this [ideology] gets embedded into the power structure as unquestionable “truths” once the social movement actually gains power, and subsequently causes massive policy distortions.
(Warning: super-simplified, off-the-cuff thoughts here, from a perspective I only partially endorse): I guess my inner conflict theorist believes that it’s okay for there to be significant distortions in policy as long as there are mechanisms by which new ideologies can arise to address them, and that it’s worthwhile to have this in exchange for dynamism and less political stagnation.
Like, you know what was one of the biggest policy distortions of all time? World War 2. And yet it had a revitalising effect on the American economy, decreased inequality, and led to a boom period.
Whereas if you don’t have new ideologies rising and gaining power, then you can go around fixing individual problems all day, but the core allocation of power in society will become so entrenched that the policy distortions are disastrous.
(Edited to add: this feels relevant.)
I address (something similar to) Yudkowsky’s view in the paragraph starting:
I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much—for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection.
Particularism feels relevant and fairly similar to what I’m saying, although maybe with a bit of a different emphasis.
If Alice doesn’t mean for her second sentence to be totally redundant—or if she is able to interpret Bob’s response as an intelligible (if incorrect) statement of disagreement with her second sentence—then that suggests her second sentence actually constitutes a substantively normative claim.
I don’t think you can declare a sentence redundant without also considering the pragmatic aspects of meaning. In this example, Alice’s second sentence is a stronger claim than the first, because it again contains an implicit clause: “If you want to get protein, and you don’t have any other relevant goals, you should eat meat”. Or maybe it’s more like “If you want to get protein, and your other goals are standard ones, you should eat meat.”
Compare: Alice says “Jumping off cliffs without a parachute is a quick way to feel very excited. If you want to feel excited, you should jump off cliffs without a parachute.” Bob says “No you shouldn’t, because you’ll die.” Alice’s first sentence is true, and her second sentence is false, so they can’t be equivalent—but both of them can be interpreted as goal-conditional empirical sentences. It’s just the case that when you make broad statements, pragmatically you are assuming a “normal” set of goals.
If she is able to interpret Bob’s response as an intelligible (if incorrect) statement of disagreement with her second sentence
It’s not entirely unintelligible, because Alice is relying on an implicit premise of “standard goals” I mentioned above, and the reason people like Bob are so outspoken on this issue is because they’re trying to change that norm of what we consider “standard goals”. I do think that if Alice really understood normativity, she would tell Bob that she was trying to make a different type of claim to his one, because his was normative and hers wasn’t—while conceding that he had reason to find the pragmatics of her sentence objectionable.
Also, though, you’ve picked a case where the disputed statement is often used both in empirical ways and in normative ways. This is the least clear sort of example (especially since, pragmatically, when you repeat almost the same thing twice, it makes people think you’re implying something different). The vast majority of examples of people using “if you want..., then you should...” seem clearly empirical to me—including many that are in morally relevant domains, where the pragmatics make their empirical nature clear:
A: “If you want to murder someone without getting caught, you should plan carefully.”
B: “No you shouldn’t, because you shouldn’t murder people.”
A: “Well obviously you shouldn’t murder people, but I’m just saying that if you wanted to, planning would make things much easier.”
1. “Bayesian updating has a certain asymptotic convergence property, in the limit of infinite experience and infinite compute. So if you want to understand the world, you should be a Bayesian.”
If the first and second sentence were meant to communicate the same thing, then the second would be totally vacuous given the first.
I was a little imprecise in saying that they’re exactly equivalent—the second sentence should also have an “in the limit of infinite compute” qualification. Or else we need a hidden assumption like “These asymptotic convergence properties give us reason to believe that even low-compute approximations to Bayesianism are very good ways to understand the world.” This is usually left implicit, but it allows us to think of “if you want to understand the world, you should be (approximately) a Bayesian” as an empirical claim, not a normative one. For this to actually be an example of normativity, it needs to be the case that some people consider this hidden assumption unnecessary and would endorse claims like “You should use low-compute approximations to Bayesianism because Bayesianism has certain asymptotic convergence properties, even if those properties don’t give us any reason to think that low-compute approximations to Bayesianism help you understand the world better.” Do you expect that people would endorse this?
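As a concrete illustration of the asymptotic convergence property being discussed (a hypothetical coin-bias example of my own, not from this thread): a Bayesian observer’s posterior concentrates on the truth as data accumulates, whatever the (non-dogmatic) prior. A minimal sketch:

```python
import random

random.seed(0)

# Hypothetical example: estimating a coin's bias with a Beta(1, 1) prior.
# After h heads and t tails the posterior is Beta(1 + h, 1 + t), whose
# mean (1 + h) / (2 + h + t) converges to the true bias as data grows.
true_bias = 0.7
heads = tails = 0
for _ in range(10_000):
    if random.random() < true_bias:
        heads += 1
    else:
        tails += 1

posterior_mean = (1 + heads) / (2 + heads + tails)
print(posterior_mean)
```

The empirical claim is just this convergence; whether it gives anyone a reason to approximate Bayesianism under realistic compute budgets is the separate, hidden assumption discussed above.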
But I do have the impression that many people would at least endorse this equally normative claim: “If you have the goal of understanding the world, you should be a Bayesian.”
Okay, this seems like a crux of our disagreement. This statement seems pretty much equivalent to my statement #1 in almost all practical contexts. Can you point out how you think they differ?
I agree that some statements of that form seem normative: e.g. “You should go to Spain if you want to go to Spain”. However, that seems like an exception to me, because it provides no useful information about how to achieve the goal, and so from contextual clues would be interpreted as “I endorse your desire to go to Spain”. Consider instead “If you want to murder someone without getting caught, you should plan carefully”, which very much lacks endorsement. Or even “If you want to get to the bakery, you should take a left turn here.” How do you feel about the normativity of the last statement in particular? How does it practically differ from “The most convenient way to get to the bakery from here is to take a left turn”? Clearly that’s something almost everyone at Less Wrong and elsewhere is a realist about (assuming a shared understanding of “convenient”).
In general—at least in the context of the concepts/definitions in this post—the inclusion of an “if” clause doesn’t prevent a claim from being normative. So, for example, the claim “You should go to Spain if you want to go to Spain” isn’t relevantly different from the claim “You should give money to charity if you have enough money to live comfortably.”
I think there’s a difference between a moral statement with conditions, and a statement about what is best to do given your goals (roughly corresponding to the difference between Kant’s categorical and hypothetical imperatives). “You should give money to charity if you have enough money to live comfortably” is an example of the former—it’s the latter which I’m saying aren’t normative in any useful sense.
The quote from Eliezer is consistent with #1, since it’s bad to undermine people’s ability to achieve their goals.
More generally, you might believe that it’s morally normative to promote true beliefs (e.g. because they lead to better outcomes) but not believe that it’s epistemically normative, in a realist sense, to do so (e.g. the question I asked above, about whether you “should” have true beliefs even when there are no morally relevant consequences and it doesn’t further your goals).
Upon further thought, maybe just splitting up #1 and #2 is oversimplifying. There’s probably a position #1.5, which is more like “Words like ‘goals’ and ‘beliefs’ only make sense to the extent that they’re applied to Bayesians with utility functions—every other approach to understanding agenthood is irredeemably flawed.” This gets pretty close to normative realism because you’re only left with one possible theory, but it’s still not making any realist normative claims (even if you think that goals and beliefs are morally relevant, as long as you’re also a moral anti-realist). Maybe a relevant analogy: you might believe that using any axioms except the ZFC axioms will make maths totally incoherent, while not actually holding any opinion on whether the ZFC axioms are “true”.
In this case, I feel like there aren’t actually that many people who identify as normative anti-realists (i.e., deny that any kind of normative facts exist).
What do you mean by a normative fact here? Could you give some examples?
It seems to me, rather, that people often talk about updating your credences in accordance with Bayes’ rule and maximizing the expected fulfillment of your current desires as the correct things to do.
It’s important to disentangle two claims:
1. In general, if you have the goal of understanding the world, or any other goal that relies on doing so, being Bayesian will allow you to achieve it to a greater extent than any other approach (in the limit of infinite compute).
2. Regardless of your goals, you should be Bayesian anyway.
Believing #2 commits you to normative realism as I understand the term, but believing #1 doesn’t: #1 is simply an empirical claim about what types of cognition tend to do best at a broad class of goals. I think that many rationalists would defend #1, and few would defend #2; if you disagree, I’d be interested in seeing examples of the latter. (One test is to ask: “Aside from moral considerations, if someone’s only goal is to have false beliefs right now, should they believe true things anyway?”) Either way, I agree with Wei that distinguishing between moral normativity and epistemic normativity is crucial for fruitful discussions on this topic.
Another way of framing this distinction: assume there’s one true theory of physics, call it T. Then someone might make the claim “Modelling the universe using T is the correct way to do so (in the limit of having infinite compute available).” This is analogous to claim #1, and believing this claim does not commit you to normative realism, because it does not imply that anyone should want to model the universe correctly.
It might also be useful to clarify that in ricraz’s recent post criticizing “realism about rationality,” several of the attitudes listed aren’t directly related to “realism” in the sense of this post.
I would characterise “realism about rationality” as approximately equivalent to claim #1 above (plus a few other similar claims). In particular, it is a belief about whether there is a set of simple ideas which elegantly describe the sort of “agents” who do well at their “goals”—not a belief about the normative force of those ideas. Of course, under most reasonable interpretations of #2, the truth of #2 implies #1, but not vice versa.
This post says interesting and specific things about climate change, and then suddenly gets very dismissive and non-specific when it comes to individual action. And as you predict in your other posts, this leads to mistakes. You say “your causal model of how your actions will affect greenhouse gas concentrations is missing the concept of an economic equilibrium”. But the whole problem of climate change is that the harm of carbon emissions affects the equilibrium point of economic activity so little. You even identify the key point (“our economy lets everyone emit carbon for free”) without realizing that this implies replacement effects are very weak. Who will fly more if I fly less? In fact, since many industries have economies of scale, my flying less or eating less meat quite plausibly increases prices and decreases the carbon emissions of others.
And yes, there are complications—farm subsidies, discontinuities in response curves, etc. But decreasing your personal carbon footprint also has effects on cultural norms, which can add up to larger political change. That seems pretty important—even though, in general, it’s the type of thing that’s very difficult to be specific about even for historical examples, let alone future ones. Dismissing these sorts of effects feels very much like an example of the “valley of bad rationality”.
to what extent models tend to learn their goals internally vs. via reference to things in their environment
I’m not sure what this distinction is trying to refer to. Goals are both represented internally, and also refer to things in the agent’s environments. Is there a tension there?
Yes, I’m assuming cumulatively-calculated reward. This is a fairly standard assumption (rewards being defined at every timestep is part of the definition of MDPs and POMDPs), and I don’t see much advantage in delaying the reward computation until the end of the episode. For agents like AlphaGo, observing these rewards obviously won’t be very helpful, since those rewards are all 0 until the last timestep. But in general I expect rewards to occur multiple times per episode when training advanced agents, especially as episodes get longer.
In the context of reinforcement learning, it’s literally just the reward provided by the environment, which is currently fed only to the optimiser, not to the agent. How to make those rewards good ones is a separate question being answered by research directions like reward modelling and IDA.
So the reward function can’t be the policy’s objective – one cannot be pursuing something one has no direct access to.
One question I’ve been wondering about recently is what happens if you actually do give an agent access to its reward during training. (Analogy for humans: a little indicator in the corner of our visual field that lights up whenever we do something that increases the number or fitness of our descendants). Unless the reward is dense and highly shaped, the agent still has to come up with plans to do well on difficult tasks, it can’t just delegate those decisions to the reward information. Yet its judgement about which things are promising will presumably be better-tuned because of this extra information (although eventually you’ll need to get rid of it in order for the agent to do well unsupervised).
On the other hand, adding reward to the agent’s observations also probably makes the agent more likely to tamper with the physical implementation of its reward, since it will be more likely to develop goals aimed at the reward itself, rather than just the things the reward is indicating. (Analogy for humans: because we didn’t have a concept of genetic fitness while evolving, it was hard for evolution to make us care about that directly. But if we’d had the indicator light, we might have developed motivations specifically directed towards it, and then later found out that the light was “actually” the output of some physical reward calculation).
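The setup being discussed—feeding the reward back into the agent’s observations—can be sketched as a simple observation wrapper. This is a toy illustration with names of my own invention (the environment and its parity-matching reward are purely hypothetical), not any specific training setup:

```python
# Toy sketch: expose the scalar reward to the agent by appending the
# previous step's reward to each observation (the "indicator light").

class ToyEnv:
    """Trivial environment: the observation is a counter; the reward is 1
    when the action matches the counter's parity. Purely illustrative."""
    def reset(self):
        self.t = 0
        return [self.t]

    def step(self, action):
        reward = 1.0 if action == self.t % 2 else 0.0
        self.t += 1
        done = self.t >= 5
        return [self.t], reward, done

class RewardInObsWrapper:
    """Appends the most recent reward to the observation, so the policy
    can condition on it during training."""
    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset() + [0.0]  # no reward observed yet

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return obs + [reward], reward, done

env = RewardInObsWrapper(ToyEnv())
obs = env.reset()
observed_rewards = []
done = False
while not done:
    action = obs[0] % 2              # a policy that happens to be optimal here
    obs, reward, done = env.step(action)
    observed_rewards.append(obs[-1]) # the last element is the visible reward

print(observed_rewards)
```

The tampering worry above then corresponds to the agent developing goals directed at the appended observation element itself, rather than at whatever the reward was meant to indicate.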
I don’t think I’m claiming that the value prop stories of bad startups will be low-delta overall, just that the delta will be more spread out and less specific. Because the delta of the cryobacterium article, multiplied by a million articles, is quite big, and Golden can say that this is what they’ll achieve regardless of how bad they actually are. And more generally, the delta to any given consumer of a product that’s better than all its competitors on several of the dimensions I listed above can be pretty big.
Rather, I’m claiming that there are a bunch of startups which will succeed because they do well on the types of things I listed above, and that the Value Prop Story sanity check can’t distinguish between startups that will and won’t do well on those things in advance. Consider a startup which claims that they will succeed over their competitors because they’ll win at advertising. This just isn’t the type of thing which we can evaluate well using the Value Prop Story test as you described it:
1. Winning at advertising isn’t about providing more value for any given consumer—indeed, to the extent that advertising hijacks our attention, it plausibly provides much less value.
2. The explanation for why that startup thinks they will win on advertising might be arbitrarily non-specific. Maybe the founder has spent decades observing the world and building up strong intuitions about how advertising works, which it would take hours to explain. Maybe the advertising team is a strongly-bonded cohesive unit which the founder trusts deeply.
3. Startups which are going to win at advertising (or other aspects of high-quality non-customer-facing execution) might not even know anything about how well their competitors are doing on those tasks. E.g. I expect someone who’s generically incredibly competent to beat their competitors in a bunch of ways even if they have no idea how good their competitors are. The value prop sanity check would reject this person. And if, as I argued above, being “generically incredibly competent” is one of the most important contributors to startup success, then rejecting this type of person gives the sanity check a lot of false negatives, and therefore makes it much less useful.
Hmm, could you say more? I tend to think of social influences as good for propagating ideas—as opposed to generating new ones, which seems to depend more on the creativity of individuals or small groups.
I guess I want there to be a minimum lower standard for a Value Prop Story. If you are allowed to say things like “our product will look better and it will be cooler and customers will like our support experience more”, then every startup ever has a value prop story. If we’re allowing value prop stories of that low quality, then Golden’s story could be “our articles will be better than Wikipedia’s”. Whereas when Liron said that 80% of startups don’t have a value prop story, they seemed to be talking about a higher bar than that.