This is nice. I think what I’ve landed on is kind-of-lying (at least by omission) to most people, while trying to keep at least some outlet for taboo thoughts, be it anon alts or specific friends who are “safe” for those thoughts. That can hopefully mitigate the bad epistemic effects.
Elias Schmied
A simple argument for trying less hard
That’s interesting. Yeah, maybe “embarrassed” was a bit strong.
Wait, but if it tracked Malthus’s unconstrained rate only until 1900, isn’t that much earlier than the invention of reliable contraception?
Also, Claude tells me that there was a while (1750-1850) where per-capita income grew even as fertility hadn’t declined yet.
Thank you Steven! Really appreciate the comment.
I feel like I’m probably misunderstanding your position here, because it really seems crazy to me.
Yes, I think you are, but I take a lot of responsibility for that :-) - the post is kind of a whirlwind, undecided between slightly different framings, simplified to intuition-pump better (as I say in the introduction), and not very optimized for comprehensibility. I tried pretty hard in the caveats section to make it clear that I’m not being this radical, but I was probably still not being loud enough about it. There is a lot of work left to be done for me to give better examples and intuition pumps, and to lay things out more.
For one thing, if people learn planning from other people, where did it come from in the first place? Somebody had to have been the first, right?
Well, no—that’s like the old creationist argument that an eye couldn’t have evolved from nothing because the eye is so complex. Complex machinery can still evolve gradually, even if it seems counterintuitive when you only see the final product. The many separate planning heuristics and long-term planning social stories can gradually culturally evolve from initial small random variation of social vibes or ideas. (and then also be improved on through individual internal cognitive processes, of course).
(and you agree with gradual emergence of it being possible anyway, I imagine, since you just think the planning algorithm evolved biologically instead)
(not sure how much you skimmed, more on this if you ctrl-F “mimesis”)
(I think some people are more motivated by following norms than others. Sociopaths, autistics, and “high-agency people” would typically be on the lower end of norm-following motivation, so I would look there first to find especially clear-cut evidence of non-social agency.)
For the record, “following norms” (and also as you later say, “just copying people”) is a much more impoverished notion of social agency than what I’m talking about. It’s more about e.g. what’s elevated to attention, subtle life stories or vibes that you saw somewhere (including ones that purposely break norms!), and the social tool of language providing a substrate for abstract ideas like long-term plans. I imagine you agree that sociopaths and autists still have lots of social RL reward in their heads (even if it’s unusual in some ways), so they’re not pure examples of some kind of hypothetical nonsocial humans.
I think planning is basically innate, although it’s augmented by a lifetime of learning how to plan better (e.g. you can learn metacognitive heuristics from experience, or from reading a book etc.).
Ah I see, maybe I should reread Brain-like-AGI safety (again, it’s been a few years). After asking Opus 4.8 about what exactly you mean (a somewhat “deflationary” notion of planning, it says), I actually think what we’re saying is probably compatible.
Maybe you should read Cate Hall’s book when it comes out? :-P
:P
Let’s get into the meat of it. You mentioned a few different examples; let me zoom in where it feels most useful.
the plan to eat pizza is clearly “planning through a world-model”.
Yes, absolutely—let me say this very loudly. PLANNING THROUGH A WORLD MODEL IS REAL AND WORKS. :-)
It’s just that this world model is substantively one of social stories, not of the physical world. (or less radically, you might say the abstractions used to plan through the world model are extremely socially mediated, and the more so the more sophisticated and long-term the plans are).
if you adopt their general advice and then next week you end up with a proud new accomplishment under your belt, aren’t you marginally more likely to follow that same heuristic in the future? Obviously yes, right? So doesn’t this constitute “[learning] through feedback about how well they fulfill your goals”?
Yes, absolutely. I should probably have written “primarily/at root through feedback about how well they fulfill your goals”—that’s closer to my beliefs. Let me edit that, thank you.
Let me say this very loudly as well. WE IMPROVE AT PLANNING THROUGHOUT OUR LIFETIME, IN PART BY SEEING WHAT WORKS AND WHAT DOESN’T. :-)
Again, the caveat section goes into a little more detail here.
“if I put on ripped pants right now at 8am, then my knees might get cold when I’m outside at 10pm tonight, and I know this because it happened to me yesterday”
This is a really useful example, because I’d actually tend to agree with you here that this is nonsocial! It’s a negative update about simple physical things in your environment (ripped pants, cold weather), and a simple physical action cognitively connected to them (putting them on) - pretty plausible to me that it works with no social involvement whatsoever.
(I would still quibble here that there’s a possibility of involvement of a “responsible person social role” that activates when there’s something that pattern-matches to “failure”, and that searches internally for possible targets for a cognitive update to fix the bad “failure feeling”—but this is getting into the weeds a bit more, and I’m much more uncertain)
Now to our disagreements:
OK here’s an example that I challenge you to explain: if I’m hungry, I might take a bus to the restaurant to get a slice of pizza, but if I’m not hungry, then I won’t.
Okay, so I’ll simplify this by removing the bus part, and having the plan only be “going to the restaurant”—I’d be saying very similar things twice otherwise.
First, the feeling of hunger triggers an impulse to somehow acquire food (I feel pretty agnostic on how much this is socially learned, seems like something in early childhood that I don’t have good intuitions for).
Then, there is a possible action associated with “getting food” that pops up in your head: “I could go to a restaurant.” Seemingly a medium-term plan. Now, where did it come from? You first learned about the concept of going to restaurants sometime in your childhood, from other people. You didn’t come up with it yourself—you’re following a social script. There’s no backchaining or world-modelling in the full sense of the word happening here—it’s just an if-then statement.
You might think that this is a minor definitional quibble. But the crucial point is that the substance of the cognitive work happened previously, in the macro-cultural process, not in you. The “modeling” (or rather, actual experiencing) of going to a restaurant having good outcomes, the learning process of absorbing it - it happened gradually in the culture, as restaurants became a thing, people internalized that it was convenient and good, and going to them became normal.
That’s what I mean by “yes, you’re planning through a world model, but it is a world model of social stories, not a model of the physical world”.
Let’s modify the example. Let’s say you didn’t know the concept of restaurants, and you’re a caveman who just had a modern restaurant spawn a kilometer away from your cave. Let’s also say you already have language, and it’s been explained to you that you can get food there much more easily than if you hunted and gathered, and that you magically 100% trust the source that explained it to you (let’s say it’s not a person, to make it as nonsocial as possible).
In my intuition… this would be different—even though, by assumption, you believe in the abstract concept of a restaurant in the same way, which would seem to suggest that the planning should be just as easy. It would feel importantly different from going to the restaurant today, and even from when you first did it as a child (when it felt more safe and normal than in this example, because your parents were doing it with you). That’s because there’s no social script for it—the cognitive work hasn’t been done for you yet. It would be a nontrivially cognitively difficult process, with a little bit of internal friction, to take the leap to such a weird new thing. That would be you doing the “real” cognitive work of learning about and modeling the real world yourself. And you’d (probably) internalize it permanently as a salient option here too. It would just be harder than learning it from your parents as a child—but it would actually be “nonsocial agency” in that case, so to speak.
And with going to a restaurant, at least you actually experienced it when you first learned about it—it’s a very short-term plan, still. All these dynamics get more extreme when we talk about true long-term plans over years and decades, where you have never experienced the long-term goal or even the substeps of the plan (like the “going to college” example in the main post).
Now of course, this is all still explainable in your frame as well (e.g. you could say that new things always feel weird, or something). Hyper-abstract frames like the ones we’re talking about here can generally accommodate any observation; the question is which frame requires fewer epicycles overall.
Repeating myself from the main post: to me, saying that long-term planning is algorithmic but that “the social world is usually very very important for (1) making options salient, and (2) making options seem appealing, and (3) providing evidence about the consequences of different options” feels like a lot of epicycles. Starting the other way around (saying we’re acting out social stories and that this gets us to some wonky general-ish planning ability, and then adding the caveat that sometimes “real” ground-level feedback can sway us from one social story to the other) feels like fewer epicycles overall, when we try to make our frames fit the evidence of our own introspection and human behavior in general.
Coming back to this again a few years later, such a classic. Very beautiful conceptually and useful for my thinking
Well done! I like this, and am eagerly anticipating the next post! It rhymes with a lot of my own thinking—curious whether the view you end up settling on will be similar to mine too.
(I sometimes phrase this as the Reverse Pascal’s Wager: the gods almost certainly exist, but if they don’t, the stakes are much higher, so you should act as if they didn’t exist.)
Ha, this is really nice! Well done.
Social agency
Ah now I get what you were going for! Yes this is great! Definitely directionally what I am saying. Thank you for the interpretive effort.
(Also, again side note for others: I’m not only talking about alignment, value erosion between human-led AI-powered factions is also a concern, the whole framing is very abstract so very detail-agnostic)
Of course, I would say it goes even deeper than that, and my argument forces even more uncertainty on us: what you call progress in the “type 2 technologies” may be confounded by the type 1 progress. That is, moral and coordination progress may just be a side effect of our economic progress, and therefore might not mean that much either.
It’s a huge historical question. I elaborate on this in the footnotes of the follow-up post a bit.
E.g., democracy may be a side effect of human labor being temporarily valuable (Is democracy a fad?; modernization theory in academia), and the possibility of average income growth, and therefore the capacity to care about others, may be a side effect of slow human reproduction (Bostrom, Hanson). (Claude gives a long list of case studies in academia to look at for this question.)
In general, I feel like the importance of this issue (trying to disentangle how “real”, how much of a standalone force, moral and coordination progress is) has been underrated in EA[1] - and precisely for this reason: it makes our macro priors very different.
Anyway, unless you have a confident belief that all the type 2 progress definitely is not explained by type 1 progress (which seems hard to imagine), you should also reduce some of the update about the future you got from the type 2 progress—since it would happen when we approach AGI anyway. (Or I think in your framing, not evidence that alignment is any easier (?))
This may be a useful way to frame it; perhaps this macro question is the actual crux. Thank you.
[1] Perhaps because it’s a foundational assumption of our subculture/ideology that moral progress is a huge deal and that we can do more of it.
What? I’m not even vegan lol
“a sort of semi-random unpredictable switch from doing well to doing poorly, as a sort of inevitable cost of optimization”
I mean, yeah, that’s Goodhart’s Law. I made a follow-up post explaining it more.
Clarifying the Darwinian Honeymoon
Actually, no, I changed my mind. It’s still captured by the core conceptual point of the piece; there are just no good vivid examples of it like the chicken example.
It’s still completely expected a priori that the dominant class of agent at a given stage of runaway Darwinian competition would solve most of their problems and be very powerful. This would be the case even if they were to then proceed to get outcompeted. So if someone hasn’t integrated this conceptual point, it should be an update downwards on how much evidence that outside view is for them.
There’s also the classic meta-epistemic debate here, on how to weigh empirical outside views against gears-level models (even if the “gears-level model” here is something almost tautological).
I see—yeah I agree that is different from the main thrust of the piece. I think I address this in footnote 9 a bit, where I call it the “humans will keep becoming more powerful” outside view.
My main objection to that would be that there’s not a very strong correlation between humans getting what they want and the total welfare of the world (Agricultural Revolution, factory farming), so the fact that that outside view is strong doesn’t seem very reassuring.
(How do you feel about the question of whether the total welfare of the world today is negative because of factory farming?)
Thanks Nathan.
Yeah agreed chicken welfare is likely to improve.
I seem to remember you relying on the “human progress outside view” in your reasoning about AI—maybe I’m misremembering. My point is more about how much evidence that outside view should be, on an abstract level, rather than any concrete criticism of capitalism. Like, obviously humans would do better for a while, so it cannot be that surprising, so it cannot be that much evidence.
I’m purposefully being very vague about the specifics, because I’m fundamentally only trying to make a conceptual point—about how strong the evidence from the outside view should be.
Thank you!
I hope my post can add something novel, that I haven’t come across in the pre-existing discussion (see my footnote 1) of the specter of long-term evolutionary competition: the observation that this at first predictably benefits you (an instance of Goodhart’s Law), a vivid example, and a memetic handle for the concept: The Darwinian Honeymoon.
Yay, I’m happy my tweet was able to inspire someone to do it! Nice job :)