Isnasene

Karma: 722

Isnasene 31 Oct 2019 1:41 UTC
41 points
on: The Technique Taboo
While there _is_ a technique taboo and I agree with your general observations, I think that there are a number of things going on here simultaneously that boil down to more than just a taboo on the idea that skill is a trainable attribute. For instance:
1. Many activities that appear to have taboos against training skills are just reflective of people who are _optimizing something else_. In particular, enjoyment.
But when traditional colleges supply the labor force for a professional trade outside of academia, that’s when discussion of skill (especially rote learning) becomes taboo[1]. College students learn everything about their trade except how to do it. Then we maintain a collective silence concerning technique.
This collective skill silence isn’t necessarily a taboo; it might just be that the kind of people who choose their fields for non-practical reasons (ie not to develop professional skills) don’t really care about developing their own skills that much. Instead of optimizing productive capabilities (ie skills), they might be trying to optimize consumptive capabilities (ie the ability to enjoy English literature or appreciate art or what-have-you). To elaborate:
An English major teaches you how to talk about novels, not how to write one.
This is true but, if you want to learn how to write, don’t pick the major that helps you appreciate English literature. Instead, pick the major that helps you write. If you want to write a novel (fictional), writing majors[1] are a good move—one of my friends has done this and has reams of pages of her own work. If you want to write a novel (non-fictional), you might want to try majoring directly in something like history since that directly gives you experience writing about history. On the other hand, if you want to increase your capacity to appreciate English literature, be an English major.
“Getting better at drawing” is off-topic at my weekly local drawing club too. I’ve literally never heard it discussed.
As someone who briefly ran an art club back in the day, consider that the people showing up actually might not care that much about being good at drawing; they might just enjoy it and care about the activity.
2. In competitive contexts, people don’t want to optimize their skills because it turns the situation into a race-to-the-bottom. If you’re at work and one person is actively trying to upskill, that person is putting pressure on you to do something you’d prefer not to in order to stay competitive. An extreme example of this is anti-social punishment (punishing people for being altruistic because it might create a norm where you have to be more altruistic).
This is a taboo against upskilling but it’s not about the people at the top trying to maintain a social order; it’s the people at the bottom trying to make sure they have the slack to stay where they are without losing their place.
3. In cases where many people are optimizing for enjoyment rather than upskilling (meditation is a good example of this) and there is some instructor managing the activity, the instructor is not under much pressure to have strong expertise. As long as instructors are good enough to lead the activity and ensure that people optimizing enjoyment find it valuable, they’ve done their job. Everyone goes home at the end of the day.
However, asking an instructor for advice on how to upskill puts responsibility onto the instructor.
- If the instructor gives you bad advice and you implement it with intent to upskill, the instructor has harmed you. Proper form prevents poor performance but improper form promotes it.
- If the instructor cannot give you good advice, you have harmed the instructor’s reputation. In this case, the instructor deserves that reputation hit but it’s still an incentive for them to oppose up-skilling.
This kind of dynamic between upskillers and enjoyment-optimizers also creates interesting situations. For instance, when I used to do Tae Kwon Do, there was a core of people dedicated to the practice (who would give you as much feedback and as many practice opportunities as you wanted) and a larger cloud of people just there to get their weekly exercise (who didn’t care very much about upskilling). Going from one group to the other dramatically changes the conversation about skill.
What links here?
- lsusr's comment on Defining “Antimeme” by lsusr (29 Dec 2019 1:17 UTC; 6 points)

Isnasene 24 Dec 2019 14:28 UTC
40 points
on: Funk-tunul’s Legacy; Or, The Legend of the Extortion War
Distinguishing CDT from FDT/TDT in intuitive cases tends to be a lot harder than it looks. And I think it’s important to be extremely careful about what we categorize as CDT+being clever versus FDT/TDT. My impression is that this story is more frequently the former.
At first, the population was composed of a humble race of agents called the ceedeetee. When two of the ceedeetee met each other, each would name the number 5, and receive a payoff of 5, and all was well.
I’m not sure it’s obvious that all ceedeetee will name five when they meet each other.
- In an environment where there is zero information, this would be true (ie guessing >5 causes the guesser to get outcompeted by those who will miss fewer payoffs and guessing less causes them to get outcompeted genetically by their partners in the game) but it’s clearly not true in this particular context. Instead, it seems more likely that ceedeetee will on-net guess (and get) five based on whether their analysis of their partner tells them what they can get away with (ie A scares B so B only offers 4 and A offers 6, but B scares C so B offers 6 and C offers 4, but C scares A and so on...). I’d expect an equilibrium that’s suboptimal but has cyclical relationships between the participants.
- Since output from the game determines evolutionary fitness, any ceedeetees who get some payoffs from other sources (ie this guy I just met seems nice but that other guy didn’t so I’m gonna give a 4 to this guy and a 6 to the other guy) won’t always output five.
These points are kind of pedantic but it’s important to notice that, if this happens, nine-bots get destroyed. They always guess way too high and the inherent noise in how a population of actual ceedeetee play the game will be hard to recover from.
Then one day, a simple race of 9-bots invaded the land. The 9-bots would always name the number 9!
Where exactly would we expect the 9-bots to come from? If they were all trapped on a ship together, they would’ve just continuously lost the game until they died. Again, this is kind of pedantic but, as you point out, the population distributions matter.
And from that day onward, whenever Funk-tunul met a fellow ceedeetee agent (if “fellow” is the right word here, which it isn’t), she would announce that she was going to name 9, and do so. And though the ceedeetee agents’ output channels would light up with the standard indicators of outrage and betrayal, they would reason causally, and name 1.
A very key part of what Funk-tunul is doing here is telling the ceedeetee agents beforehand that she’ll say nine. Again, it strikes me that, if a ceedeetee noticed they could cause their partners to guess numbers lower than five, they definitely would do that. Funk-tunul isn’t winning because of a better decision theory here; she’s winning because she’s more clever at manipulating other ceedeetee.
However, in real life, this implies that Funk-tunul would not be successful. A ceedeetee would’ve, in the past, tried to credibly show that they always say nine until the population equilibrates to having a defense mechanism against this particular action.
They reasoned: suppose the fraction of ceedeetee agents in the population is p, the fraction of funk-tunul agents is q, and the fraction of 9-bots is 1−p−q. If we establish a policy of submitting to the 9-bots’ extortion, we’ll have an average payoff of 9p+5q+1⋅(1−p−q)=8p+4q+1 and the 9-bots will have an average payoff of 9p+9q. If we defy the 9-bots while continuing to extort our ceedeetee cousins, we’ll have an average payoff of 9p+5q, whereas the 9-bots will have an average payoff of 9p. Whether it’s better to submit or defy depends on the values of p and q. It’s not obviously possible for defiance to be the right choice given what we know, but if we can coordinate to meet fellow funk-tunul agents more often—if we drop the assumption of uniform random encounters—the calculus changes …
This doesn’t strike me as acausal reasoning; just long-termist reasoning. Given the (presumably exponential) population dynamics, a ceedeetee could easily predict that letting the nine-bot get nine points would help that nine-bot reproduce more nine-bots. If ceedeetee’rs are in the game to maximize fitness as opposed to utility, they’ll definitely establish a norm against helping nine-bots to protect against the exponential cost that nine-bots will have for the future. If they’re in the game to maximize their points in the game, this isn’t true (they’ll just defect against the future) but funk-tunul’s reasoning suggests that this isn’t what’s going on.
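To make the quoted payoff arithmetic concrete, here is a minimal Python sketch. It assumes uniformly random encounters and the payoffs stated in the story (two 9-bots meeting get nothing, which is what the quoted 9-bot average of 9p+9q implies). The function name `average_payoffs` and the policy labels are made up for illustration.

```python
from fractions import Fraction


def average_payoffs(p, q, policy):
    """Average per-encounter payoffs for a funk-tunul agent and a 9-bot.

    p: fraction of ceedeetee agents
    q: fraction of funk-tunul agents
    1 - p - q: fraction of 9-bots
    policy: "submit" (funk-tunul names 1 against 9-bots) or
            "defy" (funk-tunul names 9 against 9-bots, so both get 0)
    """
    r = 1 - p - q  # fraction of 9-bots
    if policy == "submit":
        funk = 9 * p + 5 * q + 1 * r   # equals 8p + 4q + 1
        bot = 9 * p + 9 * q
    elif policy == "defy":
        funk = 9 * p + 5 * q
        bot = 9 * p
    else:
        raise ValueError(f"unknown policy: {policy}")
    return funk, bot


if __name__ == "__main__":
    # Hypothetical population mix, chosen only for illustration.
    p, q = Fraction(1, 2), Fraction(1, 4)
    for policy in ("submit", "defy"):
        funk, bot = average_payoffs(p, q, policy)
        print(policy, "funk-tunul:", funk, "9-bot:", bot)
```

For p = 1/2 and q = 1/4, submitting gives the funk-tunul agents 6 against 27/4 for the 9-bots, while defying gives 23/4 against 9/2. Submitting never lowers the funk-tunul agents’ own average (the gap is exactly 1−p−q), but defying costs the 9-bots 9q, which is the part that matters if relative fitness is what counts.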
It’s not obviously possible for defiance to be the right choice given what we know, but if we can coordinate to meet fellow funk-tunul agents more often—if we drop the assumption of uniform random encounters—the calculus changes …
If we drop limiting assumptions once funk-tunul agents get involved, it seems pretty clear that the funk-tunul agents will do better than the ceedeetee previously did.
Before the two agents could name their numbers, Graddes spoke. “Please. Why are you doing this?” she pleaded. “I can’t hate the 9-bots for their extortion, for they are a simple race and could not do otherwise. But you—we’re cousins. Your lineage is a fork of mine. You know it’s not fair for your people to always name the number 9 when meeting mine. Yet you do so anyway, knowing that we have no choice but to name the number 1 if we want any payoff at all. Why?”
“Don’t hate the player,” said Tim’liss, her output channels dimming and brightening in a interpolated pattern one-third of the way between the standard indicators for sympathy and contempt. “Hate life.”
We just dropped the random-interaction assumption. Why don’t the ceedeetee just only interact with fellow ceedeetee? Choosing only to interact with ceedeetee would get them waaaaay more points.
Also, this is evidence that the ceedeetee in the game care about stuff beyond just the scores they get in the game and reinforces my point that the events as-described don’t really make sense in an evolutionary setting. Given this, it’s worth pointing out that the actual thing Tim’liss is doing here is supporting a race to the bottom that optimizes only reproductive fitness. Engaging in a race to the bottom for reproductive fitness is Not Good timeless decision theory.

Isnasene 7 Apr 2020 15:56 UTC
29 points
on: Choosing the Zero Point
As an animal-welfare lacto-vegetarian who’s seen a fair number of arguments along these lines, they don’t really do it for me. In my experience, it’s not really possible to separate human peace of mind from the actions you take (the former reflects an ethical framework and the latter reflect strategies, and together they form an aesthetic feedback loop). To be explicit:
- I don’t think my moral zero-point was ever up for grabs. Moreover, it wasn’t “the world I interact with every day”; it was driven by an internal sense of what makes existing okay and what doesn’t and extrapolating that over the universe. Raising/lowering my zero-point is therefore internally connected with my heuristic for whether more beings should exist or not and in this sense, the zero-point was only a proxy for my psychological anguish pointing at this concept. If I artificially inflate/depreciate my zero-point while maintaining awareness that this has no effect on whether or not the average being existing is good or bad, it won’t actually change how I feel psychologically.
- A vast amount of my anguish around having a very low zero-point was social angst. A low zero-point (especially when due to animal welfare) not only meant that the world was bad; it meant that barely anyone cared (and in my immediate bubble, literally no one cares). This stuff occurred to me when I was very young and can result in what I now know to be institutional betrayal trauma. Had I been an ordinary kiddo that didn’t make real-time psychological corrections when my brain started acting funny, this would’ve happened to me.
  - Also, while I get what you’re saying, having a different value of something psychologically linked to a normative claim about “when it is good to exist” or “the bare standard of human decency” will gaslight people traumatized by mismatches between those claims and people’s actual actions. If you keep this zero-point alteration tool solely for the psychological benefits, it’s not a big deal. But if you talk to people about ethics and think your moral statements might be reflective of a modified zero-point, then it can be an issue. In light of this, I’d recommend preambling your ethical statements with something like “if I seem insufficiently horrified, it is only because I am deliberately modifying my definition of the bare standard of human decency/zero-point for reasons of mental well-being”. Otherwise, you’ll mess a whole bunch of people up.
- You’ve pointed out changing your zero-point gives you a number of psychological benefits. However, I think most of these psychological benefits come from the fact that people are more satisficing than utilitarian and this causes zero-point shifts to also cause nonlinear transformations of your utility function. If you’re accustomed to being internally satisfied by the world having utility over threshold X and you change your zero-point for the world without changing that threshold, you’ll predictably have more acceptance, relief and hope but this is because you’ve performed a de-facto nonlinear transformation of your utility function. Sometimes this, conditioned on being an irrational human, is a good thing to do to be more effective. Sometimes it makes you vulnerable to unbounded amounts of moral hazard. If you’re arguing in favor of zero-point moving, you need to address the concerns implied by the latter possibility.
- For evidence that these claims generalize beyond me, just look at your quote from Rob. He’s talking about a “bare standard of human decency” but note that this standard is actually a set of strategies! As you pointed out, strategies are invariant if you change your utility function’s zero point so the bare standard of human decency should be invariant too! As a non-utilitarian, this means you have four options with respect to your zero-point and each of them has its own drawbacks:
  - Not changing your zero-point and biting the bullet psychologically.
  - Changing your zero-point but decoupling it from your sense of the “bare standard of human decency” which is held constant. This eliminates the psychological benefits.
  - Changing your zero-point and allowing your “bare standard of human decency” to drift. This modifies your utility function.
  - Changing your zero-point and allowing your “bare standard of decency” to drift but decoupling your “bare standard of decency” from the actions you actually make. This will either eliminate the psychological benefits or break your sense of ethics.
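A toy Python sketch of the satisficing point above, with made-up numbers: shifting the zero-point leaves the preferred action untouched, but it flips whether a fixed satisfaction threshold is cleared. The names `best_action` and `feels_satisfied` and every value here are hypothetical; this only illustrates the structure of the argument and is not a model of anyone’s actual psychology.

```python
def best_action(utilities, zero_point=0.0):
    """Return the action with the highest utility after shifting the zero-point.

    Subtracting a constant from every utility never changes the argmax,
    which is the sense in which strategies are invariant to the zero-point.
    """
    shifted = {action: u - zero_point for action, u in utilities.items()}
    return max(shifted, key=shifted.get)


def feels_satisfied(world_utility, zero_point, threshold):
    """A satisficer is content when the shifted world utility clears a fixed threshold."""
    return world_utility - zero_point >= threshold


if __name__ == "__main__":
    # Hypothetical utilities of two actions and of the world as a whole.
    utilities = {"help": 3.0, "do_nothing": 1.0}
    world_utility = -5.0
    threshold = 0.0  # held fixed while the zero-point moves

    for zero_point in (0.0, -10.0):
        print(
            "zero-point", zero_point,
            "| best action:", best_action(utilities, zero_point),
            "| satisfied:", feels_satisfied(world_utility, zero_point, threshold),
        )
```

The chosen action is the same at both zero-points, while the satisfaction check changes, which is why the shift behaves like a nonlinear change to a threshold-based utility function instead of a harmless relabeling.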

Isnasene 16 Oct 2019 2:18 UTC
LW: 29 AF: 6
on: The Parable of Predict-O-Matic
Don’t mind me; just trying to summarize some of the stuff I just processed.
If you’re choosing a strategy for predicting the future based on how accurate it turns out to be, a strategy whose output influences the future in ways that make its prediction more likely will outperform a strategy that doesn’t (all else being equal). Thus, one might think that the strategy you choose will be the strategy that most effectively balances its prediction between a) how accurate that prediction is (unconditioned on the prediction being given) and b) how much the prediction itself improves the accuracy of the prediction (conditioning on the prediction). Because of this, the intern predicts that the world will be made more predictable than it would be normally.
In short, you’ll tend to choose the prediction strategies that give self-fulfilling predictions when possible over those that don’t.
However, choosing the strategy that predicts the future most accurately is also equivalent to throwing away every strategy that doesn’t predict the future the best. In the same way that self-fulfilling predictions are good for prediction strategies because they enhance accuracy of the strategy in question, self-fulfilling predictions that seem generally surprising to outside observers are even better because they lower the accuracy of competing strategies. The established prediction strategy thus systematically causes the kinds of events in the world that no other method could predict to further establish itself. Because of this, the engineer predicts that the world will become less predictable than it would be normally.
In short, you’ll tend to choose the prediction strategy that gives self-fulfilling predictions which fulfill in maximally surprising ways relative to the other prediction strategies you are considering.
Oh god...
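The two selection effects summarized above can be sketched as a toy Python model. Everything here is an assumption for illustration: a single binary event, an announcement that adds a fixed `influence` to the event’s probability, and rivals whose accuracy equals the probability they assigned to whatever outcome occurs. None of the names or numbers come from the original post.

```python
def expected_accuracy(base_prob, influence):
    """Chance the announced event occurs once the announcement nudges the world."""
    return min(1.0, base_prob + influence)


def select_strategy(influences, base_prob):
    """Intern's argument: keep the strategy with the highest expected accuracy.

    influences maps a strategy name to how much its announcement raises the
    probability of the event it predicts.
    """
    return max(
        influences,
        key=lambda name: expected_accuracy(base_prob, influences[name]),
    )


def most_entrenching(rival_probs, own_accuracy):
    """Engineer's argument: among outcomes the incumbent can bring about equally
    well, prefer the one rivals assigned the least probability, because that
    maximizes the incumbent's lead (own accuracy minus rival accuracy)."""
    return max(
        rival_probs,
        key=lambda outcome: own_accuracy - rival_probs[outcome],
    )


if __name__ == "__main__":
    # Hypothetical numbers.
    print(select_strategy({"passive": 0.0, "self_fulfilling": 0.2}, base_prob=0.6))
    print(most_entrenching({"expected_event": 0.9, "surprising_event": 0.1}, own_accuracy=0.8))
```

Under these assumptions the first selection favors the self-fulfilling strategy and the second favors the outcome rivals found least likely, matching the intern’s and the engineer’s predictions respectively.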
What links here?
- Random Thoughts on Predict-O-Matic by abramdemski (17 Oct 2019 23:39 UTC; 35 points)
- The Simulation Epiphany Problem by Koen.Holtman (31 Oct 2019 22:12 UTC; 15 points)

Isnasene 4 Aug 2021 17:25 UTC
28 points
on: Analysis of World Records in Speedrunning [LINKPOST]
This is cool! I like speedrunning! There’s definitely a connection between speed-running and AI optimization/misalignment (see When Bots Teach Themselves to Cheat, for example). Some specific suggestions:
- Speedrun times have a defined lower bound on the minimization problem (zero seconds). So over an infinite amount of time, the time vs speedrun time plot necessarily converges to a flat line. You can avoid this by converting to an unbounded maximization problem. For example, you might wanna try plotting Speed-Run-Time-on-Game-Release divided by Speed-Run-Time at time t vs time. Some benefits of this include
  - Intuitive meaning: This ratio tells you how many optimal speed-runs at time t could be accomplished over the course of a single speed-run at game release
  - Partially addresses diminishing returns: Say the game’s first speed-run completes the game in 60 seconds and the second speed-run completes the game in 15 seconds (a 45 second improvement). No matter how much you work at the game, it’s not possible to reduce the speed-run time by more than the 45 second improvement (at most you can do 15 seconds) so diminishing returns are implied
    
    In contrast, if you look at the ratio, the first speed-run has a ratio of 1 (60 seconds/60 seconds), the second has a ratio of 4 (60 seconds/15 seconds), and a third one-second speed-run has a ratio of 60 (60 seconds/1 second). Between the second and third speed-run, we’ve gone from a value of 4 to a value of 60 (a 15x increase!). Diminishing returns are no longer inevitable!
  - Easier to visualize: By normalizing by the initial speed-run time, all games start out with the same value regardless of how long they objectively take. This will allow you to more easily identify similarities between the trends.
  - More comparable to tech progress: Since diminishing returns aren’t inevitable by construction, this looks more like tech progress where diminishing returns also aren’t inevitable by construction. Note that they still can be in practice however
- Instead of plotting absolute dates, you could plot time relative to when the first speed-run was registered. That is, set the date of the first speed-run to t=0. This should help you identify trends.
- A lot of the games you review indicate that, in many cases, our best speed-run time so far isn’t even 3x as fast as the original speed-run. This implies that optimizing speed-run time (or the ratio I introduce above) is bounded and you can’t get more than a factor of 3 or 4 in terms of improvement. But obviously tech capabilities have improved by several orders of magnitude. So structurally, I don’t think speed-running can be particularly predictive of tech advances.
- Given the above, I suggest that if you want to model speed-runs, you should use functions that expect asymptotes (eg logistic equations). Combinations of logistic equations can probably capture the cascading L curves you notice in your write-up. May also be worth doing some basic analysis like counting the number of inflections in each speed-run (do this by plotting derivatives and counting the number of peaks).
  - If you do this, I strongly suggest doing a transformation like the one I suggested above since otherwise, you’re probably gonna get diminishing returns right off the bat and logistic equations don’t expect this. If you don’t transform for whatever reason, try exponential decay.
- Speed-running world records have times that, by definition, must monotonically decrease. So it’s expected that most of the plots will look like continuous functions. As you’re plotting things now, diminishing returns are built-in so you should also expect the derivatives to
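The ratio and relative-time transformations suggested above can be sketched in a few lines of Python. The record times reuse the 60 s / 15 s / 1 s example from this comment; the dates and the logistic parametrization (height, rate, midpoint) are assumptions for illustration and are not taken from the linked analysis.

```python
import math
from datetime import date


def improvement_ratios(record_times):
    """Convert chronological record times (seconds) to first_time / time_t."""
    first = record_times[0]
    return [first / t for t in record_times]


def days_since_first(record_dates):
    """Re-express record dates as days elapsed since the first registered run (t=0)."""
    first = record_dates[0]
    return [(d - first).days for d in record_dates]


def stacked_logistic(t, steps):
    """Sum of logistic curves; each step is a (height, rate, midpoint) tuple.

    One candidate functional form for a series of plateaus separated by jumps.
    """
    return sum(h / (1 + math.exp(-k * (t - m))) for h, k, m in steps)


if __name__ == "__main__":
    times = [60, 15, 1]  # seconds, from the worked example above
    dates = [date(2020, 1, 1), date(2020, 1, 31), date(2021, 1, 1)]  # hypothetical
    print(improvement_ratios(times))
    print(days_since_first(dates))
    print(stacked_logistic(5, [(2.0, 1.0, 5)]))
```

Actually fitting `stacked_logistic` to real record data would need a curve-fitting routine and a choice of how many steps to allow, both of which are left open here.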
Have fun out there!

Isnasene 26 Jan 2020 2:33 UTC
28 points
on: Material Goods as an Abundant Resource
Great series! I broadly agree with it and the approach. However, this post has given me a vagueish “no matter how many things are abundant, the economic rat-race is inescapable” vibe which I disagree with.
Towards the end, a grocer explains the new status quo eloquently:
“… not very many people will buy beans and chuck roast, when they can eat wild rice and smoked pheasant breast. So, you know what I’ve been thinking? I think what we’ll have to have, instead of a supermarket, is a sort of super-delicatessen. Just one item each of every fancy food from all over the world, thousands and thousands, all different”
I see the idea here but I disagree with it. I’m a human for goodness sake! I eat food to stay alive and to stay healthy and for the pure pleasure of eating it! Neither my time nor my money is a worthy trade-off for special unique food if it’s not going to do any of those things significantly better. I grant that there might be a niche market for this kind of thing but, the way I see it, being free of the need for material goods will free people from the rat-race: It will let them completely abandon their existing financial strategies insofar as those strategies were previously necessary to keep them alive.
This is what the FIRE community does. They save up enough money so that they only participate in the economy as much as it actually improves their lives.
Why? Because material goods are not the only economic constraints. If a medieval book-maker has an unlimited pile of parchment, then he’ll be limited by the constraint on transcriptionists. As material goods constraints are relaxed, other constraints become taut.
Broadly speaking, I agree with the description here of economic supply chains as a sequence of steps (ie potential bottlenecks). But, in general, I perceive these sequences of steps as finite. For example, the book-maker has unlimited parchment and is then limited by transcriptionists, so the book-maker automates transcription and is limited by books, so the book-maker automates writing (or it turns out the number of writers wasn’t a real bottleneck), so what then? Bookstores are shuttering. I have the internet and the last time I handed money to anyone in the book-making supply chain was because I wanted something to read on the plane.
Again, maybe there’s a niche market for more unique books or more elegantly bound collectible books but that’s a market I can opt out of. It’s superfluous to me having a good life.
Here’s one good you can’t just throw on a duplicator: a college degree.
A college degree is more than just words on paper. It’s a badge, a mark of achievement. You can duplicate the badge, but that won’t duplicate the achievement.
I didn’t get my college degree to signal social status. I got it because I wanted to get a nice job. I wanted to get a nice job so I could get money. I wanted to get money so that I could use it towards the aim of having a fulfilling life. Give me all the material goods and I would’ve probably just learned botany instead.
So, to me, college degrees (and other intangible badges of achievement) haven’t become the things they are because of abundance, they’ve become the things they are because social status will be instrumental to gaining important life-enhancing things for as long as those things are not abundant.
Social status might be vaguely zero-sum but, beyond a couple friends, it’s not critical for living a good life. Given the tools to live a good life, I imagine many people just opting out of the economy. I’m not going to work for eight hours a day to zero-sum compete for more social status alone.
But given that things have in fact become way more abundant, why haven’t we seen more of this opting out happening? Two answers:
1. We have. Besides the FIRE community, we see it in retirees. I’ve personally seen it in a number of middle-aged adults who realize that trying to find another job in this tech’d up world just isn’t worth the hassle when they have enough to get by on.
2. With all this talk of zero-sum games, the last piece of the post-scarcity puzzle should come as no surprise: political rent-seeking.
Once we accept that economics does not disappear in the absence of material scarcity, that there will always be something scarce, we immediately need to worry about people creating artificial scarcity to claim more wealth.
Yep. I’d generalize rent-seeking beyond just politics and into the realm of moral maze rent-seeking but yep. I’d actually view the college-corporate complex as a subtrope of this. Colleges as a whole (for reasons of inadequate equilibria) collectively own the keys to long-term social stability (excluding people who want to go into trades, and who are confident that those trades won’t go away). They do this and charge a heckuva lot of money for it despite not actually providing much intrinsic value beyond fitting well into the existing incentive structure.
Remove material goods as a taut economic constraint, and what do you get? The same old rat race. Material goods no longer scarce? Sell intangible value. Sell status signals. There will always be a taut constraint somewhere.
Status symbol competition doesn’t scare me in a post-material-scarcity world; I can do just fine without it. What terrifies me is the possibility of rent-seekers (or complex incentive structures) systematically inducing artificial scarcity into material that I care about despite it not literally being scarce.

Isnasene 12 Jan 2020 23:23 UTC
25 points
on: The Rocket Alignment Problem
[Disclaimer: I’m reading this post for the first time now, as of 1/11/2020. I also already have a broad understanding of the importance of AI safety. While I am skeptical about MIRI’s approach to things, I am also a fan of MIRI. Where this puts me relative to the target demographic of this post, I cannot say.]
Overall Summary
I think this post is pretty good. It’s a solid and well-written introduction to some of the intuitions behind AI alignment and the fundamental research that MIRI does. At the same time, the use of analogy made the post more difficult for me to parse and hid some important considerations about AI alignment from view. Though it may be good (but not optimal) for introducing some people to the problem of AI alignment and a subset of MIRI’s work, it did not raise or lower my opinion of MIRI as someone who already understood AGI safety to be important.
To be clear, I do not consider any of these weaknesses serious because I believe them to be partially irrelevant to the audience of people who don’t appreciate the importance of AI-Safety. Still, they are relevant to the audience of people who give AI-Safety the appropriate scrutiny but remain skeptical of MIRI. And I think this latter audience is important enough to assign this article a “pretty good” instead of a “great”.
I hope a future post directly explores the merit of MIRI’s work in the context of AI alignment without use of analogy.
Below is an overview of my likes and dislikes in this post. I will go into more detail about them in the next section, “Evaluating Analogies.”
Things I liked:
- It’s a solid introduction to AI-alignment, covering a broad range of topics including:
  - Why we shouldn’t expect aligned AGI by default
  - How modern conversation about AGI behavior is problematically underspecified
  - Why fundamental deconfusion research is necessary for solving AI-alignment
- It directly explains the value/motivation of particular pieces of MIRI work via analogy—which is especially nice given that it’s hard for the layman to actually appreciate the mathematically complex stuff MIRI is doing
- On the whole, the analogy is elegant
Things I disliked:
- Analogizing AI alignment to rocket alignment created a framing that hid important aspects of AI alignment from view and (unintentionally) stacked the deck in favor of MIRI.
  - A criticism of rocket alignment research with a plausible AI alignment analog was neglected (and could only be addressed by breaking the analogy).
  - An argument in favor of MIRI for rocket alignment had an AI analog that was much less convincing when considered in the context of facts unique to AI alignment.
- Mapping the rocket alignment problem to the AI alignment problem took more cognitive effort than just directly reading justifications of AI alignment and MIRI
- The world-building wasn’t great
  - The actual world of the dialogue is counterintuitive—imagine a situation where planes and rockets exist (or don’t exist, but are being theorized about), but no one knows calculus (despite modeling cannonballs pretty well) or how centripetal force+gravity works. It’s hard for me to parse the exact epistemic meaning of any given statement relative to the world
  - The world-building wasn’t particularly clear—it took me a while to completely parse that calculus hadn’t been invented.
- There are a lot of asides where Beth (a stand-in for a member of MIRI) makes nontrivial scientific claims that we know to be true. While this is technically justified (MIRI does math and is unlikely to make claims that are wrong; and Eliezer has been right about a lot of stuff and does deserve credit), it probably just feels smug and irritating to people who are MIRI-skeptics, aka this post’s probable target.
Evaluating Analogies
Since this post is intended as an analogy to AI alignment, evaluating its insights requires two steps. First, one must re-interpret the post in the context of AI alignment. Second, one must take that re-interpretation and see whether it holds up. This means that, if I criticize the content of this post, either my criticism might be directly in error or my interpretation could be in error.
1. The Alignment Problem Analogy:
Overall, I think the analogy between the Rocket Alignment Problem and the AI Alignment Problem is pretty good. Structurally speaking, they’re identical and I can convert one to the other by swapping words around:
Rocket Alignment: “We know the conditions rockets fly under on Earth but, as we make our rockets fly higher and higher, we have reasons to expect those conditions to break down. Things like wind and weather conditions will stop being relevant and other weird conditions (like whatever keeps the Earth moving around the sun) will take hold! If we don’t understand those, we’ll never get to the moon!”
AI Alignment: “We know the conditions that modern AI performs under right now, but as we make our AI solve more and more complex problems, we have reason to expect those conditions to break down. Things like model overfitting and sample-size limitations will stop being relevant and other weird conditions (like noticing problems so subtle and possible decisions so clever that you as a human can’t reason about them) will take hold! If we don’t understand those, we’ll never make an AI that does what we want!”
1a. Flaws In the Alignment Problem Analogy:
While the alignment problem analogy is pretty good, it leaves out the key and fundamentally important fact that failed AI Alignment will end the world. While it’s often not a big deal when an analogy isn’t completely accurate, missing this fact leaves MIRI-skeptics with a pretty strong counter-argument that can only exist outside of the analogy:
In Rocket Alignment terms -- “Why bother thinking about all this stuff now? If conditions are different in space, we’ll learn that when we start launching things into space and see things happen to them. This sounds more efficient than worrying about cannonballs.”
In AI Alignment terms -- “Why bother thinking about all this stuff now? If conditions are different when AI start getting clever, we’ll learn about those differences once we start making actual AI that are clever enough to behave like agents. This sounds more efficient than navel-gazing about mathematical constructs.”
If you explore this counter-argument and its counter-counter-argument deeper, the conversation gets pretty interesting:
MIRI-Skeptic: Fine okay. The analogy breaks down there. We can’t empirically study a superintelligent AI safely. But we can make AI that are slightly smarter than us but put security mechanisms around them that only AI extremely smarter than us would be expected to break. Then we can learn experimentally from the behavior of those AI about how to make clever AI safe. Again, easier than navel-gazing about mathematical constructs and we might expect this to happen because slow take-off.
MIRI-Defender: First of all, there’s no theoretical reason we would expect to be able to extrapolate the behavior of slightly clever AI to the behavior of extremely clever AI. Second, we have empirical reasons for thinking your empirical approach won’t work. We already did a test-run of your experiment proposal with a slightly clever being; we put Eliezer Yudkowsky in an inescapable box armed with only a communication tool and the guard let him out (twice!).
MIRI-Skeptic: Fair enough but… [Author’s Note: There are further replies to MIRI-Defender but this is a dialogue for another day]
Given that this post is supposed to address MIRI skeptics and that the aforementioned conversation is extremely relevant to judging the benefits of MIRI, I consider the inability to address this argument to be a flaw, despite it being an understandable flaw in the context of the analogy used.
2. The Understanding Intractably Complicated Things with Simple Things Analogy:
I think that this is a cool insight (with parallels to inverse-inverse problems) and the above post captures it very well. Explicitly, the analogy is this: “Rocket Alignment to Cannonballs is like AI Alignment to tiling agents.” Structurally speaking, they’re identical and I can convert one to the other by swapping words around:
Rocket Modeling: “We can’t think about rocket trajectories using actual real rockets under actual real conditions because there are so many factors and complications that can affect them. But, per the rocket alignment problem, we need to understand the weird conditions that rockets need to deal with when they’re really high up and these conditions should apply to a lot of things that are way simpler than rockets. So instead of dealing with the incredibly hard problem of modeling rockets, let’s try really simple problems using other high-up fast-moving objects like cannonballs.”
AI Alignment: “We can’t think about AI behavior using actual AI under actual real conditions because there are so many factors and complications that can affect them. But, per the AI alignment problem, we need to understand the weird conditions that AI need to deal with when they’re extremely intelligent and these conditions should apply to a lot of things that are way simpler than modern AI. So instead of dealing with the incredibly hard problem of modeling AI, let’s try the really simple problem of using other intelligent decision-making things like Tiling Agents.”
3. The “We Need Better Mathematics to Know What We’re Talking About” Analogy
I really like just how perfect this analogy is. The way that AI “trajectory” and literal physical rocket trajectory line-up feels nice.
Rocket Alignment: “There’s a lot of trouble figuring out exactly where a rocket will go at any given moment as it’s going higher and higher. We need calculus to make claims about this.”
AI alignment: “There’s a lot of trouble figuring out exactly what an AI will do at any given moment as it gets smarter and smarter (ie self-modification but also just in general). We need to understand how to model logical uncertainty to even say anything about its decisions.”
4. The “Mathematics Won’t Give Us Accurate Models But It Will Give Us the Ability to Talk Intelligently” Analogy
This analogy basically works...
Rocket Alignment: “We can’t use math to accurately predict rockets in real life but we need some of it so we can even reason about what rockets might do. Also we expect our math to get more accurate when the rockets get higher up.”
AI alignment: “We can’t use math to accurately predict AGI in real life but we need some of it so we can even reason about what AGI might do. Also we expect our math to get more accurate when the AGI gets way smarter.”
I also enjoy the way this discussion lightly captures the frustration that the AI Safety community has felt. Many skeptics have claimed their AGIs won’t become misaligned but then never specify the details of why that wouldn’t happen. And when AI Safety proponents produce situations where the AGI does become misaligned, the skeptics move the goal posts.
4a. Flaws in the “Mathematics Won’t Give Us Accurate Models But It Will Give Us the Ability to Talk Intelligently” Analogy
On a cursory glance, the above analogy seems to make sense. But, again, this analogy breaks down on the object level. I’d expect being able to talk precisely about what conditions affect movement in space to help us make better claims about how a rocket would go to the moon because that is just moving in space in a particular way. The research (if successful) completes the set of knowledge needed to reach the goal.
But being able to talk precisely about the trajectory of an AGI doesn’t really help us talk precisely about getting to the “destination” of friendly AGI for a couple reasons:
- For rocket trajectories, there are clear control parameters that can be used to exploit the predictions made by a good understanding of how trajectories work. But for AI alignment, I’m not sure what would constitute a control parameter that would exploit a hypothetical good understanding of what strategies superintelligent beings use to make decisions.
- For rocket trajectories, the knowledge set of how to get a rocket into a point in outer-space and how to predict the trajectories of objects in outer-space basically encompass the things one would need to know to get that rocket to the moon. For AGI trajectories, the trajectories depend on three things: its decision theory (a la logical uncertainty, tiling agents, decision theory...), the actual state of the world that the AGI perceives (which is fundamentally unknowable to us humans, since the AGI will be much more perceptive than us), and its goals (which are well-known to be orthogonal to the AGI’s actual strategy algorithms).
- Given the above, we know scenarios where we understand agent foundations but not the goals of our agents won’t work. But, if we do figure out the goals of our agents, it’s not obvious that controlling those superintelligent agents’ rationality skills will be a good use of our time. After all, they’ll come up with better strategies than we would.
  - Like I guess you could argue that we can view our goals as the initial conditions and then use our agent foundations to reason about the AGI behavior given those goals and decide if we like its choices… But again, the AGI is more perceptive than us. I’m not sure if we could capably design toy circumstances for an AGI to behave under that would reflect the circumstances of reality in a meaningful way
  - Also, to be fair, MIRI does work on goal-oriented stuff in addition to agent-oriented stuff. Corrigibility, which the post later links to, is an example of this. But, frankly, my expectation that this kind of thing will pan out is pretty low.
In principle, the rocket alignment analogy could’ve been written in a way that captured the above concerns. For instance, instead of asking the question “How do we get this rocket to the moon when we don’t understand how things move in outer-space?”, we could ask “How do we get this rocket to the moon when we don’t understand how things move in outer-space, we have a high amount of uncertainty about what exactly is up there in outer-space, and we don’t have specifics about what exactly the moon is?”
But that would make this a much different, and much more epistemologically labyrinthian post.
Minor Comments
1. I appreciate the analogizing of an awesome thing (landing on the moon) to another awesome thing (making a friendly AGI). The AI safety community is quite rationally focused mostly on how bad a misaligned AI would be but I always enjoy spending some time thinking about the positives.
2. I noticed that Alfonso keeps using the term “spaceplanes” and Beth never does. I might be reading into it but my understanding is that this is done to capture how deeply frustrating it is when people talk about the thing you’re studying (AGI) like it’s something superficially similar but fundamentally different (modern machine-learning but like, with better data).
However, coming into this dialogue without any background on the world involved, the apparent interchangeability of spaceplane and rocket just felt confusing.
3.
As an example of work we’re presently doing that’s aimed at improving our understanding, there’s what we call the “tiling positions” problem. The tiling positions problem is how to fire a cannonball from a cannon in such a way that the cannonball circumnavigates the earth over and over again, “tiling” its initial coordinates like repeating tiles on a tessellated floor –
Because of the deliberate choice to analogize tiling agents and tiling positions, I spent probably five minutes trying to figure out exactly what the relationship between tiling positions and rocket alignment meant about tiling agents and AI alignment. It seems to me tiling isn’t clearly necessary in the former (understanding any kind of trajectory should do the job) while it is in the latter (understanding how AI can guarantee similar behavior in agents it creates seems fundamentally important).
My impression now is that this was just a conceptual pun on the idea of tiling. I appreciate that but I’m not sure it’s good for this post. The reason I thought so hard about this was also because the Logical Discreteness/Logical Uncertainty analogy seemed deeper.

Isnasene 14 Dec 2019 6:21 UTC
23 points
on: ialdabaoth is banned
As a guy on the internet, I mostly agree with this post in the sense that I think the points you bring up do warrant a ban. That said...
Suppose instead you are running a trading fund, and someone previously convicted of fraud sends you an idea for a new financial instrument. Here, it seems like you should be much more suspicious, not just of the idea but also of your ability to successfully notice the trap if there is one. It seems relevant now to check both whether the idea is true and whether or not it is manipulative. Rather than just performing a process that catches simple mistakes or omissions, one needs to perform a process that’s robust to active attempts to mislead the judging process.
...
I think the middle case is closest to the situation we’re in now, for reasons like those discussed in comments by jimrandomh and by Zack_M_Davis. Much of ialdabaoth’s output is claims about social dynamics and reasoning systems that seem, at least in part, designed to manipulate the reader, either by making them more vulnerable to predation or more likely to ignore him / otherwise give him room to operate.
Having read Affordance Widths and seeing the way that it may be used to justify awful behavior, I don’t see the risks of these kinds of posts being much higher than those of a lot of Less Wrong and Rationalist style writing. Less Wrong and Rationalist style writing by nature talks in the abstract about a lot of really broad ideas that can have significant implications for how someone should make decisions in real life and, unless you’re already a very skilled rationalist, you can botch those implications in really damaging ways (personally speaking, reading Eliezer’s meta-ethics sequence when I was 14 was a mistake. But scrupulosity in general can also be a mine-field). Also, epistemic learned helplessness is a thing and it’s especially a rationalist thing.
So, regarding the above justification:
- from an epistemic point of view, Ialdabaoth’s post (Affordance Widths) does not strike me as intrinsically more harmful than other posts
- from a manipulation point of view, Ialdabaoth’s post (Affordance Widths) does not strike me as intrinsically more manipulator-friendly than a lot of other posts
- while Affordance Widths is more manipulator friendly than a lot of other posts in the sense that at least one manipulator (Ialdabaoth) knows that it can be used for manipulation, I do not think this is very relevant because
[Epistemological Status: Maybe the rationalist community dynamic is unusual and I’m mis-gauging things here].
  - 1. Using Affordance Widths to manipulate people into doing things for you is basically a fancy pseudo-rationalist way of manipulating people into doing things for you by making them feel guilty and responsible. This is such a common way for people to get manipulated that, from a pragmatic perspective, I’m skeptical that Affordance Widths allows manipulators to be more dangerous than they would have otherwise been just engaging in direct emotional manipulation.
  - 2. Even the epistemics used in Affordance Widths already exist. People who have been disenfranchised in various ways do use their own personal struggles as a way to convey an implicit duty to those around them pretty frequently. In my circles, memetic immune systems have even built up against these sorts of things (ie the phrase “mental health is not an excuse”). Affordance Widths strikes me as epistemically superfluous in the context of the world’s current epistemic environment. Moreover, I could imagine good people who don’t do things like Ialdabaoth did writing a post very similar to Affordance Widths and, if someone else wrote this post, I really doubt that it would be banned.
  - 3. As you note, posts that may be both manipulative but also epistemically useful (as Affordance Widths is) merit consideration if you believe that the post’s safety can be screened. Less Wrong has a uniquely intelligent community and a pretty well-regarded comments section so my expectation would be that someone here should be around to identify epistemological traps in posts in general. If this expectation is appropriate, then accepting this kind of post shouldn’t be considered risky. If it isn’t appropriate, yall have bigger problems.
    As a caveat, it’s of course possible for someone being manipulated to gloss over the comments. But the set of people who get manipulated if and only if they are subjected to epistemologically manipulative (rather than emotionally manipulative) traps who also gloss over the comments on an article is probably smaller than the set of such people who would read the comments and update away from Ialdabaoth’s claims. Of course someone could get emotionally manipulated enough to gloss over the comments but be somehow resistant to full manipulation in a way that can only be achieved through abstract epistemic posts, but this is a really specific trajectory compared to a lot of others.
Of course, despite my dislike of the analogy and its focus on potentially harmful Less Wrong posts, I still support the ban, both because it’s important to have an epistemic immune system and because:
#1. Less Wrong, and any community focused on self-analysis and improvement, requires a high trust environment
#2. Ialdabaoth has demonstrated clearly manipulative behaviors in real life, causing a lot of harm
#3. We cannot separate Ialdabaoth’s real life manipulative behavior from manipulative behavior on Less Wrong
#4. Ialdabaoth should therefore be banned on Less Wrong for the sake of maintaining a high-trust environment
While posts like Affordance Widths are supporting evidence of #3, I think that, given things Ialdabaoth has done, claim #3 should really be treated as the default assumption even sans that kind of supporting evidence. And this is even more true in this particular context where Less Wrong’s community apparently overlaps so much with his real life community. We shouldn’t give people the benefit of the doubt about compartmentalizing bad behavior just to areas that don’t affect us and we definitely shouldn’t give them the benefit of the doubt when the areas with and without bad behavior aren’t mutually distinct.
What links here?
- Vaniver's comment on ialdabaoth is banned by Vaniver (15 Dec 2019 6:48 UTC; 23 points)

Isnasene 8 Jan 2020 7:49 UTC
LW: 21 AF: 7
AF
on: (Double-)Inverse Embedded Agency Problem
I thought about this for longer than expected so here’s an elaboration on inverse-inverse problems in the examples you provided:
Partial Differential Equations
Finding solutions to partial differential equations with specific boundary conditions is hard and often impossible. But we know a lot of solutions to differential equations with particular boundary conditions. If we match up those solutions with the problem at hand, we can often get a decent answer.
The direct problem: you have a function; figure out what relationships its derivatives have and its boundary conditions
The inverse problem: you know a bunch of relationships between derivatives and some boundary conditions; figure out the function that satisfies these conditions
The inverse inverse problem: you have a bunch of solutions to inverse problems (ie you can take a bunch of functions, solve the direct problem, and now you know the inverse problem that the function is a solution to), figure out which of these solutions look like the unsolved inverse problem you’re currently dealing with
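To make the matching step concrete, here is a rough sketch in Python. Everything specific in it is my own assumption for illustration: I use an ordinary differential equation (u'' + u = 0 with u(0) = 0 and u(pi/2) = 1) rather than a true PDE to keep it short, and the names `library` and `mismatch` are hypothetical. The idea is just that each function in the library is a solved direct problem, and we score how closely its derivative relationships and boundary values look like the unsolved inverse problem.

```python
import math

def second_derivative(f, x, h=1e-3):
    # Central finite difference; a numerical stand-in for "solving the direct problem".
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def mismatch(f, xs):
    # How far f is from satisfying the (assumed) target problem:
    # u'' + u = 0 on the sample points, with u(0) = 0 and u(pi/2) = 1.
    residual = max(abs(second_derivative(f, x) + f(x)) for x in xs)
    boundary = abs(f(0.0) - 0.0) + abs(f(math.pi / 2) - 1.0)
    return residual + boundary

# A small library of functions whose direct problems are easy to work out.
library = {
    "sin": math.sin,
    "cos": math.cos,
    "exp": math.exp,
    "square": lambda x: x * x,
}

xs = [0.1 * i for i in range(1, 15)]
scores = {name: mismatch(f, xs) for name, f in library.items()}
best = min(scores, key=scores.get)
print(best, scores[best])
```

With this particular library the sine function should come out as the closest match, since it satisfies both the equation and the boundary conditions up to finite-difference error.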
Arithmetic
Performing division is hard but adding and multiplying is easy.
The direct problem: you have two numbers A and B; figure out what happens when you multiply them
The inverse problem: you have two numbers A and C; figure out what you can multiply A by to produce C
The inverse inverse problem: you have a bunch of solutions to inverse problems (ie you can take A and multiply it by all sorts of things like B’ to produce numbers like C’, solving direct problems; now you know that B’ is a solution to the inverse problem where you must divide C’ by A). You just need to figure out which of these inverse problem solutions look like the inverse problem at hand (ie if you find a C’ so that C’ = C, you’ve solved the inverse problem).
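The arithmetic case can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `divide_by_inverse_inverse` and the candidate grid are hypothetical, and the only operation used on A is multiplication.

```python
def divide_by_inverse_inverse(c, a, candidates):
    """Approximate c / a using only multiplication.

    Each candidate b' gives a solved direct problem c' = a * b'.
    We return the b' whose c' is closest to the target c, plus the gap.
    """
    best_b = None
    best_gap = None
    for b_prime in candidates:
        c_prime = a * b_prime        # direct problem: easy
        gap = abs(c_prime - c)       # how much does this solved problem look like ours?
        if best_gap is None or gap < best_gap:
            best_b, best_gap = b_prime, gap
    return best_b, best_gap

# Candidate values B' = 0.0, 0.1, ..., 10.0 (an arbitrary grid chosen for the example).
candidates = [i / 10 for i in range(0, 101)]
b, gap = divide_by_inverse_inverse(c=21.0, a=6.0, candidates=candidates)
print(b, gap)
```

When the gap is zero, some C’ equals C exactly and the corresponding B’ is the exact quotient; otherwise B’ is just the closest solved problem in the table.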
In The Abstract
We have a problem like “Find X that produces Y” which is a hard problem from a broader class of problems. But we can produce a lot of solutions in that broader class pretty quickly by solving problems of the form “Find the Y’ that X’ produces.” Then the original problem is just a matter of finding a Y’ which is something like Y. Once we achieve this, we know that X will be something like X’.
Applications for Embedded Agency
The direct problem: You have a small model of something, come up with a thing much bigger than the model that the model is modeling well
The inverse problem: You have a world; figure out something much smaller than the world that can model it well
The inverse inverse problem: You have a bunch of worlds and a bunch of models that model them well. Figure out which world looks like ours and see what its corresponding model tells us about good models for modeling our world.
Some Theory About Why Inverse-Inverse Solutions Work
To speak extremely loosely, the assumption for inverse-inverse problems is something along the lines of “if X’ solves problem Y’, then we have reason to expect that solutions X similar to X’ will solve problems Y similar to Y’ ”.
This tends to work really well in math problems with functions that are continuous/analytic because, as you take the limit of making Y’ and Y increasingly similar, you can make their solutions X’ and X arbitrarily close. And, even if you can’t get close to that limit, X’ will still be a good place to start work on finagling a solution X if the relationship between the problem-space and the solution-space isn’t too crazy.
Division is a good example of an inverse-inverse problem with a literal continuous and analytic mapping between the problem-space and solution-space. Differential equations with tweaked parameters/boundary conditions can be like this too although to a much weaker extent since they are iterative systems that allow dramatic phase transitions and bifurcations. Appropriately, inverse-inversing a differential equation is much, much harder than inverse-inversing division.
From this perspective, the embedded agency inverse-problem is much more confusing than ordinary inverse-inverse problems. Like differential equations, there seem to be many subtle ways of tweaking the world (ie black swans) that dramatically change what counts as a good model.
Fortunately, we also have an advantage over conventional inverse problems: Unlike multiplying numbers or taking derivatives which are functions with one solution (typically; sometimes things are undefined or weird), a particular direct problem of embedded agency likely has multiple solutions (a single model can be good at modeling multiple different worlds). In principle, this makes things easier -- there are more Y’ (worlds that embedded agency is solved in) that we can compare to our Y (actual world).
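The continuity point about division can be checked numerically. The sketch below is my own toy illustration (the helper `nearest_solution` and the choice of 7 / 3 are assumptions, not anything from the original post): as the table of solved direct problems gets denser, the closest Y’ gets closer to Y and the matching X’ gets closer to the true X.

```python
def nearest_solution(c, a, parts):
    """Pick the b' from the grid {0, 1/parts, ..., 10} whose product a * b' is closest to c."""
    grid = [i / parts for i in range(10 * parts + 1)]
    return min(grid, key=lambda b_prime: abs(a * b_prime - c))

true_quotient = 7.0 / 3.0
errors = []
for parts in (1, 10, 100):
    b = nearest_solution(c=7.0, a=3.0, parts=parts)
    errors.append(abs(b - true_quotient))
    print(parts, b, errors[-1])
```

Because the map from B’ to C’ is continuous and monotonic here, the error is bounded by half the grid spacing. Nothing like that bound is guaranteed for systems with bifurcations, which is the worry for differential equations and embedded agency.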
Thoughts on Structuring Embedded Agency Problems
- Inverse-inverse problems rely on leveraging similarities between an unsolved problem and a solved problem which means we need to be really careful about defining things
  - Defining what it means to be a solution (to either the direct problem or inverse problem)
    Defining a metric of goodness that we can use to compare model goodness or define worlds that models are good for. This requires us to either pick a set of goals that our model should be able to achieve or go meta and look at the model over all possible sets of goals (but I’m guessing this latter option runs into a No-Free-Lunch theorem). This is also non-trivial: different world abstractions are good for different goals and you can’t have them all
    Defining a threshold after which we treat a world as a solution to the question “find a world that this model does well at.” A Model:World pair can range over a really broad spectrum of model performance
  - Defining what it means for a world to be similar to our own. Consider a phrase like “today’s world will be similar to tomorrow if nothing impacts on it.” This sort of claim makes sense to me but impact tends to be approached through Attainable Utility Preservation
What links here?
- Gordon Seidoh Worley's comment on (Double-)Inverse Embedded Agency Problem by shminux (8 Jan 2020 22:33 UTC; 4 points)

Isnasene 16 Jan 2020 7:03 UTC
20 points
on: Go F*** Someone
I had fun reading this post. But as someone who has a number of meaningful relationships but doesn’t really bother dating, I was also confused about what to make of it.
Also, given that this is Rationalism-Land, it’s worth keeping in mind that many people who don’t date got there because they have an unusually low prior on the idea that they will find someone they can emotionally connect with. This prior is also often caused by painful experience that advice like “date more!” will tacitly remind them of.
Anyway, things that I agree with you on:
- Dating is hard
- Self-improvement is relatively easy compared to being emotionally vulnerable
- I hate the saying “you do you.” I emotionally interpret it as “here’s a shovel; bury yourself with it”
Things I disagree with you on:
- We aren’t more lonely because of aggressively optimizing relationships for status rather than connection; we’re more lonely because the opportunity cost of going on dates is unusually high. Many reasons for this:
  - It’s easier than ever to unilaterally do cool things (ie learn guitar from the internet, buy arts and crafts off Amazon). And, as you noted, there’s a cottage industry for making this as awesome as possible
  - It’s easier than ever to defect from your local community and hang out with online people who “get” you
  - This causes a feedback loop that reduces the number of people looking to date, which increases the effort it takes to date, which reduces the number of people looking to date. Everyone else is defecting so I’m gonna defect too
- I think the general conflation of “self-improvement” with “bragging about stuff on social media” is odd in the context you’re discussing. People who aren’t interested in the human connection of dates generally don’t get much out of social media. At least in my bubble, people who are into self-improvement tend to do things like delete facebook.
- If you’re struggling to build financial capital, the goal is to keep doing that until you’re financially secure. The goal very much isn’t to refocus your efforts on going on hundreds of dates to learn how to make others happy.

Isnasene 8 Jan 2020 3:57 UTC
19 points
on: Open & Welcome Thread—January 2020
Hey yall; I’ve been around for long enough—may as well introduce myself. I’ve had this account for a couple months but I’ve been lurking off-and-on for about ten years. I think it’s pretty amazing that after all that time, this community is still legit. Keep up the good work, everyone!
Things I hope to achieve through my interactions with Less Wrong:
- Accidentally move the AI Safety field slightly forward by making a clever comment on something
- Profound discussions (Big fan of that whole thing with Internal Family Systems, also interested in object-level discussion about how to navigate Real Life $^{TM}$ )
- Friends? (yeah I know; internet rationality forums aren’t particularly conducive to this but what’re ya gonna do? I need some excuse to run away to California)
Current status (stealing Mathisco’s idea): United States, just outta college, two awesome younger cousins who I spend too much time with, AI/ML capabilities research in finance, bus-ride to work, trying to learn guitar.
Coolest thing I’ve ever done: When I was fifteen, I asked my dad for a slim jim and he accidentally tossed two at me at the same time. I raised my hand and caught one slim jim between my pinky and ring finger and the other between my middle and index finger, wolverine claw style.
...
PS: Is it just me or are the Open Threads kind of out of the way? My experience with Open Thread Posts has been
1. See them in the same stream as regular Less Wrong posts
2. Click on them at my leisure
3. Notice that there are only a few comments (usually introductions)
4. Forget about it until the next Open Thread
As a result, I was legitimately surprised to see the last Open Thread had ~70 comments! No idea whether this was just a personal quirk of mine or a broader site-interaction pattern.

Isnasene 3 Apr 2020 21:17 UTC
17 points
on: Has LessWrong been a good early alarm bell for the pandemic?
While I agree with the specific claims this post is making (i.e. “Less Wrong provided information about coronavirus risk similar to or just-lagging the stock market”), I think it misses the thing that matters. We’re a rationality forum, not a superintelligent stock-market-beating cohort[1]! Compared to the typical human’s response to coronavirus, we’ve done pretty well at recognizing the dangers posed by the exponential spread of pandemics and acting accordingly. Compared to the very smart people who make money by predicting the economic effects of a virus, we’ve done expectedly mediocre—after all none of us (including the stock market) really had any special information about the virus’s trajectory.
Maybe it is disappointing if we lagged the stock market instead of being perfectly on pace with it but a week of lag is a pretty small amount of time in the grand scheme of things. And I’d expect different auditing methodologies/interpretations to have about that amount in variance. In any case, I don’t really think that it’s a big deal.
[1]That is, unless you count Bitcoin, which Eliezer Yudkowsky doesn’t.

Isnasene 29 May 2020 14:01 UTC
LW: 15 AF: 8
AF
on: OpenAI announces GPT-3
A year ago, Joaquin Phoenix made headlines when he appeared on the red carpet at the Golden Globes wearing a tuxedo with a paper bag over his head that read, “I am a shape-shifter. I can’t change the world. I can only change myself.”
-- GPT-3 generated news article humans found easiest to distinguish from the real deal.
… I haven’t read the paper in detail but we may have done it; we may be on the verge of superhuman skill at absurdist comedy! That’s not even completely a joke. Look at the sentence “I am a shape-shifter. I can’t change the world. I can only change myself.” It’s successful (whether intended or not) wordplay. “I can’t change the world. I can only change myself” is often used as a sort of moral truism (e.g. Man in the Mirror, Michael Jackson). In contrast, “I am a shape-shifter” is a literal claim about one’s ability to change themselves.
The upshot is that GPT-3 can equivocate between the colloquial meaning of a phrase and the literal meaning of a phrase in a way that I think is clever. I haven’t looked into whether the other GPTs did this (it makes sense that a statistical learner would pick up this kind of behavior) but dayum.

Isnasene 25 Mar 2020 15:39 UTC
14 points
on: Adding Up To Normality
I think the strongest version of this idea of adding up to normality is “new evidence/knowledge that contradicts previous beliefs does not invalidate previous observations.” Therefore, when one’s actions are contingent on things happening that have already been observed to happen, things add up to normality because it is already known that those things happen, regardless of any new information. But this strict version of ‘adding up to normality’ does not apply in situations where one’s actions are contingent on unobservables. In cases where new evidence/knowledge may cause someone to dramatically revise the implications of previous observations, things don’t add up to normality. Whether this is the case or not for you as an individual depends on your gears-level understanding of your observations.
So in retrospect, the main thing I’d recommend is to promise yourself to keep steering the plane mostly as normal while you think about lift
I somewhat disagree with this. I think, in these kinds of situations, the recommendation should be more along the lines of “promise yourself to make the best risk/reward trade-off you can given your state of uncertainty.” If you’re flying in a plane that has a good track record of flying, definitely don’t touch anything because it’s more risky to break something that has evidence of working than it is rewarding to fix things that might not actually work. But if you’re flying in the world’s first plane and realize you don’t understand lift, land it as soon as possible.
Some Reasons Things Add Up to Normality
- If you think the thing you don’t understand might be a Chesterton’s Fence, there’s a good chance it will add up to normality
- If you think the thing you don’t understand can be predicted robustly by inductive reasoning and you only care about being able to accurately predict the thing itself, there’s a good chance it will add up to normality
Some Examples where Things Don’t Add Up
Example #1 (Moral Revisionism)
You’re an eco-rights activist who has tirelessly worked to make the world a better place by protecting wildlife because you believe animals have the right to live good lives on this planet too. Things are going just fine until your friend claims that R-selection implies most animals live short horrible lives and you realize you have no idea whether animals actually live good lives in the wild. Should you immediately panic in fear that you’re making things worse?
Yes. Whether or not the claim in question is accurate, your general assumption that protecting wildlife implies improved animal welfare was not well-founded enough to address significant moral risk. You should really stop doing wildlife stuff until you get this figured out or you could actually cause bad things to happen.
Example #2 (Prediction Revisionism)
You’ve built an AGI and, with all your newfound free time and wealth, you have a lengthy chat with a mathematician. Things are going along just fine until they point out to you that your understanding of the safety measures used to ensure alignment is wrong, and that the safety measures you thought were responsible should not actually have aligned the AGI. Should you immediately panic in fear that the AGI will destroy us all?
Yes. The previous observations are not sufficient to make reliable predictions. But note that a random bystander who is uninvolved with AGI development would be justified in not panicking—their gears-level understanding hinges on believing that the people who created the AGI are competent enough to address safety, not on believing that the specific details designed to make the AGI safe actually work.
What links here?
- Adding Up To Normality by orthonormal (24 Mar 2020 21:53 UTC; 84 points)

Isnasene 18 Mar 2020 15:58 UTC
14 points
on: Assorted thoughts on the coronavirus
This is anecdotal but last week I read the article by Mr Money Mustache which you linked. As part of it, he posts this picture with the caption “I went out on the town at the peak of the scare. The reality is different from the news headlines.”
Then I went to Venkatesh Rao’s twitter and was immediately confronted with this picture. Stores empty. People are in danger. This is an exceptional case given Venkatesh’s location and the timing. Nevertheless, the simple fact that Mr Money Mustache describes the picture as being at the peak of the scare has seriously lowered my faith in him. As if it was a scare. As if it wasn’t going to get worse.
“Alas, it is hard to overreact. We did ordinary cheap preparing. We had a month’s worth of food, all our medicines and stuff like that. Initially I thought that would be the plan.”
After reading Mr. Money Mustache’s take on the coronavirus, I started having a few doubts about how bad it actually is. I didn’t realize that 2M people in America die each year of things related to “lifestyle factors”.
No. Never compare the effects of things like death from “lifestyle factors”—things that happen because people willingly trade-off having a long-time for having a good-time, things subject to hyperbolic discounting, things that (on an individual level) are really very hard to track the effects of—with an imminent risk that 1-10% of everyone dies within the next two years. Personally, covid poses little threat to me but we don’t know the end-game here: we’re fighting between potentially lengthy economic shutdowns and the possibility of containment failure and global health system collapse. And if low-income people are forced back to work due to money-needs before containment succeeds, the economy crashes and our healthcare system fails.
Is losing money really going to be that bad?
Once you have enough money, losing 50-90% of your wealth really isn’t that bad at all, which is why I like the idea of earning-to-give once I’m confident in my runway. Indeed, if you’re the kind of person who reads Mr Money Mustache, you’re probably going to be fine in general.
For my low-income friends though, yes. Yes it is going to be that bad. Sometimes people don’t have jobs. Sometimes people don’t have savings. A large portion of people live paycheck to paycheck. Many people are going to die because of the virus. Many people are going to die because our healthcare systems will at least partially fail. Many people are going to die because that is what the economics imply.
What links here?
- Adam Zerner's comment on Assorted thoughts on the coronavirus by Adam Zerner (18 Mar 2020 21:59 UTC; 3 points)

Isnasene 23 Oct 2019 2:21 UTC
LW: 14 AF: 5
in reply to: Stuart_Armstrong’s comment on: All I know is Goodhart
[Retracted my other reply due to math errors]
This is only true for the kind of things humans typically care about; this is not true for utility functions in general. That’s the extra info we have.
While I generally agree that there can be utility functions that aren’t subject to Goodhart, I don’t think that this strictly pertains to humans. I expect that when the vast majority of agents (human or not) use scientific methods to develop a proxy for the thing they want to optimize, they will find that the proxy breaks down upon intense optimization:
- proxies are learned in a certain environment where they work to predict the utility function
- aggressively optimizing anything enough will usually change the environment dramatically
- so aggressively optimizing a given proxy will eventually violate the assumptions under which the proxy was created
- if the assumptions that justify the proxy’s design don’t hold, optimizing it further will be akin to acting randomly. This can be achieved by the “doing nothing” policy without the added spending of resources
- when the world is in a state where agentic actions have increased the value of a utility function, behaving randomly seems more likely to reduce the utility function than to increase it, in the same way that randomness tends to push worlds towards states of higher entropy rather than lower ones.
The last point is kind of hand-wavy since we can have a utility function like “maximize entropy” which can provide many proxies which don’t get Goodhart’d (in the sense of optimization making things worse rather than just not making them better). Still, “Goodhart’s Law applies to agents with utility functions of relatively low entropy” is much more generic than “Goodhart’s Law applies to humans.” I’m also not sure how helpful that is. Even if we know that we should stop optimizing at some point, what metric do you actually use in making the decision to stop?
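To make the first three steps of that argument concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the quadratic `true_utility`, the linear proxy, and the training range [0, 1] are not from the original discussion. It only shows a proxy that fits well where it was learned and then diverges once optimization pushes far outside that environment; it does not model the "acting randomly" step.

```python
# Toy illustration only: the utility function, proxy form, and numbers
# below are invented for this sketch and do not come from the comment.

def true_utility(x: float) -> float:
    """Hypothetical 'real' objective: rises at first, then falls (peak at x = 5)."""
    return x - 0.1 * x * x


def fit_linear_proxy(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The proxy is learned in a narrow training environment, x in [0, 1].
train_xs = [i / 10 for i in range(11)]
train_ys = [true_utility(x) for x in train_xs]
slope, intercept = fit_linear_proxy(train_xs, train_ys)


def proxy(x: float) -> float:
    """Learned stand-in for true_utility; only validated on x in [0, 1]."""
    return slope * x + intercept


# Inside the training range the two agree closely; far outside it the
# proxy keeps rising while the true utility falls.
for x in (0.5, 1.0, 5.0, 10.0, 20.0):
    print(f"x={x:5.1f}  proxy={proxy(x):7.2f}  true={true_utility(x):7.2f}")
```

An optimizer that only sees `proxy` will keep increasing `x`, since the fitted slope is positive, even though `true_utility` peaks and then declines. The open question from the comment remains: nothing inside the proxy itself says where to stop.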

Isnasene 21 Mar 2020 23:02 UTC
11 points
on: Good News: the Containment Measures are Working
I shared this post with some of my friends and they pointed out that, as of 3/21/2020, the Italy and Spain curves no longer look as optimistic:
- On March 16, cases in Italy appeared to be leveling off. Immediately following that, they broke trend and began rising again. March 16 had ~3200 daily cases. March 20 had ~6000.
- Spain appeared to be leveling off up through March 17th (~1900 daily cases). But on March 18th, it spiked to ~3000. As of March 20th, things may be leveling off again but I wouldn’t draw any conclusions.
- Iran’s daily cases have stayed flat for a pretty long period of time now—at around 1000 per day. This seems like it should be good news, tho I’m not sure how good: Since March 8, Iran’s death rate (closed cases) has been steadily rising from 8% to 17.5%

Isnasene 30 Dec 2019 13:47 UTC
11 points
on: Speaking Truth to Power Is a Schelling Point
Then the coalition faces a choice of the exact value of x. Smaller values of x correspond to a more intellectually dishonest strategy, requiring only a small inconvenience before resorting to obfuscatory tactics. Larger values of x correspond to more intellectual honesty: in the limit as x → ∞, we just get, “Speak the truth, even if your voice trembles (full stop).”
I don’t think that a one-parameter x% trade-off between truth-telling and social capital accurately reflects the coalitional map, for a couple of reasons:
- x% is a ratio y:z between intellectual dishonesty and social capital, roughly speaking. The organization would need to reach a shared agreement about what it means to be y% more intellectually dishonest and what it means to get z% more social capital. Otherwise, there will be too much intra-coalition noise to separate the values of coalition members from the trade-offs they think they are making
  - This also means coalition members can strategically over- or under-estimate their own level of honesty or the value of the gained social capital depending on their individual values, deliberately obfuscating values in the organization
- Different coalitions have different opportunities for making x% trade-offs and people can generally freely enter and exit coalitions. My impression is that this differential pressure and the observed frequency with which you make x% trade-offs relative to alternative coalitions is what determines the values of those who enter and exit the coalition, not x% itself. This means:
  - x% isn’t a good Schelling point because I don’t really think it’s the parameter that is affecting the values of those involved in a coalition
  - slippery slopes are more likely to be caused by external things like the kind of trade-offs available to a coalition—as opposed to the values of the coalition itself
- social capital with external sources isn’t usually the main organizational bottleneck. People might be willing to make an x% trade-off but first they would probably exhaust all opportunities that don’t require them to make such a trade-off. And attention is finite. This means that a lot of pressure has to be applied before people actually begin to notice the x%. Maybe it’s a Schelling point at equilibrium but I don’t think it moves very quickly
In the absence of distinguished salient intermediate points along the uniformly continuous trade-off between maximally accurate world-models and sucking up to the Emperor, the only Schelling points are x = ∞ (tell the truth, the whole truth, and nothing but the truth) and x = 0 (do everything short of outright lying to win grants). In this model, the tension between these two “attractors” for coordination may tend to promote coalitional schisms.
I think it’s more likely that, as you select for people who make x% trade-offs for your coalition’s benefit, you’ll also tend to select for people who make x% trade-offs against your coalition’s benefit (unless your coalition is exclusively true-believers). This means that there’s a point before infinity where you have to maintain some organizational that provides coalition non-members with good world models or else your coalition members will fail to coordinate your coalition into having a good world-model itself.

Isnasene 4 Apr 2020 9:09 UTC
10 points
on: April Coronavirus Open Thread
I’ve been playing with the Kinsa Health weathermap data to get a sense of how effective US lockdowns have been at reducing US fever. The main thing I am interested in is the question of whether lockdown has reduced coronavirus’s r0 below 1 (stopping the spread) or not (reducing spread-rate but not stopping it). I’ve seen evidence that Spain’s complete lockdown has not worked so my expectation is that this is probably the case here. Also, Kinsa’s data has two important caveats:
- People who own smart thermometers are more likely to be health conscious than the overall population. Kinsa may therefore overstate the effect of the lockdown by not effectively sampling the health apathetic people more likely to get the virus.
- Kinsa data cannot separate coronavirus fever symptoms from flu fever symptoms. At the early stages of coronavirus spread, seasonal flu illness dominates coronavirus illness and seasonal flu r0 is between 1 and 2. This means that a lockdown can easily eliminate symptoms caused by seasonal flu illness by reducing flu r0 below 1 without reducing coronavirus’s r0 below 1.
  - I’m addressing this by comparing the largest amounts of observed atypical illness over the last month in different locations with their current total illness to get a conservative estimate of how much coronavirus %ill has changed.
With this in mind, my overall conclusion is that the Kinsa data does not disconfirm the possibility that we’ve reduced r0 below 1. Within the population of people who use smart thermometers, we’ve probably stopped the spread but it may/may not have stopped in the overall population. Here are my specific observations:
- The overall US %ill weakly suggests we may have reduced r0 below 1. It maxed out at around 5.1 %ill compared to a typical range of 3.7-4.7 %ill. This indicates that 0.4-1.4% of overall illness was due to coronavirus, and currently total illness is only 0.88%. This means that, for many values in that range, our lockdowns are actually cutting into the percent of people getting coronavirus and therefore that the virus is not growing.
- New York County NY %ill weakly suggests that we may have reduced r0 below 1. It maxed out at 6.4 %ill compared to a typical range of 2.75-4.32, indicating that 2.1-3.65% of people had coronavirus. Currently, total illness is 2.56%. Again, for most values in that range, it looks like we’re reducing the absolute amount of coronavirus.
- Cook County IL (Chicago) %ill is very weakly positive on reducing r0 below 1. It maxed out at 5.4 %ill with a typical range of 2.8-4.9, indicating that 0.5-2.6% of people had coronavirus. Currently the total is 0.92%, which suggests we’ve likely cut into coronavirus illness. The range of typical values is so large though that it’s hard to reach a conclusion.
- Essex County NJ (Newark) %ill doesn’t say much about r0. It maxed out at 6.1 compared to a typical range of 2.9-4.5, which implies a range of coronavirus %ill of 1.6-3.2. The current value is 2.63%, which is closer to the higher end of the range, so there’s no evidence that we’ve reduced the amount of coronavirus. Still, %ill is continuing to trend down so this may change in the future.
- I also considered looking at Santa Clara County CA, Los Angeles County CA, and Orleans Parish LA (New Orleans) but their %ill never exceeded the atypical value by a large enough amount for me to perform a comparison.
- On Mar 28, the overall US %ill changed from a steep linear drop of ~-0.3 %ill/day to a weaker linear drop of ~-0.1 %ill/day. Also, on Mar 28, both Newark’s and New York’s fast linear drops were broken by a slight increase in illness and it looks like we’re on our second leg down there now. Similarly, on Mar 27, Chicago’s fast linear drop was broken by a brief plateau and a second leg down. No idea why this happened.
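The arithmetic behind those bullets can be sketched in a few lines of Python. The figures are the ones quoted above; the function names and the "share of band" summary are hypothetical additions for illustration. The logic is that total illness is at least as large as coronavirus illness, so whenever the current total sits below a candidate peak coronavirus %ill, coronavirus illness must have fallen from that candidate value. Weighting the band uniformly is an assumption of this sketch, not a claim from the analysis, and it ignores the caveat that a wide typical range (as in Chicago) makes the estimate less trustworthy.

```python
# Figures are the ones quoted in the comment above; the function and field
# names are hypothetical and exist only for this sketch.

def implied_covid_range(peak, typical_low, typical_high):
    """Peak %ill minus the typical range gives a (low, high) band for covid %ill."""
    return peak - typical_high, peak - typical_low


def share_of_band_above(current, low, high):
    """Fraction of the [low, high] band lying above `current`, clipped to [0, 1]."""
    return min(1.0, max(0.0, (high - current) / (high - low)))


regions = {
    # name: (peak %ill, typical low, typical high, current total %ill)
    "US overall": (5.1, 3.7, 4.7, 0.88),
    "New York County NY": (6.4, 2.75, 4.32, 2.56),
    "Cook County IL": (5.4, 2.8, 4.9, 0.92),
    "Essex County NJ": (6.1, 2.9, 4.5, 2.63),
}

for name, (peak, t_low, t_high, current) in regions.items():
    low, high = implied_covid_range(peak, t_low, t_high)
    share = share_of_band_above(current, low, high)
    print(
        f"{name}: covid %ill at peak in [{low:.2f}, {high:.2f}], "
        f"current total {current:.2f}, band share above current {share:.2f}"
    )
```

In all four regions the current total falls inside the implied band rather than below it, which is why the conclusions above are hedged as "weakly suggests" rather than stated outright.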

Isnasene 20 Dec 2019 0:49 UTC
10 points
on: Propagating Facts into Aesthetics
But there’s a problem that seems harder to me, which is how to change my mind about aesthetics. Sarah Constantin first brought this up in Naming the Nameless, and I’ve been thinking about it ever since.
I know this isn’t exactly what this post is about (and I support having more nuanced understandings of other people’s aesthetics) however...
Please be careful about changing your mind about aesthetics! Especially if you currently value the aesthetic as important! And if you do choose to change your mind about aesthetics, remember to preemptively build up a Schelling Fence to protect yourself!
Changing aesthetics in general isn’t that hard; I’ve done it myself (more explicitly, one of my core values “ate” another one of my core values through sustained psychological warfare). Results of this process include:
- Accidentally modifying aesthetics you didn’t intend to modify (since aesthetics exist as a fuzzy network of associations in a feedback loop, changing one aesthetic may interfere with the feedback loops in other aesthetic systems in unpredictable ways)
- Accidentally modifying meta-level aesthetics you didn’t intend to modify. This encompasses a number of possibilities including:
  - Rendering yourself meta-level incorrigible to manage the horrifying knowledge that you can, in principle, will yourself out of existence at any time with relative ease (psychological modification doesn’t trigger the same visceral response that literal death does)
  - Or rendering yourself meta-level incorrigible by becoming intellectually indifferent to whether things actually satisfy your core values (and just having whatever core values you have at the time your brain decides to do this)
  - Having really weird object-level core values because your meta-level core values and object-level core values are fuzzily interlinked
IDK, in my case, modifying my aesthetic was a good decision and you may only be psychologically capable of modifying your aesthetics in situations where it’s really necessary. But I’m uncertain about whether this is true in general.
