Ok, I think one of the biggest disconnects here is that Eliezer is currently talking in hindsight about what we should learn from past events, and this is and should often be different from what most people could have learned at the time. Again, consider the syllogism example: just because you or I might have been fooled by it at the time does not mean we can’t learn from the obvious-in-some-sense foolishness after the fact. The relevant kind of “obviousness” needs to include obviousness in hindsight for the move Eliezer is making to work, not necessarily obviousness in advance, though it does also need to be “obvious” in advance in a different sense (more on that below).
Short handle: “It seems obvious in hindsight that <X> was foolish (not merely a sensible-but-incorrect prediction from insufficient data); why wasn’t that obvious at the time, and what pattern do I need to be on the watch for to make it obvious in the future?”
Eliezer’s application of that pattern to the case at hand goes:
It seems obvious-in-some-sense in hindsight that bio anchors and the Carlsmith thing were foolish, i.e. one can read them and go “man this does seem kind of silly”.
Insofar as that wasn’t obvious at the time, it’s largely because people were selecting for moderate-sounding conclusions. (That’s not the only generalizable pattern which played a role here, but it’s an important one.)
So in the future, I should be on the lookout for the pattern of selecting for moderate-sounding conclusions.
I think an important gear here is that things can be obvious-in-hindsight, but not in advance, in a way which isn’t really a Bayesian update on new evidence and therefore doesn’t strictly follow prediction rules.
Toy example:
Someone publishes a proof of a mathematical conjecture, which enters canon as a theorem.
Some years later, another person stumbles on a counterexample.
Surprised mathematicians go back over the old proof, and indeed find a load-bearing error. Turns out the proof was wrong!
The key point here is that the error was an error of reasoning, not an error of insufficient evidence or anything like that. The error was “obvious” in some sense in advance; a mathematician who’d squinted at the right part of the proof could have spotted it. Yet in practice, it was discovered by evidence arriving, rather than by someone squinting at the proof.
Note that this toy example is exactly the sort where the right primary move to make afterwards is to say “the error is obvious in hindsight, and was obvious-in-some-sense beforehand, even if nobody noticed it. Why the failure, and how do we avoid that in the future?”.
This is very much the thing Eliezer is doing here. He’s (he claims) pointing to a failure of reasoning, not of insufficient evidence. For many people, the arrival of more recent evidence has probably made it more obvious that there was a reasoning failure, and those people are the audience who (hopefully) get value from the move Eliezer made—hopefully they will be able to spot such silly patterns better in the future.
I think an important gear here is that things can be obvious-in-hindsight, but not in advance, in a way which isn’t really a Bayesian update on new evidence and therefore doesn’t strictly follow prediction rules.
That’s my model here as well. Pseudo-formalizing it: We’re not idealized agents, we’re bounded agents, which means we can’t actually do full Bayesian updates. We have to pick and choose what computations we run, what classes of evidence we look for and update on. In hindsight, we may discover that an incorrect prediction was caused by our opting not to spend the resources on updating on some specific information, such that if we had known to do so, we would have reliably avoided the error even while having all the same object-level information.
In other words, it’s a Bayesian update to the distribution over Bayesian updates we should run. We discover a thing about (human) reasoning: that there’s a specific reasoning error/oversight we’re prone to, and that we have to run an update on the output of “am I making this reasoning error?” in specific situations.
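The "update to the distribution over updates" can be sketched as a toy simulation. Everything here (the check names, priorities, and budget mechanism) is an illustrative assumption of mine, not anything from the discussion above: a bounded reasoner can only afford to run some of its available checks before accepting a claim, and the meta-level move is modeled as raising the priority of a check after hindsight reveals it would have caught the error.

```python
# Toy sketch (assumptions mine): a bounded reasoner with a compute budget.
# It runs only its highest-priority checks; the hindsight lesson is encoded
# as an update to check *priorities*, not to the object-level claim.

def accept_claim(claim, checks, budget):
    """Run the highest-priority checks we can afford; accept if none fail."""
    affordable = sorted(checks, key=lambda c: -c["priority"])[:budget]
    for check in affordable:
        if not check["passes"](claim):
            return False, check["name"]
    return True, None

# A claim with a subtle flaw that only one (initially low-priority) check detects.
claim = {"conclusion_sounds_moderate": True, "reasoning_valid": False}

checks = [
    {"name": "sounds_plausible", "priority": 0.9,
     "passes": lambda c: c["conclusion_sounds_moderate"]},
    {"name": "audit_reasoning_step_by_step", "priority": 0.1,  # expensive, rarely run
     "passes": lambda c: c["reasoning_valid"]},
]

# In advance: with a budget of 1, only the plausibility check runs,
# so the flawed claim slips through.
accepted, _ = accept_claim(claim, checks, budget=1)
assert accepted

# In hindsight: a counterexample reveals the error. The Bayesian update is
# to the distribution over which updates to run:
checks[1]["priority"] = 0.95  # meta-update: audit reasoning first next time

accepted, failed_by = accept_claim(claim, checks, budget=1)
assert not accepted and failed_by == "audit_reasoning_step_by_step"
```

Note that in both runs the reasoner has the same object-level information about the claim; only the allocation of its limited checking budget changes, which is the distinction the paragraph above is drawing.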
This doesn’t necessarily mean that this meta-level error would have been obvious to anyone in the world at all, at the time it was made. Nowadays, we all may be committing fallacies whose very definitions require agent-foundations theory decades ahead of ours; fallacies whose definitions we wouldn’t even understand without reading a future textbook. But it does mean that specific object-level conclusions we’re reaching today would be obviously incorrect to someone who is reasoning in a more correct way.