TsviBT comments on Max H’s Shortform

TsviBT 14 Feb 2026 21:47 UTC
10 points
I agree with this literally, but I’d want to add what I think is a significant friendly amendment. Successes are much more informative than failures, but they are also basically impossible. You have to relax your criteria for success a lot to start getting partial successes; and my impression is that in practice, “partial successes” in “alignment” are approximately 0 informative.

If we have to retreat from successes to interesting failures, I agree this is a retreat, but I think it’s necessary. I agree that many/most ways of retreating are quite unsatisfactory / unhelpful. Which retreats are more helpful? Generally I think an idea (the idea?) is to figure out highly general constraints from particular failures. See here https://tsvibt.blogspot.com/2025/11/ah-motiva-3-context-of-concept-of-value.html#why-even-talk-about-values and especially the advice here https://www.lesswrong.com/posts/rZQjk7T6dNqD5HKMg/abstract-advice-to-researchers-tackling-the-difficult-core#Generalize_a_lot :

When an idea or proposal fails, try to generalize far. Draw really wide-ranging conclusions.

Also cf. here (https://www.lesswrong.com/posts/K4K6ikQtHxcG49Tcn/hia-and-x-risk-part-2-why-it-hurts#Alignment_harnesses_added_brainpower_much_less_effectively_than_capabilities_research_does), quoting the relevant part in full:

In alignment, on the other hand, you have to understand each constraint that’s known in order to even direct your attention to the relevant areas. This is analogous to the situation with the P vs. NP problem, where whole classes of plausible proof strategies are proven to not work. You have to understand most of those constraints; otherwise by default you’ll probably be working on e.g. a proof that relativizes and therefore cannot show P≠NP. Progress is made by narrowing the space, and then looking into the narrowed space.