I’m in agreement with a lot of this. I more or less agree with your model of the problem as well as your solution of stress testing combined with setting expectations that there might be more work to do.
I’ll expand on where I diverge a bit.
> How do they know whether they’re facilitating lasting growth or flaky breakthroughs?
Even if there are a hundred more blocks, you’ve achieved one percent. You say the same thing from a “glass is half empty” perspective, but I think the “glass is one percent full” perspective is the more useful one here. It may be only one percent, but you don’t actually lose that progress or end up worse off so long as you recognize it for what it is. You figured something out. You just didn’t figure everything out yet.
It’s like climbing a mountain when you can’t see the top. For some questions you need to know how much mountain there is left, but if you’re committed to climbing it and want to make sure you’re doing it efficiently, it’s enough to track your local altitude gain—and make sure you’re approaching the upper bound set by available power divided by m·g.
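To put a rough number on that bound (my own illustrative figures, not anything from the original post): the fastest you can possibly gain altitude is your available power divided by your weight, so you can judge climbing efficiency against that ceiling without knowing how much mountain is left.

```latex
% A minimal sketch of the climbing bound, with made-up numbers for illustration:
% maximum sustainable climb rate = available power / (mass * gravity).
\[
  v_{\text{climb}}^{\max} = \frac{P_{\text{available}}}{m g},
  \qquad \text{e.g.}\quad
  \frac{300\,\text{W}}{80\,\text{kg} \times 9.8\,\text{m/s}^2} \approx 0.38\,\text{m/s}
  \approx 1400\,\text{m of altitude per hour.}
\]
```

It’s only a ceiling, of course, but it’s the kind of local benchmark I mean: altitude gained versus the bound tells you how efficiently you’re climbing, with no reference to the summit.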
I still think follow-up data is good (and I did follow up a few months later with the person I helped in my chronic pain post; the improvement lasted), but I think it’s going too far to suggest that without long-term data you can’t know if you’re helping people grow.
So partly I diverge in thinking that this data-taking isn’t fully necessary, but I also diverge a bit in thinking that it’s not fully sufficient either. It’s like testing whether your cholesterol-lowering drug still works 3 months down the line… but not measuring all-cause mortality. And good luck measuring the equivalent of that and getting good data (as I say this, I’m reminded of an NLP “Core Transformation” study that did try to measure “global well-being” 8 weeks after a single session, and got positive results).
I’ll give an example of how this can be hard to track. Once upon a time I tried to help an online friend with her fear of needles. There may have been some marginal improvement, but I mostly just considered it a failure—which was surprising, since it seemed to have gone well enough that I would have expected it to help. So what data are you going to take in a case like this? “Hey, it’s been three months. Did our lack of progress hold up, or can I take credit for a flaky lack-of-breakthrough?”?
About three months later, an online friend visited and we didn’t sit down to “solve psychological problems” at all. I showed her around, took her to a rock in the ocean to jump off, just fun stuff. Then, shortly after this visit, she had to get blood drawn and wasn’t afraid.
It was the same friend, so that hypothetical 3-month checkup would have actually been positive. I’m not sure whether the work we had done earlier was a necessary precondition or whether the visit alone would have been sufficient, but either 1) it was a necessary precondition, and the right interpretation of that hypothetical 3-month follow-up really is “Yeah, your explicit intervention resolved my fear issue with a 3-month delay”, or else 2) the intervention that did it had nothing to do with needles whatsoever. So again, what kind of data are you going to take? “Any seemingly unrelated improvements in your life in the last three months that I can take credit for?”?
If you tunnel-vision on concrete goals it gets easier to take data, but you end up missing any value that’s not captured in your explicit metric. From my experience, I think this less legible value swamps the legible value, which is… inconvenient for clean analyses. To give an example, one friend has explicitly given me credit for “a large part of [her] current happiness”, and another seems to be on the same path for the same reasons. In neither case did I sit down with them and say “Let’s solve this unhappiness thing”, and I would not have been able to get these results if I had. They both know I’m available to help them with “psychological issues” whenever they want and that I’m generally successful with that, but the concrete goals of those requests are much more limited, and the sum of a few things like “Help me get my daughter to take her eye drops”/“Help me stop overthinking jiu jitsu” pales in comparison.
The main thing I did that actually mattered was tease and talk shit, criticize, and comfort as needed. This is illegible to the point of looking like “just normal friend stuff”—and it is—but it was also guided by psychological principles aimed at the big picture. The jump rock experience wasn’t just fun; it was intentionally a reference experience for how to handle novel and scary situations with a person you know you can trust (that’s why it was fun).
The life-changing result is that it allowed them to form really good relationships with their husband and fiancé respectively, relationships they would not have been able to even begin before my collection of illegible “interventions”. I can think of a third case where I intervened directly and successfully got a couple back together, now happily married, four children later—but that one has the opposite problem, because without a control group I have no idea if I can actually take credit. It’s really quite likely that they’d have figured out that breaking up was stupid on their own, even if I hadn’t intervened to shake some sense into them.
So with the equivalent of “all-cause mortality” I think I understand the default trajectory well enough to make pretty good guesses in a couple cases, but that’s relying a lot on my understanding (and their understanding) of the trajectories they were on rather than on hard data. Who tf knows in the other case. And without really big samples, one death does a lot to change how the “all-cause mortality” looks—so better have a damn good reason to believe we’re not trading more legible benefits for less legible but larger risks.
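To make the small-sample point concrete (illustrative arithmetic with hypothetical sample sizes, not data from any of these cases): a single bad outcome moves the headline rate by whole percentage points when the group is small.

```latex
% Illustrative arithmetic only (hypothetical sample sizes, not real follow-up data):
% one adverse event among 20 people vs. one among 1000.
\[
  \frac{1}{20} = 5\% \qquad \text{vs.} \qquad \frac{1}{1000} = 0.1\%
\]
% At friend-group scale a single event swings the observed rate by five
% percentage points, so rare-but-large harms are effectively invisible there.
```

That’s the sense in which the risk side of the ledger stays unmeasured at these sample sizes, even when the legible short-term metrics look clean.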
So yes, I agree. Take the data. Check in down the line, and all that.
At the same time, know that the data you’re not taking might contain the bulk of the story, and that if you’re aiming only at what you can easily measure you might be missing high-leverage points for what really matters.