Ben Goldhaber

Karma: 1,440

Good AI Epistemics as an Offramp from the Intelligence Explosion

Ben Goldhaber 12 Feb 2026 19:18 UTC

22 points

0 comments · 3 min read · LW link

Ben Goldhaber 10 Feb 2026 20:19 UTC
4 points
−2
on: bgold’s Shortform
It’s much, much easier now to automate mechanisms like liquid democracy; you could run experiments inside organizations that would a) be fun and b) test practicality. Google ran an experiment in 2015 using this to select snacks; do it again, but with AI delegates either representing your preferences or making snack decisions.
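To make the mechanism concrete, here is a minimal sketch of the vote-counting core of liquid democracy: each person either votes directly or delegates to someone else (possibly an AI delegate), and delegation chains are followed until they reach a direct vote. Everything named here (`resolve_delegate`, `tally`, the voters, `ai_delegate_cy`) is hypothetical and for illustration only; it is not how the Google experiment was implemented.

```python
from collections import defaultdict


def resolve_delegate(voter, delegations, direct_votes):
    """Follow a delegation chain until it reaches someone who voted directly.

    Returns the final voter's name, or None if the chain dead-ends or loops.
    """
    seen = set()
    current = voter
    while current not in direct_votes:
        # Stop on a cycle, or when the current person neither voted nor delegated.
        if current in seen or current not in delegations:
            return None
        seen.add(current)
        current = delegations[current]
    return current


def tally(voters, delegations, direct_votes):
    """Count one vote per human voter, cast by whoever ends their chain."""
    totals = defaultdict(int)
    for voter in voters:
        final = resolve_delegate(voter, delegations, direct_votes)
        if final is not None:
            totals[direct_votes[final]] += 1
    return dict(totals)


# Hypothetical snack election: "ai_delegate_cy" stands in for an AI agent
# voting on cy's behalf; ed and fay delegate to each other, so ed's vote is lost.
voters = ["ana", "bo", "cy", "di", "ed"]
direct_votes = {"ana": "pretzels", "ai_delegate_cy": "fruit"}
delegations = {
    "bo": "ana",
    "cy": "ai_delegate_cy",
    "di": "cy",
    "ed": "fay",
    "fay": "ed",
}

result = tally(voters, delegations, direct_votes)
print(result)
```

A real experiment would also need to decide what happens to looped or dead-end delegations (here they are simply dropped) and whether AI delegates count as voters themselves (here they do not).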

Ben Goldhaber 3 Feb 2026 16:10 UTC
6 points
0
on: Increasing AI Strategic Competence as a Safety Approach
I quite like this framing, and think Strategic Competence is a useful term and concept. I explored a related idea in Wise AI Advisors at the Hinge of History:
I posit that if:
1. We have trusted AI systems that people turn to as advisors
2. And the trust in the AI advisors is well placed because they have good epistemics
  Where “good epistemics” roughly means they consistently use reliable methods to figure out what is true and avoid self-deception
3. And the AI advisors have shared epistemics
  Where “shared epistemics” roughly means there is a shared foundation that allows different AI advisors and people to trust one another’s reasoning.
4. Then it implies that those AI Advisors would advise their Principals to avoid an Intelligence Explosion—if and only if this is in fact a real danger—and humanity could coordinate around this advice.
I expect Strategic Competence to largely track general model capabilities, whereas shared trusted epistemics requires more deliberate work on validation, auditing, and institution-building that won’t happen by default.
This overlaps with your points on improving AI philosophical competence, but with more focus on making epistemics verifiable and legible across different systems and actors. I think that, alongside model improvements, is what would be needed to enable guidance that people actually follow and to create the common knowledge needed for preference cascades, so that many actors agree with wise AI advisors to prevent RSI takeoffs.

Ben Goldhaber 19 Jan 2026 19:45 UTC
4 points
0
on: Lifelink™: Freedom for your Child
> $50,000 FRC legal defense insurance, access to our specialized attorneys, and a cryptographically signed Safety Log that has precedent in state courts as admissible evidence of your child’s continuous supervision via Lifelink
I like this idea of insurance here—it would help with some of my fears—and you could imagine the company building this giving higher defense budgets at the start to help fight court cases that would establish precedent that Lifelink is suitable monitoring.

Ben Goldhaber 16 Jan 2026 17:34 UTC
13 points
0
on: What’s going on at CFAR? (Updates and Fundraiser)
Congratulations to Anna and the team for cohering around a vision and set of experiments. I donated to the new CFAR; I hope you continue posting about what you learn through the upcoming workshops.
One {note? suggestion? “real spirit” discussion point?}: I feel like the framing of aCFAR was missing something important about the state of rationality today. Namely, from this year, 2026, onward, being more rational is unlikely to be a “human technique”-only affair. It will look more like cyborgs and centaurs: humans using AI tools and agents in different configurations and ways to make better decisions.
I won’t belabor how good the AIs have gotten, and instead will just note that they are effective aids for rationalist techniques:
- I wrote a post about backchaining where I had Claude create malleable, customizable timelines. I found this to be a really effective way to “feel” at the S1 level the constraints and targets.
- They’re very good at making Fermi estimates.
- There’s ongoing research and experiments into using them as mediators and for fostering cooperation, à la double crux.
- They’re probably useful for Focusing and internal work too (I know the Jhourney team has been running experiments here, though I haven’t found it that effective personally).
I appreciate that it’s a Center for Applied Rationality, and maybe this particular center doesn’t need to think about the cyborg angle and can just focus on developing better models of “who-ness”. Maybe a different center should!
But it seems valuable to consider, to the extent you want to push forward the frontier of rationality. I suspect there’s some connection between the moments when AI meaningfully aids my real thinking, the moments when I’m doing slop-ful fake thinking where the AI is aiding my delusions, and the concept you’re defining as “who-ness.” Who-ness seems adjacent to taste, which might matter a lot for steering AI fleets towards goodness and meaningful concepts. And it’s probably the case that the general rationality techniques you’re imparting and working on with attendees could be more effective with AI assistance.

Ben Goldhaber 5 Jan 2026 15:25 UTC
2 points
0
on: bgold’s Shortform
In Import AI 439, Jack Clark references a new paper, Universally Converging Representations of Matter Across Scientific Foundation Models:

> Do AI systems end up finding similar ways to represent the world to themselves? Yes, as they get smarter and more capable, they arrive at a common set of ways of representing the world.
> The latest evidence for this is research from MIT which shows that this is true for scientific models and the modalities they’re trained on: “representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems,” they write. “Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality.”
> As with other studies of representation, they found that as you scale the data and compute models are trained on, “their representations converge further”.

This seems like useful empirical evidence for the Natural Abstraction Hypothesis (I haven’t been following progress on that research agenda, so I don’t know how significant an update this is).

A Full Epistemic Stack: Knowledge Commons for the 21st Century

Oliver Sourbut and Ben Goldhaber

19 Dec 2025 22:48 UTC

41 points

7 comments · 11 min read · LW link

(www.oliversourbut.net)

Ben Goldhaber 2 Dec 2025 14:44 UTC
3 points
0
in reply to: Saul Munn’s comment on: Explosive Skill Acquisition
Thank you for the comment Saul—I agree with a lot of your points, in particular that “explosive” periods are costly and inefficient (relative perhaps to some ideal), and that they are not in and of themselves a solution for long-term retention.
I expect that if we have a crux, it’s whether someone who intends to follow an incremental path or someone who does an intense acquisition period is more likely to actually have the skill about a year later. My guess is that, for a number of reasons, it’s the latter; I’d expect a lot of incrementalists to ‘just not actually do the thing’.
* My ideal strategy would be “explore lightly a number of things, to determine what you want → explode towards that for an intense period of time → establish incremental practices to maintain and improve”
* Your comment also highlighted for me something that I had cut from the initial draft: my belief that explosive periods help overcome emotional blockers, which I think might be a big part of why people shy away from skills they say they want.

Ben Goldhaber 30 Nov 2025 19:25 UTC
2 points
0
in reply to: cousin_it’s comment on: Explosive Skill Acquisition
No worries, I appreciate the perspective. I agree that for many skills there is a consolidation and rest period that is needed. An obvious example is that you can’t cram all of the effort needed to build muscle into one week and expect the same kinds of returns that you would get over many months. Though, I do expect you could master the biomechanical skills of weightlifting much faster with that attitude!
If you have examples of the multidimensional learning schedule, I’d love to hear them. I’m imagining something like {30 minutes of Spanish language shows}?

Ben Goldhaber 30 Nov 2025 18:56 UTC
2 points
0
in reply to: philip_b’s comment on: Explosive Skill Acquisition
Thanks for sharing a data point on that claim.

Explosive Skill Acquisition

Ben Goldhaber 30 Nov 2025 17:03 UTC

52 points

10 comments · 5 min read · LW link

(bengoldhaber.substack.com)

The First Thanksgiving

Ben Goldhaber 27 Nov 2025 20:36 UTC

37 points

1 comment · 4 min read · LW link

(bengoldhaber.substack.com)

Ben Goldhaber 26 Nov 2025 4:06 UTC
LW: 6 AF: 2
4
AF
on: Alignment will happen by default. What’s next?
Thank you for the post! I have also been (very pleasantly!) surprised by how aligned current models seem to be.
I’m curious if you expect ‘alignment by default’ to continue to hold in a regime where continuous learning is solved and models are constantly updating themselves/being updated by what they encounter in the world?

Chain of Thought not producing evidence of scheming or instrumental convergence does seem like evidence against, but it seems quite weak to me as a proxy for what to expect from ‘true agents’. CoT doesn’t run long enough or have the type of flexibility I’d expect to see in an agent that was actually learning over long time horizons, which would give it the affordance to contemplate the many ways it could accomplish its goals.
And, while this is just speculation, I imagine that the kind of training procedures we’re doing now to instill alignment will not be possible with Continuous Learning, or that we’ll have to pay a heavy alignment tax to do that for these agents. Note Jan’s recent tweet on his impression that it is quite hard to align large models and that alignment doesn’t fall out for free from size.

[Question] Are there examples of communities where AI is making epistemics better now?

Ben Goldhaber 17 Nov 2025 21:47 UTC

18 points

0 comments · 2 min read · LW link

Ben Goldhaber 10 Nov 2025 4:48 UTC
5 points
0
in reply to: Mo Putera’s comment on: Unexpected Things that are People
Thank you! I had not read this Kevin Simler essay but I quite like it, and it does match my perspective.

Unexpected Things that are People

Ben Goldhaber 8 Nov 2025 17:12 UTC

209 points

11 comments · 4 min read · LW link

FLF Fellowship on AI for Human Reasoning: $25-50k, 12 weeks

Oliver Sourbut and Ben Goldhaber

19 May 2025 13:25 UTC

76 points

1 comment · 2 min read · LW link

(www.flf.org)

Ben Goldhaber 28 Apr 2025 18:17 UTC
3 points
0
in reply to: Garrett Baker’s comment on: bgold’s Shortform
I did! And I have in fact read the whitepaper (well, some of it :). But it still seems weird that it’s not possible to increase trust in the third party through financial means or dramatic PR stunts (e.g. the auditor promises to commit seppuku if they are found to have lied).

Ben Goldhaber 28 Apr 2025 18:15 UTC
1 point
0
in reply to: ryan_greenblatt’s comment on: bgold’s Shortform
Source needed, but I recall someone on the Community Notes team saying it was very similar, with some small differences between prod and the open source version (it’s difficult to maintain exact compatibility). For the point of the comment and context, I agree open source does a good job of this, though given the number of people on Twitter who still allege it’s being manipulated, I think you need some additional juice (a whistleblower prize?).

Ben Goldhaber 26 Apr 2025 16:55 UTC
16 points
0
on: bgold’s Shortform
Why are there so few third-party auditors of algorithms? For instance, you could have an auditing agency make specific assertions about what the Twitter algorithm is doing, or whether Community Notes is ‘rigged’.
- It could be that the codebase is too large, too many people can make changes, and it’s too hard to verify that the algorithm in production is stable. This seems unlikely to me with most modern devops stacks.
- It could be that no one will trust the third-party agency. I guess this seems most likely… but really, have we even tried? Could we not have some group of monk-like Auditors who would rather die than lie? (My impression is some cyber professionals have this ethos already.)
If Elon wanted to spend a couple hundred thousand on insanely committed, high-integrity auditors, it’d be a great experiment.
