Well, we’re going to be training AI anyway. If we’re just training capabilities, but not wisdom, I think things are unlikely to go well. More thoughts on this here.
I believe that Anthropic should be investigating artificial wisdom:
I’ve summarised a paper arguing for the importance of artificial wisdom, with Yoshua Bengio as one of the authors. I also have a short-form arguing for training wise AI advisors and an outline, Some Preliminary Notes on the Promise of a Wisdom Explosion.
By Wise AI Advisors, I mean training an AI to provide wise advice. BTW, I’ve now added a link to a short-form post in my original comment where I detail the argument for wise AI advisors further.
Props for proposing a new and potentially fruitful framing.
I would like to propose training Wise AI Advisors as something that could potentially meet your two criteria:
• Even if AI turns out to be broadly positive, wise AI advisors would allow us to get closer to maximising these benefits
• We can likely save the world if we make sufficiently wise decisions[1]
- ^
There’s also a chance that we’re past the point of no return, but if that’s the case, we’re screwed no matter what we do. Okay, it’s slightly more complicated: there’s a chance that we aren’t yet past the point of no return, but that if we pursue wise AI advisors instead of redirecting these resources to another project, we will be past the point of no return by the time we produce such advisors. This is possible, but my intuition is that it’s worth pursuing anyway.
Why the focus on wise AI advisors?[1]
I’ll be writing up a proper post to explain why I’ve pivoted towards this, but it will still take some time to produce a high-quality post, so I decided it was worthwhile releasing a short-form description in the meantime.
By Wise AI Advisors, I mean training an AI to provide wise advice.
a) AI will have a massive impact on society given the countless ways to deploy such a general technology
b) There are lots of ways this could go well and lots of ways this could go extremely poorly (election interference, cyber attacks, development of bioweapons, large-scale misinformation, automated warfare, catastrophic malfunctions, permanent dictatorships, mass unemployment etc.)
c) There is massive disagreement on the best strategy (decentralization vs. limiting proliferation, universally accelerating AI vs. winning the arms race vs. pausing, incremental development of safety vs. principled approaches, the offence-defence balance favoring the attacker or the defender) or even on what we expect the development of AI to look like (overhyped bubble vs. machine god, business as usual vs. this changes everything). Making the wrong call could prove catastrophic.
d) AI is developing incredibly rapidly (no wall, see o3 crushing the ARC challenge!). We have limited time to act and to figure out how to act.
e) Given both the difficulty of these challenges and the number of different strategic choices, humanity needs to rapidly improve its capability to navigate such situations
f) Whilst we can and should be developing top governance and strategy talent, this is unlikely to be sufficient by itself. We need every advantage we can get; we can’t afford to leave anything on the table.
g) Another way of framing this: Given the potential of AI development to feed back into itself, if it isn’t also feeding back into increased wisdom in how we navigate the world, our capabilities are likely to far outstrip our ability to handle them.
For these reasons, I think it is vitally important for society to be working on training these advisors now.
Does anyone else think wise AI advisors are important?
Going slightly more general to training wise AI rather than specifically advisors, there was the competition on the Automation of Wisdom and Philosophy organised by Owen Cotton-Barratt, and there’s this paper (summary) by Samuel Johnson and others incl. Yoshua Bengio, Melanie Mitchell and Igor Grossmann. LintzA listed Wise AI advisors for governments as something worth considering in The Game Board Has Been Flipped[2].
Additional Links
You may also be interested in reading my 3rd prize-winning entry to the AI Impacts Competition on the Automation of Wisdom and Philosophy. It’s divided into two parts:
• An Overview of “Obvious” Approaches to Training Wise AI Advisors
• Some Preliminary Notes on the Promise of a Wisdom Explosion
- ^
I previously described my agenda as Wise AI Advisors via Imitation Learning. I now see that as overly narrow. The goal is to produce Wise AI Advisors via any means, and I think that Imitation Learning is underrated, but I’m sure there are lots of other approaches that are underrated as well.
- ^
No argument included, sadly.
Thanks, it seems pretty good on a quick skim. I’m a bit less certain about the corrigibility section, and more issues might become apparent if I read through it more slowly.
How about “Please summarise Eliezer Yudkowsky’s views on decision theory and its relevance to the alignment problem”.
Someone should see how good Deep Research is at generating reports on AI Safety content.
Nice article, I especially love the diagrams!
In Human Researcher Obsolescence you note that we can’t completely hand over research unless we manage to produce agents that are at least as “wise” as the human developers.
I agree with this, though I would love to see a future version of this plan include an expanded analysis of the role that wise AI would play in Magma’s strategy, as I believe that this could be a key aspect of making this plan work.
In particular:
• We likely want to be developing wise AI advisors to advise us during the pre-hand-off period. In fact, I consider this likely to be vital to successfully navigating this period given the challenges involved.
• It’s possible that we might manage to completely automate the more objective components of research without managing to completely automate the more subjective components. That said, we likely want to train wise AI advisors to help us with the more subjective components even if we can’t defer to them.
• When developing AI capabilities, there’s an additional lever in terms of how much Magma focuses on direct capabilities vs. wisdom.
Fellowships typically only last a few months, and even if you’re in India, you’d likely have to move for the fellowship unless it happened to be in your exact city.
Impact Academy was doing this, before they pivoted towards the Global AI Safety Fellowship. It’s unclear whether any further fellowships should be in India or a country that is particularly generous with its visas.
I posted this comment on Jan’s blog post:
Underelicitation assumes a “maximum elicitation” rather than a never-ending series of more and more layers of elicitation that could be discovered.
You’ve undoubtedly spent much more time thinking about this than I have, but I’m worried that attempts to maximise elicitation merely accelerate capabilities without actually substantially boosting safety.
In terms of infrastructure, it would be really cool to have a website collecting the more legible alignment research (papers, releases from major labs or non-profits).
I think I saw someone arguing that their particular capability benchmark was good for evaluating the capability, but of limited use for training the capability because their task only covered a small fraction of that domain.
(Disclaimer: I previously interned at Nonlinear)
Different formats allow different levels of nuance. Memes aren’t essays and they shouldn’t try to be.
I personally think these memes are fine and that this outreach is too. Maybe these posts oversimplify things a bit too much for you, but I expect that the average person on these subs probably improves the level of their thinking from seeing these memes.
If, for example, you think r/EffectiveAltruism should ban memes, then I recommend talking to the mods.
Biden administration unveils global AI export controls aimed at China
Well done for managing to push something out there. It’s a good start; I’m sure you’ll fill in some of the details with other posts over time.
What if the thing we really need the Aligned AI to engineer for us is… a better governance system?
I’ve been arguing for the importance of having wise AI advisors, which isn’t quite the same thing as a “better governance system”, since they could advise us about all kinds of things, but it feels like it’s in the same direction.
Excellent post. It helped clear up some aspects of SLT for me. Any chance you could clarify why this volume is called the “learning coefficient”?
My take: Counterfactuals are Confusing because of an Ontological Shift:
“In our naive ontology, when we are faced with a decision, we conceive of ourselves as having free will in the sense of there being multiple choices that we could actually take. These choices are conceived of as actual, and when we think about the notion of the “best possible choice” we see ourselves as comparing actual possible ways that the world could be. However, when we start investigating the nature of the universe, we realise that it is essentially deterministic and hence that our naive ontology doesn’t make sense. This forces us to ask what it means to make the “best possible choice” in a deterministic ontology where we can’t literally take a choice other than the one that we make. This means that we have to try to find something in our new ontology that roughly maps to our old one.”
We expect a straightforward answer to “What is a decision theory as a mathematical object?”, since we automatically tend to assume our ontology is consistent, but if this isn’t the case and we actually have to repair our ontology, it’s unsurprising that we end up with different kinds of objects.