Liron

Karma: 4,333

Interview with Carl Feynman on Imminent AI Existential Risk

LironJul 5, 2025, 6:49 PM

29 points

1 comment40 min readLW link

Liron May 10, 2025, 4:32 PM
2 points
0
in reply to: mishka’s comment on: Jim Babcock’s Mainline Doom Scenario: Human-Level AI Can’t Control Its Successor
I have multi-year-wide confidence intervals, which I think the authors of AI 2027 also do, so I don’t have much of a stance on whether the best guess is 2026 or 2027 or 2030 or 2035. I agree 2027 seems a bit soon given the subjective rate of progress 🤷‍♂️

Liron May 9, 2025, 2:15 PM
6 points
0
in reply to: Mitchell_Porter’s comment on: Jim Babcock’s Mainline Doom Scenario: Human-Level AI Can’t Control Its Successor
“Utility Engineering: Analyzing and Controlling Emergent Value Systems in AI”
https://www.emergent-values.ai/
I walked through this paper’s finding in detail in a previous episode of Doom Debates which IMO is one of my best episodes. Just skip straight to the chapters in the second half, timestamp 49:13:

Jim Babcock’s Mainline Doom Scenario: Human-Level AI Can’t Control Its Successor

Liron and jimrandomh

May 9, 2025, 5:20 AM

29 points

4 comments62 min readLW link

(www.youtube.com)

Liron Feb 16, 2025, 3:58 PM
8 points
6
on: The Failed Strategy of Artificial Intelligence Doomers
This article is just saying “doomers are failing to prevent doom for various reasons, and also they might be wrong that doom is coming soon”. But we’re probably not wrong, and not being doomers isn’t a better strategy. So it’s a lame article IMO.

Liron Jan 25, 2025, 11:32 PM
2 points
0
on: Mechanisms too simple for humans to design
Wow this is my favorite post in a long time, super educational. I was familiar with the basic concept from the Sequences, but this added a great level of understandable detail. Kudos.

Liron Jan 4, 2025, 12:54 AM
2 points
2
in reply to: moonlight’s comment on: Practicing Bayesian Epistemology with “Two Boys” Probability Puzzles
By your logic, if I ask you a totally separate question “What’s the probability that a parent’s two kids are both boys”, would you answer 1/3? Becuase the correct answer should be ¹⁄₄ right? So something about your preferred methodology isn’t robust.

Liron Jan 3, 2025, 2:35 PM
2 points
0
in reply to: Yair Halberstadt’s comment on: Practicing Bayesian Epistemology with “Two Boys” Probability Puzzles
I agree that frequentists are flexible about their approach to try to get the right answer. But I think your version of the problem highlights how flexible they have to be i.e. mental gymnastics, compared to just explicitly being Bayesian all along.

Liron Jan 3, 2025, 2:05 AM
2 points
−2
in reply to: Yair Halberstadt’s comment on: Practicing Bayesian Epistemology with “Two Boys” Probability Puzzles
In scenario B, where a random child runs up, I wonder if a non-Bayesian might prefer that you just eliminate (girl, girl) and say that the probability of two boys is 1/3?
In Puzzle 1 in my post, the non-Bayesian has an interpretation that’s still plausibly reasonable, but in your scenario B it seems like they’d be clowning themselves to take that approach.
So I think we’re on the same page that whenever things get real/practical/bigger-picture, then you gotta be Bayesian.

Practicing Bayesian Epistemology with “Two Boys” Probability Puzzles

LironJan 2, 2025, 4:42 AM

43 points

14 comments6 min readLW link

Liron Dec 15, 2024, 4:25 AM
13 points
5
on: Communications in Hard Mode (My new job at MIRI)
Thanks for this post.
I’d love to have a regular (weekly/monthly/quarterly) post that’s just “here’s what we’re focusing on at MIRI these days”.
I respect and value MIRI’s leadership on the complex topic of building understanding and coordination around AI.
I spend a lot of time doing AI social media, and I try to promote the best recommendations I know to others. Whatever thoughts MIRI has would be helpful.
Given that I think about this less often and less capably than you folks do, it seems like there’s a low hanging fruit opportunity for people like me to stay more in sync with MIRI. My show (Doom Debates) isn’t affiliated with MIRI, but as long as there keeps being no particular disagreement that I have with MIRI, I’d like to make sure I’m pulling in the same direction as you all.

Liron Dec 14, 2024, 3:12 AM
12 points
1
on: What is MIRI currently doing?
I’ve heard MIRI has some big content projects in the works, maybe a book.

FWIW I think having a regular stream of lower-effort content that a somewhat mainstream audience consumes would help to bolster MIRI’s position as a thought leader when they release the bigger works.

Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate

LironNov 9, 2024, 11:39 PM

5 points

1 comment124 min readLW link

(www.youtube.com)

Liron Aug 1, 2024, 4:37 AM
2 points
0
in reply to: Nemoto’s comment on: The Power to Understand “God”
I’d ask: If one day your God stopped existing, would anything have any kind of observable change?
Seems like a meaningless concept, a node in the causal model of reality that doesn’t have any power to constrain expectation, but the person likes it because their knowledge of the existence of the node in their own belief network brings them emotional reward.

Liron Jul 21, 2024, 6:44 PM
1 point
0
in reply to: Liron’s comment on: What do coherence arguments actually prove about agentic behavior?

When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal

Because expected value tells us that the more resources you control, the more robust you are to maximizing your probability of success in the face of what may come at you, and the higher your maximum possible utility is (if you have a utility function without an easy-to-hit max score).

“Maximizing goal-orientedness of the universe” was how I phrased the prediction that conquering resources involves having them aligned to your goal / aligned agents helping you control them.

Liron Jul 21, 2024, 3:55 AM
−5 points
0
in reply to: sunwillrise’s comment on: What do coherence arguments actually prove about agentic behavior?
> goal-orientedness is a convergent attractor in the space of self-modifying intelligences
This also requires a citation, or at the very least some reasoning; I’m not aware of any theorems that show goal-orientedness is a convergent attractor, but I’d be happy to learn more.
Ok here’s my reasoning:
When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal. So if we diagram the evolution of the universe’s goal-orientedness, it has the shape of an attractor.
There are plenty of entry paths where some intelligence-improving process spits out a goal-oriented general intelligene (like biological evolution did), but no exit path where a universe whose smartest agent is super goal-oriented ever leads to that no longer being the case.

Liron Jul 19, 2024, 12:54 AM
2 points
0
in reply to: Writer’s comment on: Robin Hanson AI X-Risk Debate — Highlights and Analysis
I’m happy to have that kind of debate.
My position is “goal-directedness is an attractor state that is incredibly dangerous and uncontrollable if it’s somewhat beyond human-level in the near future”.
The form of those arguments seems to be like “technically it doesn’t have to be”. But realistically it will be lol. Not sure how much more there will be to say.

Liron Jul 15, 2024, 12:37 AM
4 points
0
in reply to: nsokolsky’s comment on: Robin Hanson AI X-Risk Debate — Highlights and Analysis
Thanks. Sure, I’m always happy to update on new arguments and evidence. The most likely way I see possibly updating is to realize the gap between current AIs and human intelligence is actually much larger than it currently seems, e.g. 50+ years as Robin seems to think. Then AI alignment research has a larger chance of working.

I also might lower P(doom) if international govs start treating this like the emergency it is and do their best to coordinate to pause. Though unfortunately even that probably only buys a few years of time.

Finally I can imagine somehow updating that alignment is easier than it seems, or less of a problem to begin with. But the fact that all the arguments I’ve heard on that front seem very weak and misguided to me, makes that unlikely.

Liron Jul 13, 2024, 2:26 PM
6 points
4
in reply to: Daniel V’s comment on: Robin Hanson AI X-Risk Debate — Highlights and Analysis
Thanks for your comments. I don’t get how nuclear and biosafety represent models of success. Humanity rose to meet those challenges not quite adequately, and half the reason society hasn’t collapsed from e.g. a first thermonuclear explosion going off either intentionally or accidentally is pure luck. All it takes to topple humanity is something like nukes but a little harder to coordinate on (or much harder).

Robin Hanson AI X-Risk Debate — Highlights and Analysis

LironJul 12, 2024, 9:31 PM

46 points

7 comments45 min readLW link

(www.youtube.com)

Liron

In­ter­view with Carl Feyn­man on Im­mi­nent AI Ex­is­ten­tial Risk

Jim Bab­cock’s Main­line Doom Sce­nario: Hu­man-Level AI Can’t Con­trol Its Successor

Prac­tic­ing Bayesian Episte­mol­ogy with “Two Boys” Prob­a­bil­ity Puzzles

Is P(Doom) Mean­ingful? Bayesian vs. Pop­pe­rian Episte­mol­ogy Debate

Robin Han­son AI X-Risk De­bate — High­lights and Analysis

Interview with Carl Feynman on Imminent AI Existential Risk

Jim Babcock’s Mainline Doom Scenario: Human-Level AI Can’t Control Its Successor

Practicing Bayesian Epistemology with “Two Boys” Probability Puzzles

Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate

Robin Hanson AI X-Risk Debate — Highlights and Analysis