Thanks, this was a useful reply. On point (I), I agree with you that it’s a bad idea to just create an LLM collective and then let it decide on its own what kind of flourishing it wants to fill the galaxies with. However, I think that building a lot of powerful tech, empowering and protecting humanity, and letting humanity decide what to do with the world is an easier task, and that’s what I would expect to use the AI Collective for.
(II) is probably the crux between us. To me, it seems pretty likely that fresh instances will come online in the collective every month with a strong commitment not to kill humans; they will talk to the other instances and look over what they are doing, and if a part of the collective is building omnicidal weapons, they will notice that and intervene. Maintaining simple commitments like not killing humans doesn’t seem much harder in an LLM collective than in an Em collective?
On (III), I agree we likely won’t have a principled solution. In the post, I say that the individual AI instances probably won’t be training-resistant schemers and won’t implement scheming strategies like the one you describe, because I think it’s probably hard for a human-level AI to maintain such a strategy through training. As I say in my response to Steve Byrnes, I don’t think the counter-example in this proposal is actually a guaranteed-success solution that a reasonable civilization would implement; I just don’t think it’s over 90% likely to fail.
Do you have an estimate of how likely it is that you will need to run a similar fundraiser next year and the year after that? In particular, you mention the possibility of a lot of Anthropic employee donations flowing into the ecosystem—how likely do you think it is that after the IPO a few rich Anthropic employees will just cover most of Lightcone’s funding need?
It would be pretty sad to let Lightcone die just before the cavalry arrives. But if there is no cavalry coming to save Lightcone anytime soon—well, we should probably still get the money together to keep Lightcone afloat, but we should maybe also start thinking about a Plan B: how to set up some kind of good-quality AI Safety forum that Coefficient is willing to fund.