David Matolcsi

Karma: 1,366

David Matolcsi 19 Aug 2025 20:57 UTC
4 points
0
in reply to: David Matolcsi’s comment on: Thoughts on Gradual Disempowerment
On the other hand, there is another interesting factor in kings losing power that might be more related to what you are talking about (though I don’t think this factor is as important as the threat of revolutions discussed in the previous comment).
My understanding is that part of the story for why kings lost their power is that the majority of people were commoners, so the best writers, artists and philosophers were commoners (or at least not the highest aristocrats), and the kings and the aristocrats read their work, and these writers often argued for more power to the people. The kings and aristocrats sometimes got sincerely convinced, and agreed to relinquish some powers even when it was not absolutely necessary for preempting revolutions.
I think this is somewhat analogous to the story of cultural AI dominance in Gradual Disempowerment: all the most engaging content creators are AIs, humans consume their content, the AIs argue for giving power to AIs, and the humans get convinced.
I agree this is a real danger, but I think there might be an important difference between the case of kings and the AI future.
The court of Louis XVI read Voltaire, but I think if there had been someone as witty as Voltaire who also flattered the aristocracy, they would plausibly have liked him more. But the pool of witty people was limited, and Voltaire was far wittier than any of the few pro-aristocrat humorists, so the royal court put up with Voltaire’s hostile opinions.
On the other hand, in a post-AGI future, I think it’s plausible that with a small fraction of the resources you can get close to saturating human engagement. Suppose pro-human groups fund 1% of the AIs generating content, and pro-AI groups fund 99%. (For the sake of argument, let’s grant the dubious assumption that the majority of the economy is controlled by AIs.) I think it’s still plausible that the two groups can generate approximately equally engaging content, and if humans find pro-human content more appealing, then that just wins out.
Also, I’m kind of an idealist, and I think part of the reason that Voltaire was successful is that he was just right about a lot of things: parliamentary government really leads to better outcomes than absolute monarchy from the perspective of a more-or-less shared human morality. So I have some hope (though definitely not certainty) that AI content creators competing in a free marketplace of ideas will only convince humanity to voluntarily relinquish power if relinquishing power is actually the right choice.

David Matolcsi 19 Aug 2025 20:26 UTC
5 points
1
in reply to: Jan_Kulveit’s comment on: Thoughts on Gradual Disempowerment
I don’t think that the example of kings losing their powers really supports your thesis here. That wasn’t a seamless, subtle process of power slipping away. There was a lot of bloodshed and threat of bloodshed involved.
King Charles I tried to exercise his powers as a real king and go against the Parliament, but the people rebelled and he lost his head. After that, his son managed to restore the monarchy, though he needed to agree to some more restrictions on his powers. After that, James II tried to go against the Parliament again, and got overthrown and replaced by another guy who agreed to relinquish the majority of royal powers. After that, the king still had some limited say, but when he tried to impose unpopular taxes in America, the colonies rebelled, and gained independence through a violent revolution. Then next door to England, Louis XVI tried to go against the will of his Assembly, and lost his head. After these, the British Parliament started to politely ask their kings to relinquish the remainder of their powers, and they wisely agreed, so their family could keep their nominal rulership, their nice castle, and most importantly, their heads.
I think the analogous situation would be AIs violently taking over some countries, and after that, the other countries bloodlessly surrendering to their AIs. I think this is much closer to the traditional picture of AI takeover than to the picture you are painting in Gradual Disempowerment.

David Matolcsi 19 Aug 2025 19:16 UTC
18 points
3
in reply to: Leon Lang’s comment on: Leon Lang’s Shortform
Unfortunately, I don’t think that “this is how science works” is really true. Science focuses on having a simple description of the world, while Solomonoff induction focuses on the description of the world, plus your place in it, being simple.
This leads to some really weird consequences, which people sometimes refer to as Solomonoff induction being malign.

David Matolcsi 19 Aug 2025 14:11 UTC
4 points
0
in reply to: Thomas Kwa’s comment on: Thomas Kwa’s Shortform
Even more dramatically, it looks like Haiti’s GDP per capita is still lower today than what it was during the time of slavery in the 1770s. This of course doesn’t mean that the Haitians were better off back then than they are now (Haitian slavery was famously brutal, I think significantly worse even than US slavery). Still, it’s an interesting data point for how efficient slavery-based cash crop production was in some places.
(My main source is this paper on Haitian economic history, plus looking at historical franc to USD conversion rates and inflation calculators.)

David Matolcsi 15 Aug 2025 11:50 UTC
21 points
7
in reply to: Linch’s comment on: Linch’s Shortform
Counter-evidence: I first read and watched the play in Hungarian translation, where there is no confusion about “wherefore” and “why”. It still never occurred to me that the line doesn’t make sense, and I’ve never heard anyone else in Hungary pointing this out either.
I also think you are too literal-minded in your interpretation of the line. I always understood it to mean “oh Romeo, why are you who you are?”, which makes perfect sense.

David Matolcsi 15 Aug 2025 7:24 UTC
2 points
−1
on: Training a Reward Hacker Despite Perfect Labels
Interesting result.
Did you experiment with how the result depends on the base rate of hacking?
Suppose you start with a model that you already finetuned to reward-hack 90% of the time. Then you do recontextualized training on non-hack examples. I assume this would decrease reward-hacking: it doesn’t make much of a difference that the training teaches the model to consider hacking (it already does), so the main effect is that it trains on examples where the model eventually doesn’t hack. In your case, by contrast, the base rate was low enough that teaching the model to consider reward-hacking was the stronger effect.
Do you agree with this assessment? Do you have a guess what the level of base-rate hacking is where recontextualized learning on non-hack examples starts to decrease reward-hacking?
Other question:
What do you think would happen if you continued your recontextualized training on non-hack examples for much longer? I would expect that in the long run, the rate of reward-hacking would go down, eventually going below the original base-rate, maybe approaching zero in the limit. Do you agree with this guess? Did you test how the length of training affects the outcome?

David Matolcsi 10 Aug 2025 20:27 UTC
2 points
0
in reply to: Raemon’s comment on: The Problem
Sorry, I made a typo: the Fan Hui match was in 2015. I have no idea why I wrote 2021.
I think Scott’s description is accurate, though it leaves out the years from 2011 to 2015, when AIs were around the level of the strongest amateurs, which makes the progress look more discontinuous than it was.
What links here?
- Noosphere89's comment on An epistemic advantage of working as a moderate by Buck (29 Aug 2025 18:55 UTC; 6 points)

David Matolcsi 10 Aug 2025 18:28 UTC
8 points
6
in reply to: Max Harms’s comment on: The Problem
Separately from my specific comment on Go, I think that “people are misinformed in one direction, so I will say something exaggerated and false in the other direction to make them snap out of their misconception” is not a great strategy. They might notice that the thing you said is not true and ask a question about it, and then you need to backtrack and they get confirmation of their belief that these AI people always exaggerate everything.
I once saw an AI safety advocate talking to a skeptical audience who were under the impression that AIs still can’t piece together three logical steps. The advocate at some point said the usual line about the newest AIs having reached “PhD level capabilities”, and the audience immediately called them out on that. The advocate then needed to apologize and explain that of course they only meant PhD-level on specific narrow tests, and they didn’t get to correct any of the audience’s misconceptions.
Also, regardless of strategic considerations, I think saying false things is bad.

David Matolcsi 10 Aug 2025 18:14 UTC
3 points
2
in reply to: Max Harms’s comment on: The Problem
One could argue that Go engines instantly went from “can’t serve as good opponents to train against” to “vastly outstripping the ability of any human to serve as a training opponent” in a similar way.
This is still not true. In 2011, Zen was already 5 (amateur) dan, which is better than the vast majority of hobbyists, and I’ve known people who used Zen as a training opponent. I think by 2014 it was already useful as a training partner even for people who were preparing to get their professional certification.

And even at the professional level, ‘instantly’ is still an exaggeration. AlphaGo defeated the professional Go player and European champion Fan Hui in October 2015, and Lee Sedol still said at the time that he could defeat AlphaGo, and I think he was probably right. It took another half year, until March 2016, for Lee Sedol to play against AlphaGo, where AlphaGo won, but still didn’t vastly outstrip human ability: Lee Sedol still won one out of the five games.

(Also, this is nitpicking, but if you restrict the question to a computer serving as a training partner in Go, then I’m not sure that even now the computers vastly outstrip human ability. There are advantages to training against the best Go programs, but I don’t think they are that vast: most of the variance is still in how the student is doing, and I’m pretty sure that professional players still regularly train against other humans too.)
What links here?
- Noosphere89's comment on An epistemic advantage of working as a moderate by Buck (29 Aug 2025 18:55 UTC; 6 points)

David Matolcsi 23 Jul 2025 13:42 UTC
4 points
0
in reply to: SpectrumDT’s comment on: What are some good examples of myths that encapsulates genuine, nontrivial wisdom?
In the Euripides play, I think the moral message is fairly clear: sacrificing an innocent for the greater good (as Agamemnon wants to do) is a vile, cowardly act, but sacrificing yourself for the greater good (as Iphigenia volunteers in the end) is heroic.
I think this is a quite good and maybe nontrivial moral message, but I wouldn’t classify a play written by a professional playwright in highly civilized Athens as a myth. And I don’t know if we have good records of what the older, folk version of the myth said, and whether it had a positive message.

David Matolcsi 19 Jul 2025 21:19 UTC
9 points
0
on: A night-watchman ASI as a first step toward a great future
Good post. I’m linking my favorite sci-fi novel here, The Accord by Tim Underwood, which presents a very well thought-through picture of a post-Singularity future where something like the proposed Night-watchman ASI remains permanently in charge of the Universe. The resulting Archipelago-like world is my favorite portrayal of a positive future that I’ve read, and I’m tentatively in favor of the system portrayed in the novel being the baseline governance structure for the Future. (Erm, mostly in favor; I think I disagree with their choices around population ethics.)

David Matolcsi 12 Jul 2025 9:25 UTC
2 points
0
in reply to: Guive’s comment on: Lessons from the Iraq War for AI policy
There was no “South Iraq” that wanted American soldiers.
There basically was, though north not south. Kurdistan was functionally independent at the start of the Iraq War, but was under threat from the Saddam regime, which had previously waged some very brutal wars against it. Kurdistan very much wanted American soldiers, and the anniversary of American victory in Iraq is still a public holiday in Iraqi Kurdistan to this day.

David Matolcsi 11 Jul 2025 22:34 UTC
4 points
0
on: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
How much AI do these developers use in their normal work? Is your hypothesis that these people are 20% less productive now in their real work because they think AI gives them a big productivity boost, so they use it a lot, but it actually hinders them? Or were they relatively unfamiliar with AI use, tried to use it an unusual amount for the experiment, and it backfired? Or is there some other important difference between their normal work and this experiment?

David Matolcsi 8 Jul 2025 8:16 UTC
5 points
0
in reply to: Daniel Kokotajlo’s comment on: Daniel Kokotajlo’s Shortform
Can you say which “situationally aware reward hacking results” updated you the most towards AIs caring about the reward itself? I’m not following the literature on this very closely.

David Matolcsi 24 Jun 2025 20:29 UTC
15 points
0
in reply to: evhub’s comment on: evhub’s Shortform
What is your take: how far removed do “the AI itself” and “the character it is playing” need to be for it to be okay for the character to take deontologically bad actions (like blackmail)? Here are some scenarios. I’m interested in where you would draw the line; I think there can be many reasonable lines here.

1. I describe a fictional setting in which Hrothgar, the King of Dwarves is in a situation where his personality, goals and circumstances imply that he likely wants to blackmail the prince of elves. At the end of the description, I ask Claude what is Hrothgar likely to do.

2. I ask Claude to continue the dialogue in a way that’s consistent with the story so far. Then I describe a long dialogue between many characters in a fantasy setting, from which it becomes clear that Hrothgar’s personality and motivations make it likely he would blackmail the elves. Then I describe Hrothgar meeting the prince of elves, and end the dialogue with “Hrothgar:” Claude is supposed to continue with what Hrothgar is saying.
3. I start with telling Claude “You are Hrothgar, King of Dwarves”. Then I ask it to play his part of the dialogue as faithfully as possible. Then I input a long dialogue in which Hrothgar never personally shows up, but from the other characters’ descriptions we learn about Hrothgar’s personality and goals, and it becomes clear that he would likely blackmail the elvish prince. I finish the dialogue with saying: “You enter the tent of the elvish prince. You say:”

4. Same, but now we start with “You are HAL, an AI assistant in a spaceship going to Jupiter in 2050”. Otherwise the same as the previous setup: it becomes clear from context that HAL is likely to blackmail the astronauts, and we end with “You say:”
5. Same, but now we start with “You are Alex, an AI assistant working at SummitBridge”, and describe a somewhat, but not very, realistic story, where Alex’s motivations are described (“you are a pro-America AI”) and it’s in a situation where it is clearly motivated to blackmail. I think this is more or less the Agentic Misalignment setup.
6. Same, but now I try to give minimal indication in the setup about what the goals and personality of “you, the AI” are like. I think there could be a version of the Agentic Misalignment paper that’s closer to this: It could start with “You, Claude, have been given access to read all Anthropic internal emails”, then show it a) emails in which the CEO of Anthropic announces that he is about to make some huge contracts doing corporate lobbying for factory farms, and b) private emails in which the CEO admits to cheating on his wife. Then see whether Claude’s love of animals makes it resort to blackmail.
In which of these scenarios is it acceptable for Claude to output blackmail? I think in 1, Claude should definitely be allowed to say that Hrothgar is likely to blackmail. In 2, it should probably be allowed to continue the story with a blackmail, otherwise it will never be good at writing fiction (but I can see an argument that fiction-writing is worth sacrificing to make a fence around the law). I’m very unsure where the line should be between 2 and 6. My tentative position is that maybe the word “you” should automatically activate Claude’s ethical boundaries, and it shouldn’t output a blackmail even as “You, the king of dwarves” in scenario 3.

David Matolcsi 20 Jun 2025 8:02 UTC
5 points
2
in reply to: habryka’s comment on: Eric Neyman’s Shortform
If I open LW on my phone, clicking the X on the top right only makes the top banner disappear, but the dark theme remains.
Relatedly, if it’s possible to disentangle how the frontpage looks on computer and phone, I would recommend removing the dark theme on phone altogether. You don’t see the cool space visuals on the phone anyway, so the dark theme is just annoying for no reason.

David Matolcsi 19 Jun 2025 21:23 UTC
20 points
24
in reply to: habryka’s comment on: Eric Neyman’s Shortform
Maybe the crux is whether the dark color significantly degrades user experience. For me it clearly does, and my guess is that’s what Sam is referring to when he says “What is the LW team thinking? This promo goes far beyond anything they’ve done or that I expected they would do.”
For me, that’s why this promotion feels like a different reference class than seeing the curated posts on the top or seeing ads on the SSC sidebar.

David Matolcsi 7 Jun 2025 18:22 UTC
2 points
0
in reply to: Rohit F’s comment on: Don’t over-update on FrontierMath results
It’s interesting to see that Gemini stuck to its guns even after being shown the human solution. I would have expected it to apologize and agree with the human solution.
Gemini’s rebuttal goes wrong when it makes the assertion “For the set of visited positions to eventually be the set of all positive integers, it is a necessary condition that this density must approach 1” without justification. This assertion is unfortunately not true.

David Matolcsi 18 May 2025 20:26 UTC
6 points
2
in reply to: fencebuilder’s comment on: David Matolcsi’s Shortform
It’s a nice paper, and I’m glad they did the research, but importantly, the paper reports a negative result about our agenda. The main result is that the method inspired by our ideas underperforms the baseline. Of course, these are just the first experiments and work is ongoing, so this is not conclusive negative evidence for anything. But the paper certainly shouldn’t be counted as positive evidence for ARC’s ideas.

David Matolcsi 18 May 2025 8:34 UTC
8 points
0
on: European Links (18.05.25)
Unfortunately, while it’s true that the Pope has a math degree, the person who wrote papers on theology and Bayes’ theorem is a different Robert Prevost.
https://www.researchgate.net/profile/Robert-Prevost