Can you share a brief description or a link to your favored metaphysics of reality-fluid? I haven’t yet come across good write-ups on this topic.
David Matolcsi
I currently think I don’t actually need to explain what I see. What is the advantage of explaining it? Under normal circumstances, it’s useful to explain why I see e.g. the washing machine not working, because that gives me useful predictions about how the washing machine will behave in the future, and knowing that is useful for fulfilling my values.
But if I have a preference like “life only has meaning if I win the lottery”, then I think it’s not action-relevant to find an explanation if I see myself winning. The day before the lottery, I was already only making follow-up plans for what to do if I win—that’s the only case where life has meaning. And it was not part of the plan to do an investigation of why I won the lottery—why would I want to waste my time on that? So when I actually win, I in fact don’t investigate it further.
To be clear, I think that caring a lot about universes where you win the lottery, at the expense of other universes, is a very silly moral belief. If someone held this belief, I would tell them to meditate a little and play with a toddler and imagine if they would really care less about her if she was living in a universe where they don’t win the lottery. But if they come back saying that they genuinely only care about lottery-winning universes, then I can’t argue with their utility function.
I think having at least 0.1% of my caring allocated to short description-length space-time points is not that crazy of a moral belief. And once I have that, it’s reasonable to have a policy like “if I see an ordered world, I will act as if I was not a Boltzmann-brain, because these ordered situations are important for the 0.1% of my values, but are vanishingly rare and unimportant from the perspective of the 99.9%”.
I now see an ordered world, so in accordance with this policy, I act as if I was not a Boltzmann-brain. Acting as if I was not a Boltzmann-brain includes not looking for mechanistic explanations for why I’m not one, so I don’t do that.
or any metaphysics which says you can’t objectively answer the question “Which observer-moments exist or don’t exist / have more weight to them than other observer-moments?”
Why can’t I say that there is no objective answer to which observer-moments exist more, but that I subjectively care more about the observer-moments that happen in mathematically simple universes, in simple-to-describe locations, like Scott Garrabrant describes in Preferences without Existence?
In this ontology, “I expect that my brain won’t dissolve into chaos in the next moment” translates to “I will continue preparing dinner instead of meditating on the meaning of mortality, because preparing dinner helps the versions of me who live in simple locations, and meditating on mortality helps the Boltzmann-brain versions of me who are about to die, and I subjectively care more about the versions of me living in simple locations”.
With some added caveats on what exactly I care about (I hope to publish a sequence of this soon), I find this view more appealing than assuming the existence of a reality fluid that makes some moments objectively more real than others.
Sorry for the late reply. I think your description is not really right.
The system was partway between first-past-the-post per districts and nationally proportional representation, both before and after Orban’s 2011 reforms. It is true that Orban moved the system closer to first-past-the-post, but it’s not true that the old system achieved fully proportional representation: in 2010, under the old system, Orban got 68% of the seats with 52% of the vote.
It’s also not true that the new system actively unbalances the outcomes from the districts. It still brings the results closer to proportional, just not as much as it used to. For example, in 2022, Orban got 68% of the seats while winning 82% of the districts.
I also don’t think that moving a system closer to first-past-the-post by districts is inherently anti-democratic: the US and the UK are democracies.
It is true that the districts aren’t equally sized: in 2022, they ranged from 58,000 to 92,000 voters. However, it is not true that rural votes had 2x the weight. I looked at the relevant Wikipedia article and made Claude calculate the averages: the average size among all districts in 2022 was 73,207 voters; the average size in Budapest was 70,815; and the average size of districts won by Orban’s party was 73,615. So votes in Budapest actually had higher weight than average, and Orban’s party was very slightly disadvantaged by district sizes. This was already 12 years into Orban’s rule, and Western media had already been accusing him of heavy gerrymandering at the time.
There was a redistricting before the 2026 election, which brought the average size of Budapest districts to 81,301, higher than the 71,872 national average. This change was plausibly politically motivated, but it’s far from rural voters mattering 2x more.
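As a rough check on the district-size numbers above, here is a minimal sketch that turns them into relative vote weights. It assumes that the weight of a single vote is inversely proportional to the number of voters in its district; the function name `relative_weight` and the variable names are my own, and the figures are the ones quoted above.

```python
def relative_weight(district_size: float, reference_size: float) -> float:
    """Weight of one vote relative to a vote in a district of reference_size.

    Assumption: a vote's weight is inversely proportional to district size.
    """
    return reference_size / district_size


# Average district sizes quoted above (number of voters).
budapest_2022 = relative_weight(70_815, 73_207)
orban_party_2022 = relative_weight(73_615, 73_207)
budapest_2026 = relative_weight(81_301, 71_872)
# Most extreme 2022 comparison: smallest district vs. largest district.
smallest_vs_largest_2022 = relative_weight(58_000, 92_000)

for label, weight in [
    ("Budapest 2022 vs. national average", budapest_2022),
    ("Orban-won districts 2022 vs. national average", orban_party_2022),
    ("Budapest 2026 vs. national average", budapest_2026),
    ("Smallest vs. largest district, 2022", smallest_vs_largest_2022),
]:
    print(f"{label}: {weight:.3f}")
```

Under this assumption, a Budapest vote counted about 3% more than average in 2022 and about 12% less than average after the redistricting, and even a vote in the smallest 2022 district had less than 1.6 times the weight of a vote in the largest one.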
Overall, I think there are many things to criticize in Orban’s rule, but I think the electoral system was pretty fair.
Part of the answer is that I was dumb and I should have realized even at the time of writing the original comment that the writing is not good enough.
But the main reason is that the posts try to bridge a pretty wide inferential gap. When I read the first transcripts, I felt like they were good enough because I was reading with the eyes of “this would be a reasonable explainer to myself from three months ago, I understand the points pretty well”. But on further consideration, I realized that this level of explanation will not be useful for basically any other reader. I didn’t know how to do the more careful explanation through voice recording—perhaps this is a skill issue, but I think that voice transcripts are generally much better for writing things up for yourself than for a wider audience. So I needed to rewrite the whole piece more carefully by hand. The notes from the transcription were still useful as a skeleton to build on, but I think basically every single sentence got replaced by the end.
Update: Re-reading the cleaned-up transcripts, I’ve found them basically useless, and now I’m rewriting everything by hand. I think this is largely not Claude’s fault—I’m trying to explain complicated concepts in my posts, and my dictation was just not detailed enough to get everything across.
In any case, I wanted to write this update so as not to leave this false data-point standing here.
I think that the situation needs to be quite extreme for my argument not to work. I think it’s quite likely I will never get to the point where I think that a decision is particularly high-stakes or universal in the grand scheme of things. I think it’s plausible that until negentropy runs out, I will always think that there is an even larger and more complicated distribution of logical counter-factual worlds out there that I haven’t explored yet, compared to which I’m only a tiny speck. So I think plausibly I will always think that I should bet 50-50 when I know nothing about something, because that’s the right policy overall.
I agree though that it’s not entirely impossible that I will come to a point where I no longer have uncertainty about what’s outside the distribution I already explored; I believe that my decision is very high stakes and doesn’t correlate with many other different decisions in my logical distribution; and I believe that worlds where T1 is false are so inconceivable that they can’t be part of my trade coalition of logically counter-factual worlds.
But I think that’s also the point where normal probabilities and betting rules entirely break down for me.
When I make a bet about a 1⁄4 probability event, I imagine that I’m making decisions for four subagents, representing beliefs in the four different outcomes. Normally, when I bet on coinflips and other mundane questions, these four subagents love each other, and they are utilitarian about maximizing the sum of their resources. So they are okay with making a bet on one outcome, which means transferring the money of three subagents to the fourth.
But if I believe that once I learn that T1 is true, I will consider it inconceivable that T1-false worlds can ever be part of my coalition, that’s a different situation. In that case, I think my T1-true and T1-false subagents don’t love each other and are indifferent to each other’s well-being. If I’m offered a bet, that’s equivalent to three subagents transferring their wealth to the fourth, and they will refuse to do that. So if I’m only offered one possible bet (betting on the conjunction of T1 and T2), I think I will bet one-fourth of my wealth on it, independently of the odds.
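The difference between the two cases can be sketched as a toy model. Everything here is an illustrative assumption rather than a worked-out theory: four subagents holding equal shares of the wealth, a single bet on offer (the conjunction of T1 and T2), a cooperative parliament that maximizes expected total wealth with linear utility, and the hypothetical names `stake_on_conjunction` and `payout_multiplier`.

```python
def stake_on_conjunction(wealth: float, payout_multiplier: float, cooperative: bool) -> float:
    """Amount staked on "T1 and T2", an outcome held with credence 1/4.

    payout_multiplier is a hypothetical parameter: the gross payout per unit
    staked if the bet wins.
    """
    credence = 0.25
    # Wealth held by the one subagent that believes in the conjunction.
    believer_share = wealth * credence

    if cooperative:
        # Assumption: the subagents pool their resources and maximize the
        # expected sum with linear utility, so they go all-in on a
        # positive-expected-value bet and stay out otherwise.
        expected_return_per_unit = credence * payout_multiplier
        return wealth if expected_return_per_unit > 1 else 0.0

    # Non-cooperative case: the other three subagents refuse to transfer
    # anything, so only the believer stakes its own share, whatever the odds.
    return believer_share


print(stake_on_conjunction(100.0, 5.0, cooperative=True))
print(stake_on_conjunction(100.0, 3.0, cooperative=True))
print(stake_on_conjunction(100.0, 3.0, cooperative=False))
print(stake_on_conjunction(100.0, 50.0, cooperative=False))
```

In the non-cooperative branch the stake never depends on `payout_multiplier`, which is the "one-fourth of my wealth, independently of the odds" behavior described above.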
I agree this sounds a bit like an epicycle, but belief-representing subagents negotiating in a moral parliament is an important part of my world-view for other reasons too (I will soon send a doc about this to you), so this solution feels quite natural to me. And it’s not like I otherwise have great intuitions about what to do at the point of meta-logical near-omniscience where I am able to tell that my current decision is high-stakes within the entire multiverse of logically counterfactual worlds.
“the votes of rural voters had 2x as much weight as the votes from big cities”
I haven’t heard of this, and it seems pretty clearly false from what I know. (I’m from Hungary.) What is the source for this?
Thinking more about it, I think I don’t stand by my original reply. It seems possible to have some theorems whose result I currently feel 50-50 about, but which are important enough that I’m at least uncertain if I will ever be able to build a broad enough coalition of logically counter-factual beings that includes people living in worlds where the opposite of the theorem is true.
I think the same problem arises for some empirical questions too—T1 and T2 can be questions like “is iron’s atomic number 26 or 27?” I would have been roughly 50-50 before looking it up, but I’m uncertain if I should try to cooperate with people living in worlds where the atomic number of iron is 27 - I don’t know if those worlds are compatible with life.
However, thinking through these examples, I think I now reject the premise that updateful EDT bets wrongly in your example of the two theorems or in Paul’s original calculator example.
I think in both cases the decision-correlational reference class you should take into account is not just you learning T1 is true and you learning T2 is true within this particular experiment. It’s every instance across the multiverse where beings similar to you need to make bets about questions they have no clue about. Taking all these correlations into account, the correct thing to do is to bet with 50-50.
(As an example: when I’m betting on the atomic number of iron, I shouldn’t think of myself as cooperating with versions of myself who live in a world where iron has 27 protons. Those worlds might not exist. But I’m cooperating with instances where the game-master decided to ask if iron has 25 or 26 protons.)
Separately, at the end of the day, I still want to do acausal trade with a broad coalition of worlds which might or might not include ones where iron has 27 protons and the T1 theorem is false. But I now think that this is a separate question, and updatelessness might not be required in our mortal life.
Yes, I agree the elegant construction will need to rely on some logical arguments, but I think that’s not that bad.
The way I imagine trade to work is that I propose a distribution of chips among different universes which I would be happy to trade under. For example “every universe in the quantum multiverse each getting chips proportional to what the Born-rule prescribes” is a system I would be happy to trade under. Then I can see which other universes are willing to trade under this chip-distribution, and then we trade with each other using our chips.
I think this extends to trade among logically counter-factual worlds at least in some toy-examples. If an important historical event turned on someone making a bet on the billionth digit of pi, then the logically counterfactual worlds which were identical except that the billionth digit of pi was different can probably make a trade deal among each other because they can all imagine this narrow logically counter-factual distribution, and they all recognize it as a Schelling-point.
I think we can probably go broader than that, and figure out an elegant distribution of chips among logical universes which a) we find fair in the sense that we value the resources of the other universes in proportion to their chips (just like in the Born-rule case), so we are happy to trade under this distribution and b) we think that many other universes in the distribution will share enough core features of our logic to recognize this distribution of chips as an elegant Schelling-point, and something they consider fair under their values. Then we trade with everyone who is willing to trade. Once we are done, we try to construct a broader coalition.
I think it’s likely that we won’t be able to expand the trade coalitions enough to cover all possible logic systems (what would that even mean?) Maybe we will never be able to deal with the guys living in the 1+1=3 universe, because we can’t imagine them, they can’t imagine us, and there is no distribution we both recognize as elegant and fair. In that case, we will leave some value on the table by not being able to trade with each other, but that’s life. I don’t think this means that we need to throw out this decision theory—it was nice enough if we got beneficial trades in as broad circles as we could.
---
Most concretely, I don’t see how you get Dutch-booked here. I tentatively think that any betting that a malicious Dutchman can come up with to get money out of me will be based on simple enough logical counter-factuals that I can form a mutually recognized distribution of logic systems among the affected parties and we will be fine.
Can you give a concrete example of how someone can pump money out of me?
I don’t know, I still don’t see this as that bad of a sign for EDT. Yes, in the far future you will need to trade with people in your confusing and incoherent prior over logics.
But I think this is basically equivalent to handling the “what if you are in a simulation where the simulators intentionally messed with your brain to believe false things about certain logical statements” question. Admittedly, it’s a hard question, but I think everyone will need to deal with something like this.
Maybe I’m confused about how much you believe that my actual life history matters. I think in the case of empirical updatelessness, my life history doesn’t really matter—I will eventually try to trade with people in proportion to something like their measure in the Solomonoff prior, and not with worlds where Austria and Australia are the same country, even though I was uncertain about this empirical fact when I was 5. (Do you agree with this, or do you think life history also matters for empirically updateless trade?)
I expect that logical updatelessness is similar—I will try to use some elegant construction like the Solomonoff prior to put a weight on different logical counterfactuals, and it won’t matter how my prior was constructed in my childhood.
I might be missing something, but the situation doesn’t seem that bad to me.
I tentatively think that we should bullet-bite and be logically and empirically updateless EDT. Admittedly, it’s rough that I don’t really know what my prior looks like, but I think we can deal with that.
The way I imagine it is that following ECL comes in two separate stages. First, we need to take the best actions under ECL that we can as confused mortals. Second, we need to decide how to use the resources of the universe once we are already surrounded by superintelligent AI advisors and had time to do a Long Reflection.
I think during the mortal phase, it’s okay that we don’t understand the prior very well. I think the only axioms we need to be in place before we become logically updateless are “it’s generally good if agents within the prior try to pursue their goals” and “we should try to follow ECL”. After these are in place, I think the best overall policy is for each agent to look around in their world and try to figure out how to get more optionality for themselves and the ECL coalition, while behaving in a way that feels like it should have robustly good correlated actions across the multiverse. In our case, we can increase optionality by trying to make sure the stars don’t pointlessly burn out; agents following ECL eventually get in control of a big chunk of the world; there are good processes for growth and reflection. Meanwhile, it’s probably good to do less lying and backstabbing, and be merciful towards the weak, because these seem somewhat likely to have correlations with other actions across the multiverse that make it more likely that ECL agents can follow their goals.
Occasionally, there is a toy example like Sleeping Beauty or the ones you listed in your post where we can explicitly reason about EDT, but usually we just follow the heuristics I described above. I don’t think we have a right to hope for much more clarity as mortals, and thankfully none of this requires knowing what the prior looks like.
In the second stage, when we have stabilized the situation, figured out some good growth process, have superintelligent advisors and sent out probes to put out the fires of the distant stars, we still need to decide what we do with the resources. It seems plausible that, as you say, “what values we benefit in the future may be primarily determined by their frequency in our prior”. But figuring out the frequency of different beings in our prior doesn’t seem to me an especially intractable problem compared to the already scary question of what values we want to terminally pursue on reflection.
One idea I like for figuring out the prior is building a bigger and bigger coalition in an onion-like manner. We first run simulations of nearby quantum branches and pull out the cooperative people and AIs from there. We learn to live together, share our different perspectives, then we decide on a next broader distribution of quantum branches that we want to trade with, and so on.
I think logical updatelessness commits us to trying to trade with the logically counter-factual beings too at the end. That requires finding a Schelling-point distribution of logically counter-factual universes that the other universes will also agree on. But by the time we get there, we have already reflected on a lot of things, and learned to live together with dinosaur-people we pulled out of simulations. I feel that this giant coalition of humans, AIs and dinosaur-people, all using superintelligent advisors, can pull their ideas together and figure out how they imagine the logical prior and who they should trade with next. This doesn’t feel harder to me than other questions about the meaning of life that we will need to deal with.
Yes, I considered mentioning Ashoka, but I’m worried that his story is largely legendary. (And Chandragupta is likely even more legendary.)
And even in the likely largely legendary story of Ashoka, I think it’s pretty bad that he didn’t resign or at least try harder to compensate his victims.
Hiring someone to do torture for you, then torturing him to death for following your orders, while you retain your crown yourself is a pretty contemptible behavior!
Repentance seems to be very rare among the powerful.
I tried to search with multiple LLMs and in other ways for examples where a king or a dictator realized the evilness of some of their past actions, realized their rule is not justified, and voluntarily resigned. I have not found a single example of this happening. There are some examples of kings and dictators voluntarily resigning, but it’s usually motivated by being tired of ruling (often for health reasons), and very occasionally genuine support for democracy. But as far as I can tell, it’s never because a ruler realized the evil of their ways.
I also searched for crime bosses, warlords and successful large-scale fraudsters who voluntarily gave up their evil ways due to repentance. Again, there were hardly any examples. People sometimes repent in prison; people sometimes turn themselves in to the police when they see they will soon get caught anyway; and people sometimes retire from crime to a safer life-style, keeping their ill-gotten gains to themselves.
I only found two examples of successful criminals changing their ways while still successful due to a change of heart—Nicky Cruz who was a gang leader in New York, and General Butt Naked, a Liberian warlord. And even there, I’m a bit suspicious—many of General Butt Naked’s stories of his previous horrific atrocities seem false, and I wonder what else is false in his story.
I’m interested if people can find better examples of evil leaders and successful criminals repenting while still in power, I would be relieved to see more examples of this happening.
I find the rarity of repentance of the powerful a very sad fact about human nature, and it makes me less optimistic that current dictators and unscrupulous politicians will significantly change for the better if given superintelligent AI advisors. Of course, one can still be an okay or even maybe a good ruler without ever repenting their evil actions in the past, but I still don’t feel great about this.
I think if Kim Jong Un lived for a million years, and had the smartest AI advisors, and access to intelligence augmentation techniques, he would probably still never come to admit that murdering his brother was an evil thing to do. Maybe most of his subjects would still have an okay life under his rule in a post-scarcity AI world, but I think there are limits to how good one’s values can get without facing one’s past sins.
(I’m partially responding to habryka’s recent post on Putin here, but you should mostly treat this post independently of the Putin discussion; I have been planning to write this shortform for a while now. I’m not trying to argue against habryka’s main claim in his post that Putin’s rule would probably be still much better than extinction.)
The claim is that he was not allowed to appear on state television, the one channel that is funded by the state. There are other private TV channels, with much higher viewership, where he could appear. And even the state TV channel didn’t fully ban him, there was a big debate between the main candidates of the 2024 European Parliament election on state TV which significantly contributed to Magyar’s rise.
Yes, the state media was highly biased, which is bad, and the government party used their governmental power to help their political campaign in a number of other very unfair ways, which would be unacceptable in most Western democracies. But to the best of my knowledge, the votes were always fairly counted in every election, there were never any censorship laws (except some rule on LGBTQ topics) and freedom of assembly was almost always respected (except unsuccessfully trying to ban Pride).
I think Western media has been consistently overstating how authoritarian Hungary was, and I think the fact that Magyar managed to win is significant evidence for that.
I’m generally sympathetic to Scott’s positions in this discussion, but I think he is probably very wrong about Ilya.
To the best of my knowledge, Safe Superintelligence has never published a single word about what they plan to do to move alignment forward, which is pretty damning, in my opinion.
I have not heard of anyone who is known to be thoughtful about AI safety to have been hired to SSI, and I have not seen any position being advertised to AI safety people. People should correct me if I missed someone good joining SSI, but I think this is also a very bad sign.
My impression is that people who worked with Ilya at OpenAI don’t remember him as being particularly thoughtful about alignment, e.g. much less so than Jan Leike. This is a low confidence, third-hand impression, people can correct me if I’m wrong.
My impression is that the available evidence suggests that Ilya mostly took part in Altman’s firing for (perhaps justified) office politics grievances, and not primarily due to safety concerns. I also think that evidence points to his behavior during and after the incident being kind of cowardly. (I haven’t looked deeply into the details of the battle of the board, and it’s possible I’m wrong on this point, in which case I apologize to Ilya.) I’m also doubtful of how self-sacrificing his actions were—my best guess is that his current net worth is higher (at least on paper) than it would be if he stayed at OpenAI.
I expect that at some point SSI’s investors will grow impatient, and then SSI will start coming out with AI products (perhaps open-source to be cooler), just like everyone else. I don’t expect them to contribute too much to safety, though maybe Ilya will sometimes make some noises about the importance of safety in public speeches, which is nice I guess.
I’m pretty confident in my first two points, much less so in the next two, but I felt someone should respond to Scott on this point. Perhaps @Buck or someone else who expressed skepticism of Ilya’s project can add more information.
Why? Reputational benefits? Avoiding lawsuits?
I don’t think Unsong fits the pattern.
Aaron doesn’t take over the world alone. He merges with seven other wildly different minds, including the villainous Dylan Alvarez. “In William Blake’s prophecies, Albion was the entity formed at the end of time, when all of the different aspects of the human soul finally came together to remake the world”, as one of them says.
And I don’t think the ending is about recreating the world as some kind of rationalist utopia (how would you do that with Dylan and Erica on the team?) - I interpret it more as a “cycle continues” ending where they carry forward God’s already perfect plan into a new world.
See for example this point in the Tosefta, where Scott explains all the Easter eggs:
“As for THARMAS, seven of the ten towers were smoking ruins; the other three were heavily scarred. In the epilogue, THARMAS is going to be used to make the new universe. Seven of ten towers destroyed plus the rest damaged = seven of ten sephirot cracked plus the rest damaged, indicating the new universe will work the same as our own.”
You say
EDT double-counting can be resolved by foregoing the anthropic update (with a variant of minimum-reference-class SSA called “L-zombie anthropics”). However, this fix leads to other strange consequences and is IMO philosophically suspicious.
Can you say more about the strange consequences and unsatisfactoriness, or link to a discussion on this point? My current understanding was that anthropics, and the concept of probabilities in general, are a lossy abstraction, and the double-counting problem is easily resolved just by not updating. I’m probably missing something here.
Ok, but are there specific chapters or pages he or others could link to? Linking to long novels is not very useful.