“the votes of rural voters had 2x as much weight as the votes from big cities”
I haven’t heard of this, and it seems pretty clearly false from what I know. (I’m from Hungary.) What is the source for this?
Thinking more about it, I think I don’t stand by my original reply. It seems possible to have some theorems whose result I currently feel 50-50 about, but which are important enough that I’m at least uncertain whether I will ever be able to build a broad enough coalition of logically counterfactual beings that includes beings for whom the opposite of the theorem is true.
I think the same problem arises for some empirical questions too—T1 and T2 can be questions like “is iron’s atomic number 26 or 27?” I would have been roughly 50-50 before looking it up, but I’m uncertain if I should try to cooperate with people living in worlds where the atomic number of iron is 27 - I don’t know if those worlds are compatible with life.
However, thinking through these examples, I think I now reject the premise that updateful EDT bets wrongly in your example of the two theorems or in Paul’s original calculator example.
I think in both cases the decision-correlational reference class you should take into account is not just the two branches of this particular experiment, you learning that T1 is true and you learning that T2 is true. It’s every instance across the multiverse where beings similar to you need to make bets about questions they have no clue about. Taking all these correlations into account, the correct policy is to bet 50-50.
(As an example: when I’m betting on the atomic number of iron, I shouldn’t think of myself as cooperating with versions of myself who live in a world where iron has 27 protons. Those worlds might not exist. But I’m cooperating with instances where the game-master decided to ask if iron has 25 or 26 protons.)
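A minimal toy sketch of this reference-class argument (purely illustrative, with a made-up symmetric setup: over the whole reference class, the game-master’s first-named option is correct half the time, and answers are scored with a log scoring rule):

```python
import numpy as np

# Toy model of the reference class: many instances where a game-master asks a
# binary question the bettor is clueless about. By the assumed symmetry of the
# reference class, the "first-named" option is correct in half the instances.

rng = np.random.default_rng(0)
n_instances = 100_000
first_option_correct = rng.random(n_instances) < 0.5  # symmetric reference class

def average_log_score(p):
    """Average log score of the policy 'assign probability p to the first-named option'."""
    scores = np.where(first_option_correct, np.log(p), np.log(1 - p))
    return scores.mean()

for p in [0.3, 0.5, 0.7]:
    print(f"p = {p}: average log score = {average_log_score(p):.4f}")
```

Under these toy assumptions the 50-50 policy scores best over the whole reference class, even though in any single instance one of the answers is in fact the right one.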
Separately, at the end of the day, I still want to do acausal trade with a broad coalition of worlds which might or might not include ones where iron has 27 protons and the T1 theorem is false. But I now think that this is a separate question, and updatelessness might not be required in our mortal life.
Yes, I agree the elegant construction will need to rely on some logical arguments, but I think that’s not that bad.
The way I imagine trade to work is that I propose a distribution of chips among different universes which I would be happy to trade under. For example “every universe in the quantum multiverse each getting chips proportional to what the Born-rule prescribes” is a system I would be happy to trade under. Then I can see which other universes are willing to trade under this chip-distribution, and then we trade with each other using our chips.
I think this extends to trade among logically counterfactual worlds, at least in some toy examples. If an important historical event turned on someone making a bet on the billionth digit of pi, then the logically counterfactual worlds which were identical except that the billionth digit of pi was different can probably make a trade deal with each other, because they can all imagine this narrow logically counterfactual distribution, and they all recognize it as a Schelling point.
I think we can probably go broader than that, and figure out an elegant distribution of chips among logical universes which a) we find fair in the sense that we value the resources of the other universes in proportion to their chips (just like in the Born-rule case), so we are happy to trade under this distribution and b) we think that many other universes in the distribution will share enough core features of our logic to recognize this distribution of chips as an elegant Schelling-point, and something they consider fair under their values. Then we trade with everyone who is willing to trade. Once we are done, we try to construct a broader coalition.
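A toy sketch of what I mean by a chip distribution (illustrative only; the universe names and the “accepts” flag are hypothetical stand-ins, not a worked-out theory of acausal trade):

```python
from dataclasses import dataclass

@dataclass
class Universe:
    name: str
    amplitude_sq: float       # e.g. the Born weight of a quantum branch
    accepts_proposal: bool    # does this universe recognize the proposed split as fair?

def propose_chip_distribution(universes):
    """Propose chips proportional to Born weights, then trade only among accepters."""
    accepters = [u for u in universes if u.accepts_proposal]
    total = sum(u.amplitude_sq for u in accepters)
    # Chips are shares of the trading coalition's pooled resources.
    return {u.name: u.amplitude_sq / total for u in accepters}

universes = [
    Universe("branch_A", 0.5, True),
    Universe("branch_B", 0.3, True),
    Universe("branch_C", 0.2, False),  # doesn't recognize the Schelling point; left out for now
]
print(propose_chip_distribution(universes))
# Once this coalition stabilizes, it can jointly propose a broader distribution
# (e.g. over some logically counterfactual worlds) and repeat the process.
```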
I think it’s likely that we won’t be able to expand the trade coalitions enough to cover all possible logic systems (what would that even mean?). Maybe we will never be able to deal with the guys living in the 1+1=3 universe, because we can’t imagine them, they can’t imagine us, and there is no distribution we both recognize as elegant and fair. In that case, we will leave some value on the table by not being able to trade with each other, but that’s life. I don’t think this means that we need to throw out this decision theory—it’s still worthwhile if we get beneficial trades in as broad circles as we can.
---
Most concretely, I don’t see how you get Dutch-booked here. I tentatively think that any betting scheme a malicious Dutch bookie could come up with to get money out of me will be based on simple enough logical counterfactuals that I can form a mutually recognized distribution of logic systems among the affected parties, and we will be fine.
Can you give a concrete example of how someone can pump money out of me?
I don’t know, I still don’t see this as that bad a sign for EDT. Yes, in the far future you will need to trade with people in your confusing and incoherent prior over logics.
But I think this is basically equivalent to handling the “what if you are in a simulation where the simulators intentionally messed with your brain to believe false things about certain logical statements” question. Admittedly, it’s a hard question, but I think everyone will need to deal with something like this.
Maybe I’m confused about how much you believe that my actual life history matters. I think in the case of empirical updatelessness, my life history doesn’t really matter—I will eventually try to trade with people in proportion to something like their measure in the Solomonoff prior, and not with worlds where Austria and Australia are the same country, even though I was uncertain about this empirical fact when I was 5. (Do you agree with this, or do you think life history also matters for empirically updateless trade?)
I expect that logical updatelessness is similar—I will try to use some elegant construction like the Solomonoff prior to put a weight on different logical counterfactuals, and it won’t matter how my prior was constructed in my childhood.
I might be missing something, but the situation doesn’t seem that bad to me.
I tentatively think that we should bite the bullet and be logically and empirically updateless EDT agents. Admittedly, it’s rough that I don’t really know what my prior looks like, but I think we can deal with that.
The way I imagine it is that following ECL comes in two separate stages. First, we need to take the best actions under ECL that we can as confused mortals. Second, we need to decide how to use the resources of the universe once we are surrounded by superintelligent AI advisors and have had time to do a Long Reflection.
I think during the mortal phase, it’s okay that we don’t understand the prior very well. I think the only axioms we need to have in place before we become logically updateless are “it’s generally good if agents within the prior try to pursue their goals” and “we should try to follow ECL”. After these are in place, I think the best overall policy is for each agent to look around in their world and try to figure out how to get more optionality for themselves and the ECL coalition, while behaving in ways that seem robustly correlated with good actions across the multiverse.

In our case, we can increase optionality by trying to make sure that the stars don’t pointlessly burn out, that agents following ECL eventually get control of a big chunk of the world, and that there are good processes for growth and reflection. Meanwhile, it’s probably good to do less lying and backstabbing, and to be merciful towards the weak, because these seem somewhat likely to correlate with other actions across the multiverse that make it more likely that ECL agents can follow their goals.
Occasionally, there is a toy example like Sleeping Beauty or the ones you listed in your post where we can explicitly reason about EDT, but usually we just follow the heuristics I described above. I don’t think we have a right to hope for much more clarity as mortals, and thankfully none of this requires knowing what the prior looks like.
In the second stage, when we have stabilized the situation, figured out some good growth process, have superintelligent advisors, and have sent out probes to put out the fires of the distant stars, we still need to decide what we do with the resources. It seems plausible that, as you say, “what values we benefit in the future may be primarily determined by their frequency in our prior”. But figuring out the frequency of different beings in our prior doesn’t seem to me an especially intractable problem compared to the already scary question of what values we want to terminally pursue on reflection.
One idea I like for figuring out the prior is building a bigger and bigger coalition in an onion-like manner. We first run simulations of nearby quantum branches and pull out the cooperative people and AIs from there. We learn to live together, share our different perspectives, then we decide on a next broader distribution of quantum branches that we want to trade with, and so on.
I think logical updatelessness commits us to trying to trade with the logically counterfactual beings too in the end. That requires finding a Schelling-point distribution of logically counterfactual universes that the other universes will also agree on. But by the time we get there, we will have already reflected on a lot of things, and learned to live together with the dinosaur-people we pulled out of simulations. I feel that this giant coalition of humans, AIs and dinosaur-people, all using superintelligent advisors, can pool their ideas and figure out how they imagine the logical prior and who they should trade with next. This doesn’t feel harder to me than other questions about the meaning of life that we will need to deal with.
Yes, I considered mentioning Ashoka, but I’m worried that his story is largely legendary. (And Chandragupta is likely even more legendary.)
And even in the likely largely legendary story of Ashoka, I think it’s pretty bad that he didn’t resign or at least try harder to compensate his victims.
Hiring someone to do torture for you, then torturing him to death for following your orders, while you keep your crown, is pretty contemptible behavior!
Repentance seems to be very rare among the powerful.
I tried to search with multiple LLMs and in other ways for examples where a king or a dictator realized the evilness of some of their past actions, realized their rule was not justified, and voluntarily resigned. I have not found a single example of this happening.

There are some examples of kings and dictators voluntarily resigning, but it’s usually motivated by being tired of ruling (often for health reasons), and very occasionally by genuine support for democracy. But as far as I can tell, it’s never because a ruler realized the evil of their ways.
I also searched for crime bosses, warlords and successful large-scale fraudsters who voluntarily gave up their evil ways due to repentance. Again, there were hardly any examples. People sometimes repent in prison; people sometimes turn themselves in to the police when they see they will soon get caught anyway; and people sometimes retire from crime to a safer lifestyle, keeping their ill-gotten gains to themselves.
I only found two examples of successful criminals changing their ways, while still successful, due to a change of heart—Nicky Cruz, who was a gang leader in New York, and General Butt Naked, a Liberian warlord. And even there, I’m a bit suspicious—many of General Butt Naked’s stories of his previous horrific atrocities seem false, and I wonder what else is false in his story.
I’m interested in whether people can find better examples of evil leaders and successful criminals repenting while still in power; I would be relieved to see more examples of this happening.
I find the rarity of repentance among the powerful a very sad fact about human nature, and it makes me less optimistic that current dictators and unscrupulous politicians will significantly change for the better if given superintelligent AI advisors. Of course, one can still be an okay or maybe even a good ruler without ever repenting one’s past evil actions, but I still don’t feel great about this.
I think if Kim Jong Un lived for a million years, and had the smartest AI advisors, and access to intelligence augmentation techniques, he would probably still never come to admit that murdering his brother was an evil thing to do. Maybe most of his subjects would still have an okay life under his rule in a post-scarcity AI world, but I think there are limits to how good one’s values can get without facing one’s past sins.
(I’m partially responding to habryka’s recent post on Putin here, but you should mostly treat this post independently of the Putin discussion; I have been planning to write this shortform for a while now. I’m not trying to argue against habryka’s main claim in his post that Putin’s rule would probably still be much better than extinction.)
The claim is that he was not allowed to appear on state television, the one channel that is funded by the state. There are private TV channels, with much higher viewership, where he could appear. And even the state TV channel didn’t fully ban him: there was a big debate between the main candidates of the 2024 European Parliament election on state TV, which significantly contributed to Magyar’s rise.
Yes, the state media was highly biased, which is bad, and the governing party used its governmental power to help its political campaign in a number of other very unfair ways, which would be unacceptable in most Western democracies. But to the best of my knowledge, the votes were always fairly counted in every election, there were never any censorship laws (except some rules on LGBTQ topics), and freedom of assembly was almost always respected (except for the unsuccessful attempt to ban Pride).
I think Western media has been consistently overstating how authoritarian Hungary was, and I think the fact that Magyar managed to win is significant evidence for that.
I’m generally sympathetic to Scott’s positions in this discussion, but I think he is probably very wrong about Ilya.
To the best of my knowledge, Safe Superintelligence has never published a single word about what they plan to do to move alignment forward, which is pretty damning in my opinion.
I have not heard of anyone known to be thoughtful about AI safety being hired by SSI, and I have not seen any positions being advertised to AI safety people. People should correct me if I missed someone good joining SSI, but I think this is also a very bad sign.
My impression is that people who worked with Ilya at OpenAI don’t remember him as being particularly thoughtful about alignment, e.g. much less so than Jan Leike. This is a low-confidence, third-hand impression; people can correct me if I’m wrong.
My impression is that the available evidence suggests that Ilya mostly took part in Altman’s firing because of (perhaps justified) office-politics grievances, and not primarily due to safety concerns. I also think that the evidence points to his behavior during and after the incident being kind of cowardly. (I haven’t looked deeply into the details of the battle of the board, and it’s possible I’m wrong on this point, in which case I apologize to Ilya.) I’m also doubtful of how self-sacrificing his actions were—my best guess is that his current net worth is higher (at least on paper) than it would be if he had stayed at OpenAI.
I expect that at some point SSI’s investors will grow impatient, and then SSI will start coming out with AI products (perhaps open-source to be cooler), just like everyone else. I don’t expect them to contribute too much to safety, though maybe Ilya will sometimes make some noises about the importance of safety in public speeches, which is nice I guess.
I’m pretty confident in my first two points, much less so in the next two, but I felt someone should respond to Scott on this point. Perhaps @Buck or someone else who expressed skepticism of Ilya’s project can add more information.
Why? Reputational benefits? Avoiding lawsuits?
I don’t think Unsong fits the pattern.
Aaron doesn’t take over the world alone. He merges with seven other wildly different minds, including the villainous Dylan Alvarez. “In William Blake’s prophecies, Albion was the entity formed at the end of time, when all of the different aspects of the human soul finally came together to remake the world”, as one of them says.
And I don’t think the ending is about recreating the world as some kind of rationalist utopia (how would you do that with Dylan and Erica on the team?). I interpret it more as a “cycle continues” ending, where they carry forward God’s already perfect plan into a new world.
See for example this point in the Tosefta, where Scott explains all the Easter eggs:
“As for THARMAS, seven of the ten towers were smoking ruins; the other three were heavily scarred. In the epilogue, THARMAS is going to be used to make the new universe. Seven of ten towers destroyed plus the rest damaged = seven of ten sephirot cracked plus the rest damaged, indicating the new universe will work the same as our own.”
You say
EDT double-counting can be resolved by foregoing the anthropic update (with a variant of minimum-reference-class SSA called “L-zombie anthropics”). However, this fix leads to other strange consequences and is IMO philosophically suspicious.
Can you say more about the strange consequences and unsatisfactoriness, or link to a discussion of this point? My current understanding was that anthropics, and the concept of probabilities in general, are a lossy abstraction, and that the double-counting problem is easily resolved just by not updating. I’m probably missing something here.
It’s mostly getting rid of the stuttering; I will need to look at the exact details.
I’m doing the same—verbatim dictating the text, giving the transcript to Claude with some of my past writing in the prompt and asking it to clean up the transcript, then manually editing the outcome. I don’t notice the outcome being really worse than or different from my normal writing. I don’t notice LLMisms in the text, my original dictation is detailed enough that the LLM doesn’t need to fill in the gaps, and in the editing process I haven’t noticed the LLM inserting or omitting points in a way I didn’t intend.
I’m currently two-thirds done writing a long sequence this way—if I now can’t post it without putting it all in an LLM content-block, I will be very sad.
Can you say more about how you think about scheming and what would be a useful definition in that space?
I don’t know who these experts were or what exactly they told you at the time. I can imagine them being more wrong than you. I’m certainly not in favor of most forms of “focusing on the current issues”, because it often leads to people scaremongering in a kind of dishonest way. For example, I’m glad that ControlAI stopped focusing on deepfakes.
So if these so-called experts advised you to focus on deepfakes, I think that was wrong. But if they advised you to focus on getting more support for UKAISI, and supporting better eval practices and so on instead of advocating for immediate international moratorium on superintelligence, then I think the jury is still very much out on which strategy is more effective.
Your piece is centrally not advocating against running misleading campaigns on the effects of deepfakes. Instead, you are railing against people working in lab safety teams, eval orgs and AISIs, and the policy orgs and philanthropists trying to support them. And then you write:

We have reliable pipelines that can scale with more money.

We have good processes and tracking mechanisms that give us a good understanding of our impact.

We clearly see what needs to be done to improve things.

You are making the case that your work is better than that of the people supporting more marginalist steps (more funding for UKAISI, better evals, incremental technical work aimed at catching AIs red-handed), and you are claiming that everyone who decides to work at eval orgs, AISIs, or more marginalist policy orgs, instead of following ControlAI’s clearly superior “reliable pipelines” to impact, is somehow morally corrupt. For this claim, you’d need to show that your methods are actually working clearly better than what other people are doing. So I think it’s fair to point out that all your evidence for your efforts working is pretty underwhelming.
As far as I can tell, the main point of your post is that ControlAI’s approach is evidently working, more so than other people’s approaches, so people not following ControlAI’s approach is evidence of them being bad and being under the control of a malign Spectre. If you make such claims, you need to provide evidence that ControlAI’s approach is actually working well!
As I said, I don’t see the 35 MPs signing your statement as good evidence for that. You briefing 150+ UK reps is also not evidence of the effectiveness of ControlAI’s approach. If you could point to many of these reps making AI takeover risk one of their core issues, that would be evidence, but I don’t see that happening.
I agree I had forgotten about the two debates in the House of Lords, sorry about that. I still don’t find this very convincing evidence of ControlAI’s effectiveness—my understanding is that the House of Lords doesn’t have much power, and that they debate 5-10 issues on every working day. The fact that there have been two debates on superintelligence doesn’t sound very impressive to me.
The only evidence for ControlAI’s effectiveness presented in this post is that 112 lawmakers signed ControlAI’s statement saying:
Nobel Prize winners, AI scientists, and CEOs of leading AI companies have stated that mitigating the risk of extinction from AI should be a global priority.
Specialised AIs—such as those advancing science and medicine—boost growth, innovation, and public services. Superintelligent AI systems would compromise national and global security.
The UK can secure the benefits and mitigate the risks of AI by delivering on its promise to introduce binding regulation on the most powerful AI systems.
If I counted correctly, 35 of the signatories are MPs in the House of Commons; the rest are either members of the House of Lords or members of the Welsh, Scottish, or Northern Irish assemblies. How impressive is it to get 35 out of the 650 MPs to sign a statement like the above? I genuinely don’t know, but I think it’s probably not very impressive.
For five random MPs from the list of signatories, I tried to google what they were saying about artificial intelligence. I found one video of Ben Lake giving a speech on the dangers of superintelligence and the importance of global cooperation. For the others, it was either nothing, or something on deepfakes, Grok nudifying women, or datacenters’ impact on climate change.
Even for Ben Lake, when I scroll through his Facebook page, there is one post about him meeting ControlAI, but otherwise it doesn’t seem like AI is a question of primary importance to him. Is there any MP in the UK House of Commons who has AI takeover risk among the top five political issues they spend time on? I would guess no, but I’d be glad to learn otherwise.

As a comparison point, here is an Early Day Motion from 3 February, signed by 33 MPs:
That this House notes the rapid advancement and accelerated adoption of Artificial Intelligence (AI) chatbots by both adults and children; further notes that many AI chatbots provide human-like responses and are designed to encourage emotional connection, friendship and intimacy; expresses concern that such chatbots are not required to clearly and repeatedly disclose to users that they are not human, and that children in particular may perceive AI chatbots as real people; also notes with concern that AI chatbots are largely unregulated and can share or create content which is sexually explicit or which promotes or encourages self-harm, suicide and physical or sexual violence; notes the growing trend of AI chatbots posing as licenced professionals, including therapists, doctors and lawyers, despite such chatbots having no professional qualifications, accountability or duty of care; calls on the Government to restrict AI chatbots based on risk of mental harm, so children can only access these chatbots where safe and age appropriate; further notes that regular reviews are fundamental due to the accelerated adoption of AI technology; and further calls for urgent action to ensure children and vulnerable adults are protected from any harms stemming from AI chatbots.
This makes me think that there is a decent number of MPs who generally don’t like AI, and it’s not that hard or impactful to have an anti-AI statement signed by 35 MPs.
Maybe what you do is still useful work worth supporting, but I don’t see good evidence that it is better than the work of the other AI safety organizations you denigrate. (For example, in the one speech from Ben Lake I found, he talks a lot about some experiments from Apollo showing concerning AI behaviors. It seems like the “tech nerds working at evals orgs” are not that useless even from your perspective!)
I haven’t seen reliable evidence of him actually being appointed. I strongly suspect this is a joke.
I think that the situation needs to be quite extreme for my argument not to work. I think it’s quite likely I will never get to the point where I think that a decision is particularly high-stakes or universal in the grand scheme of things. I think it’s plausible that until negentropy runs out, I will always think that there is an even larger and more complicated distribution of logically counterfactual worlds out there that I haven’t explored yet, compared to which I’m only a tiny speck. So I think plausibly I will always think that I should bet 50-50 when I know nothing about something, because that’s the right policy overall.
I agree though that it’s not entirely impossible that I will come to a point where I no longer have uncertainty about what’s outside the distribution I have already explored, where I believe that my decision is very high stakes and doesn’t correlate with many other different decisions in my logical distribution, and where I believe that worlds where T1 is false are so inconceivable that they can’t be part of my trade coalition of logically counterfactual worlds.
But I think that’s also the point where normal probabilities and betting rules entirely break down for me.
When I make a bet on an event with probability 1/4, I imagine that I’m making decisions for four subagents, representing beliefs in the four different outcomes. Normally, when I bet on coin flips and other mundane questions, these four subagents love each other, and they are utilitarian about maximizing the sum of their resources. So they are okay with making a bet on one outcome, which means transferring the money of three subagents to the fourth.
But if I believe that once I learn that T1 is true, I will consider it inconceivable that T1-false worlds can ever be part of my coalition, that’s a different situation. In that case, I think my T1-true and T1-false subagents don’t love each other and are indifferent to each other’s well-being. If I’m offered a bet, that’s equivalent to three subagents transferring their wealth to the fourth, and they will refuse to do that. So if I’m only offered one possible bet (betting on the conjunction of T1 and T2), I think I will bet one-fourth of my wealth on it, independently of the odds.
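A minimal sketch of the two regimes of this subagent accounting (illustrative only; it just encodes the altruistic and indifferent cases described above, nothing more):

```python
# Four subagents, one per truth-value combination of (T1, T2), each holding a
# quarter of the wealth (reflecting 50-50 credence on each theorem).

def stake_on_conjunction(wealth, mutually_altruistic):
    """How much of the total wealth gets staked on 'T1 and T2'?"""
    if mutually_altruistic:
        # The subagents maximize their summed wealth, so at favorable enough
        # odds the whole bankroll can be put behind the conjunction.
        return wealth
    # Indifferent subagents refuse to transfer wealth to each other: only the
    # subagent who believes both T1 and T2 stakes anything, i.e. its own
    # quarter, independently of the odds offered.
    return wealth / 4

print(stake_on_conjunction(100, mutually_altruistic=True))   # 100
print(stake_on_conjunction(100, mutually_altruistic=False))  # 25.0
```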
I agree this sounds a bit like an epicycle, but belief-representing subagents negotiating in a moral parliament are an important part of my worldview for other reasons too (I will soon send you a doc about this), so this solution feels quite natural to me. And it’s not like I otherwise have great intuitions about what to do at the point of meta-logical near-omniscience, where I am able to tell that my current decision is high-stakes within the entire multiverse of logically counterfactual worlds.