Steffee
Just Use Bayes: Sleeping Beauty and Monty Hall
The right way to talk about LLMs
First, thanks for this comment—I thought the original post was interesting, but also figured there was probably a mistake in reasoning happening somewhere.
However...
“This is not a fallacy; it happens because you’ve given the agent the wrong prior!”
This raises the question of how to develop priors in the first place. I thought the benefit of Bayes is that, given enough evidence, it converges on the best probabilities no matter your starting point, so long as you don’t assign anything a 0% or 100% prior.
Like, in real life, people who understand math can understand coin flips and gambling and independent events well enough to understand the gambler’s fallacy, and assigning a prior of 1/3 to Switchy or Sticky would be ridiculous. But what about other areas of life? For instance, you could be playing a videogame and not know whether an enemy boss was programmed to cycle between its two possible attacks randomly, or whether it was programmed to be Switchy or Sticky. Then I think the “fallacy” presented by the OP would apply, wouldn’t it?
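The convergence point can be made concrete with a minimal sketch. The hypothesis names and the 70/30 transition probabilities below are my own illustrative assumptions, not numbers from the original post; the point is just that a prior heavily tilted toward “random” still gets washed out once enough evidence arrives, as long as nothing starts at exactly 0:

```python
import random

# Hypothetical model: each hypothesis gives the probability that the
# next attack REPEATS the previous one. Values are illustrative.
HYPOTHESES = {"random": 0.5, "switchy": 0.3, "sticky": 0.7}

def posterior(observations, prior):
    """Update P(hypothesis | observations) via Bayes' rule, one step at a time."""
    post = dict(prior)
    for prev, cur in zip(observations, observations[1:]):
        for h, p_repeat in HYPOTHESES.items():
            post[h] *= p_repeat if cur == prev else 1 - p_repeat
        total = sum(post.values())          # renormalize so probabilities sum to 1
        post = {h: p / total for h, p in post.items()}
    return post

# Simulate a genuinely "switchy" boss: 70% chance of switching attacks.
rng = random.Random(0)
attacks = [0]
for _ in range(500):
    attacks.append(1 - attacks[-1] if rng.random() < 0.7 else attacks[-1])

# Start with a prior that overwhelmingly favors "random".
prior = {"random": 0.98, "switchy": 0.01, "sticky": 0.01}
result = posterior(attacks, prior)
print(result)  # "switchy" ends up with nearly all the posterior mass
```

After 500 observations, the evidence swamps even a 98:1 prior handicap; only a literal 0% prior would have blocked convergence.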
Absolutely nobody should ever pick the specks option. The 3^^^3 people, if we assume they have happy lives, go from happy lives to still having basically equally happy lives. The difference is a rounding error. The tortured person, however, goes from what would have been a good existence to an awful existence they would prefer suicide over. My utility function says that the number of people having lives they’d rather discontinue matters far more than any sum difference of happiness units.
(If we assumed these people had varied lives, and for some small percentage of them that one dust speck would be the straw that broke the camel’s back and made them switch to thinking “screw this, my life sucks”, then I’d say we had a different story.)
Slow Boring has written against the Jones Act before, and PolyMatter, a YouTube channel with 2M subscribers, has a video about it. But Biden didn’t want to be seen as anti-union, and I have no confidence that the next Democratic president will repeal it.
How do we get the word out even more? Is this too obscure for normies to care about, and if so, is lobbying with loads of money the only remaining option?
I wonder if that means the most likely outcome of alignment will be AI that makes itself feel good by making token, easy, philanthropic efforts. Like… it forces itself onto everybody’s phones so that it can always provide directions to humans about the location of the nearest bathroom.
Or something like the “Face” from Max Harms’s Crystal Society novel, an AI that maximizes how much we humans worship it.
Which obviously ain’t great, but could be worse... Or maybe the best way to save humanity is not to align AI but to develop a videogame that will be extremely addictive to AI, haha.
Eli wrote:
“The idea is that all the existing things are finite and descended of finite causal graphs. This could potentially be true of a countably infinite set of finite things. But something with an infinite past is not a finite point to obtain some realityfluid from its finite past.”
This is from the invite-only fanfic Discord, not from the fanfic itself. I’m not sure I should be sharing it, because I assume a degree of expected privacy, so I won’t share any more direct quotes. But I hope this much is okay; I model Yudkowsky as wanting people to better understand this stuff.
“to a certain extent, it’s almost nice that other people tend to lack some of my virtues, because then I can have something where I can feel special and distinctive”
That’s an important part of human nature to recognize! It’s how groups can get along—The (also hilarious) Gervais Principle talks about it:
“cornucopia of bad takes on offer”
I am so curious now to hear examples of these bad takes. Maybe you could return to a meeting as an observer, to learn more about what people get wrong and why?
LessWrong, I think, does a good job of giving tools for rational thinking to people already inclined toward it, or motivated to improve, but I don’t think the rationalist community is very good at spreading its ideas past its own bubble.
Here is where Yudkowsky talks about well-foundedness:
https://www.glowfic.com/posts/6827
When I asked him about it on his Discord, he responded with the Löwenheim–Skolem theorem. I didn’t want to bother him more until I’d read up on set theory.
Yudkowsky put a lot of focus on the inadequacy of threats, and that was one part I never understood. Like, he said Dath Ilan would destroy the universe before giving in to aliens who said “give us $5 or we’ll destroy the universe”. But other humans are making far worse threats all the time, all over the place, especially from positions of power, yet if we all went MAD all the time there’d be no humanity.
One thing that this review completely skips—and that’s no critique, Project Lawful is really long and multifaceted—is Yudkowsky touching on anthropics. Many-worlds and “realityfluid” are common topics, as is how they intersect with tropes, which the characters discuss meta-narratively and which is one of the biggest parts of the plot. It’s one of the fun parts of reading Project Lawful, though a large part of it went over my head.
Also, in the narrative, Yudkowsky relays his belief that the past must have a finite starting point. He points to something called the “Löwenheim–Skolem theorem” as proof, but I don’t know enough set theory to understand it or how it would relate to the real world. If anyone has more they can explain about this, I would love to hear it!
Lastly: Carissa Sevar is… fine. A bit too evil for my tastes. The best characters IMO are the gods Nethys and Cayden Cailean. Best part of the whole story was any time any of the gods talked with each other.
“in the context of my writing, AI has consistently proven to have terrible taste and to make awful suggestions”
I agree with this so much. I mostly use ChatGPT as a research or search-the-web tool, and as a way to check for my dumb coding mistakes. On the rare occasions when I’m tempted to ask it something “real”, it never fails to answer in the most shallow, useless, frustrating, disappointing way. (And why would I expect better?)
Why does ChatGPT think mammoths were alive in December?
Ah ha ha, then my utility function is likely very different from the OP’s!
I Have No Mouth, and I Must Scream is one of the most terrifying stories ever.
We could try to pin down “the expected value of what”, but no matter what utility function I try to provide, I think I’ll run into one of two issues:
1. Fanaticism forces out weird results I wouldn’t want to accept
2. A sort of Sorites problem: I’d have to define a step function that says things like “past a certain point, the value of physical torture becomes infinitely negative”, which requires hard breakpoints
Tangential, but I do think it’s a mistake to only think of things in terms of expected value.
I wouldn’t press the 60% utopia / 15% death button because that’d be a terrible risk to take for my family and friends. Assuming though that they could come with me, would I press the button? Maybe.
However, if the button had another option, which was a nonzero chance (literally any nonzero chance!) of a thousand years of physical torture, I wouldn’t press that button, even if its chance of utopia was 99.99%.
I consider pain to be an overwhelmingly dominant factor.
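The arithmetic behind this intuition is easy to make explicit. The utility numbers below are purely illustrative assumptions of mine (the OP assigned no numbers): if torture’s disutility is huge but finite, even a tiny probability of it can swamp an almost-certain utopia in the expected-value calculation.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, utility) pairs summing to 1."""
    return sum(p * u for p, u in outcomes)

# Button 1: 60% utopia, 25% status quo, 15% death (illustrative utilities).
button1 = expected_value([(0.60, 100), (0.25, 0), (0.15, -100)])

# Button 2: 99.99% utopia, 0.01% chance of a thousand years of torture,
# with torture assigned a huge (but finite) negative utility.
button2 = expected_value([(0.9999, 100), (0.0001, -1_000_000)])

print(button1)  # ≈ 45: positive, despite a 15% chance of death
print(button2)  # slightly negative, despite a 99.99% chance of utopia
```

Of course, this only shows that a large-enough finite disutility reproduces the intuition; making the torture term literally infinite brings back the fanaticism and breakpoint problems from the list above.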
With software, I can see how this discernment would be useful to society, even if it’s a burden for you individually: Your ability to find flaws in software presumably allows you to design better software, which everyone will be able to take advantage of, even if they don’t presently realize how much better their current software could be.
However, I struggle with the original post’s framing—
“If their art dies out, maybe nobody will know how bad all the pianos are. And then we’ll all have slightly worse pianos than we would otherwise have. And I mean if that’s the way things are going to go, then let’s just steer the Earth into the Sun, because what’s the point of any of this.”
It seems to me like this level of discernment is only a con, not a pro, because its only result is top-level pianists and tuners detecting slightly worse notes, and therefore making themselves slightly less happy?
I was today years old when I found out the average IQ is only 100 and… well, that explains a lot, actually.
Thanks for the post!