Feature suggestion: Allow extension users to cast agree/disagree votes on the “corrections”, and give users the option to hide any corrections that others have disagreed with (or let them set the threshold at which corrections are shown).
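A minimal sketch of what that threshold option could look like, assuming hypothetical `Correction` and `UserSettings` shapes and an agree-ratio cutoff (none of this reflects any existing extension code):

```typescript
// Hypothetical data shapes; a real extension's schema would likely differ.
interface Correction {
  id: string;
  text: string;
  agreeVotes: number;
  disagreeVotes: number;
}

interface UserSettings {
  // Hide corrections whose share of agree votes falls below this value (0..1).
  minAgreementRatio: number;
}

// Return only the corrections the user has chosen to see,
// given their agreement-ratio threshold.
function visibleCorrections(
  corrections: Correction[],
  settings: UserSettings
): Correction[] {
  return corrections.filter((c) => {
    const total = c.agreeVotes + c.disagreeVotes;
    if (total === 0) return true; // unvoted corrections stay visible by default
    return c.agreeVotes / total >= settings.minAgreementRatio;
  });
}
```

Setting `minAgreementRatio` to 0 would show every correction, while a user who only wants broadly endorsed corrections could raise it toward 1.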
WilliamKiely
I haven’t signed the statement mostly because I snagged on the second bit about public support and wanted to think through in more detail, and ideally to write up/explain, what kind of public support I think it makes sense to condition otherwise safe forms of technological development on (especially insofar as the support in question is supposed to go beyond what’s at stake in e.g. standard democratic decision-making). And I haven’t had a chance to do this yet. That said, I may still sign. And per my comments in the post (see screenshot), I do support the right kind of global prohibition on developing superintelligence until we have a vastly better understanding of how to do so safely—though “the right kind” is important here, and (as I expect you agree) I also think that there are a lot of important downside risks in this vicinity.
Originally the plan was to also analyze optimal timing from an impersonal (xrisk-minimization) perspective; but to prevent the text from ballooning even more, that topic was set aside for future work (which might never get done).
That’s unfortunate. It seems like it would have been better for you to start with the optimal timing analysis from an impersonal perspective, since an impersonal perspective seems much more plausible than a person-affecting perspective.
Do you think your analysis of optimal timing from the person-affecting view is useful even if person-affecting views are wrong?
Do you think an analysis of optimal timing from an all-things-considered moral parliamentary perspective that only gives some weight to person-affecting views as appropriate would come to a similar conclusion about optimal timing?
“Don’t build it” is short for “Don’t build it yet” or “Don’t build it before it’s proven that doing so will not cause extinction” or something else, right? That is, Y&S say in IABIED that they prefer that ASI gets built eventually. I’m not sure if this nuance should be included on the landing page, but if there’s a simple way to include it then it probably should be added.
I suspect that you’d get more people to pledge to attend if the messaging encouraged everyone who agrees with “Don’t Build It” to pledge, rather than just the small subset that believes IABIED. (I pledged despite disagreeing with IABIED, but I suspect that “we believe” IABIED messaging would discourage many who otherwise favor not building ASI from pledging.)
True, and humans do cause the extinction of some species globally too, not just in certain farm fields. But notably, humans don’t cause the extinction of most species, so using the humans-and-animals analogy as a reason to expect ASI to be 99% likely to drive humanity extinct doesn’t work. The analogy is merely suggestive of risk.
I was recently reminded of the 2023 conversation between Aryeh Englander and Eliezer Yudkowsky quoted at the end of this post about model uncertainty. I re-read it today, as well as all of the other comments on Aryeh’s Facebook post, and still think that Aryeh’s perspective seems reasonable while Eliezer-and-Rob’s perspective seems to be lacking justification. That is, despite the conversation, it doesn’t seem like Eliezer’s comments about milking uncertainty into expecting good outcomes are actually an adequate answer to Aryeh’s question about why Eliezer is so confident that his model is correct and that everyone else’s models (of those with much lower p(doom from AI)) are wrong.
When I first read the quoted conversation a few years ago I didn’t think it was a major crux, but now I’m leaning toward thinking that this epistemological point is probably a major factor in why Eliezer’s credence that if anyone builds ASI anytime soon then everyone will die is ~99% while my credence is much lower. (My p(doom from AI) is ~65%, my p(extinction from AI by 2100) is ~20%, and my p(doom from AI by 2100) is ~35%). Just wanted to note that I’ve updated on this point being a major crux.
Thanks for the replies.
99% is not very high confidence, in log-odds—I am much more than 99% confident in many claims.
I am too. But for how many of those beliefs that you’re 99+% sure of can you name several people like Paul Christiano who think you’re on the wrong side of maybe? For me, not a single example comes to mind.
However, “It would be lethally dangerous to build ASIs that have the wrong goals” is not circular. You might say it lacks justification
I agree that’s not circular. I meant that the full claim “building ASIs with the wrong goals would lead to human extinction because ‘It would be lethally dangerous to build ASIs that have the wrong goals’” is circular. “Lacks justification” would have been clearer.
For example, if they believe both that Drexlerian nanotechnology is possible and that the ASI in question would be able to build it.
I hold this background belief but don’t think that it means the original claim requires little additional justification. But getting into such details is beyond the scope of this discussion thread. (Brief gesture at an explanation: Even though humans could exterminate all the ants in a backyard when they build a house, they don’t. It similarly seems plausible to me that ASI could start building its factories on Earth to enable it to build von Neumann probes to begin colonizing the universe, all without killing all humans on Earth. Maybe it’d extinct humanity by boiling the oceans as mentioned in IABIED, but I have enough doubt in these sorts of predictions to remain <<99% confident in the ‘It would be lethally dangerous [i.e. it’d lead to extinction] to build ASIs that have the wrong goals’ claim.)
I think that, in such cases, Eliezer is simply not making a mistake that those other researchers are making, where they have substantial hope in unknown unknowns (some of which are in fact known, but maybe not to them).
Eliezer has phrased this as:
You don’t get to adopt a prior where you have a 50-50 chance of winning the lottery “because either you win or you don’t”; the question is not whether we’re uncertain, but whether someone’s allowed to milk their uncertainty to expect good outcomes.
Rob Bensinger quoted an exchange on this topic between Eliezer and Aryeh Englander. When I first read it years ago I recall thinking that Eliezer was wrong in the exchange and was confused why Rob was quoting it in apparent endorsement.
Reading your version of it now, it still seems to me like the point is just wrong. Updating to 99% because none of the alignment proposals you’ve considered seem like they would work just seems like overconfidence. Saying ‘no, you should update to 99% if you’ve considered as many alignment proposals as Eliezer has, and remaining less confident is the mistake of milking uncertainty into expecting good outcomes’ seems like the real mistake.
Does Eliezer really not have other reasons beyond this epistemological view that he ought to update to ~99% based on his own inability to find a potentially-promising solution to the alignment problem over the course of his career? I’ve long assumed that there was more to it than this, but maybe this epistemological point is actually just a major crux between Yudkowsky and others with significantly lower credences of extinction from AI.
I’m also a little confused by why you expect such a summary to exist. Or, rather, why the section titles from The Problem are insufficient
In short, I think they’re not sufficient because a person can agree with all those statements and also rationally think the title claim is >>1% likely to be false.
And also because e.g. saying you’re 99% confident that building ASIs with the wrong goals would lead to human extinction because “It would be lethally dangerous to build ASIs that have the wrong goals” is circular and doesn’t actually explain why you’re so confident. The layperson writing a book report doesn’t have anything to point to as the reason why you’re 99% confident while researchers like e.g. Christiano are much less confident (20% extinction within 10 years of powerful AI being built).
In many places in his review he [...] criticizes the book as not making the case that extreme pessimism is warranted.
I think this is a valid criticism, which I share. My main criticism of IABIED was that it didn’t argue for its title claim. See the 1,700-word section of my review, “IABIED does not argue for its thesis.” (I didn’t cross-post my review to LW or anywhere because I didn’t like that I was just complaining about the book being disappointing when I had such high hopes for it, but if anyone reading this thinks it’s worthwhile to post to LW, say so and I’ll listen.)
By default, it’s reasonable for readers of a book with the title IABIED to expect that the book will at least attempt to explain why if anyone builds ASI anytime soon, then it is almost certain that ASI will cause human extinction.
If the book merely explains why ASI might cause human extinction if anyone builds ASI anytime soon, then I think it is reasonable for readers to criticize this.
BB seems to say that IABIED does argue for its title thesis with the analogy to evolution, and just says that the argument is not decisive because it doesn’t address the “disanalogies between evolution and reinforcement learning.”
Whether one takes BB’s view that the book did argue for its title thesis and just didn’t do a very good (or complete?) job, or whether one takes my view that Y&S largely just didn’t attempt to explain their reasons for why they put such high credence in their title claim, I think your response to BB on this topic is missing something, which is why I’m commenting.
You continue:
I think this is a basic misunderstanding of the book’s argument. IABIED is not arguing for the thesis that “you should believe ‘if anyone builds superintelligence with modern methods, everyone will die’ with >90% probability”, which is a meta-level point about confidence, and instead the thesis is the object-level claim that “if anyone builds ASI with modern methods, everyone will die.”
I agree with you that the book was not and should not have been attempting to raise the reader’s credence in the title thesis to >90%. As you said:
I kinda think anyone (who is not an expert) who reads IABIED and comes away with a similar level of pessimism as the authors is making an error. If you read any single book on a wild, controversial topic, you should not wind up extremely confident!
(Only disagreement: I think even experts shouldn’t read IABIED and update their credence in the title claim above 90% if it was previously below 90%.)
Given that a short, accessible book written for the general public could not possibly provide all the evidence that the authors have seen over the years that has led to them being so confident in their title thesis, what should the book do instead?
The suggestion I gave in my review was that the authors should have provided a disclaimer in the Introduction, such as the following:
By the way, it is impossible for us to provide a complete account here of why we are almost certain that if anyone builds ASI anytime soon, everyone will die. We have been researching this question for decades and there are simply far too many considerations for us to address in this short book that we are trying to make accessible to a wide audience. Consequently, we are only going to lay out basic arguments for considerations that are particularly concerning to us. If after reading the book you think, ‘I can see why ASI might cause human extinction, but I don’t understand why the authors think it is inevitable that ASI would cause human extinction if built soon,’ then we have accomplished what we set out to do. If you feel we left you hanging about why we are so confident, all we can say is that we warned you, and we encourage you to read our online resources and other materials to begin to understand our high confidence.
Such a disclaimer would be sufficient to pre-empt the criticism that the book does not actually argue for its title thesis that if anyone builds ASI anytime soon, then it is almost certain that ASI will cause human extinction.
But the book could do more beyond this if it wanted to. In addition, it could say, “While we know we can’t possibly convey all the evidence that led to us having such high credences in our title claim, we can at least provide a summary of what led us to be so confident. While we don’t necessarily think this summary should update anyone’s credence in the title, it will at least give interested readers an idea of what led us to become so confident.” But Y&S did not provide any such summary in the book.
Such a summary is actually what I was hoping for. I’ve been curious about this for years and even asked Eliezer at a conference once why his credence in existential catastrophe from AI was so high (his answer, which was about rockets, didn’t seem like an explanation to me). To this day, if someone were to ask me why Eliezer is so much more confident in the IABIED claim than Paul Christiano or Daniel Kokotajlo or whoever, I still don’t have an answer that doesn’t make it sound like Eliezer’s reasons are obviously bad.
The cached explanation that comes to mind when I ask myself this question is “Well he’s been thinking about it for years and has become convinced that every alignment proposal he has seen fails.” But there are a lot of smart researchers who also aren’t aware of any alignment proposal that they think works, but that’s obviously not sufficient for their credence to be ~99%, so clearly Eliezer must have some other reasons that I’m not aware of. But what are those reasons? I don’t know, and IABIED didn’t give me any hints.
But that’s passing the buck… where to find the trustworthy commenters?
My idea for this has been that rather than require that all users use and trust the extension’s single foxy aggregation / deference algorithm, the tool instead ought to give users the freedom to choose between different aggregation mechanisms, including being able to select which users to epistemically trust or not. In other words, it could almost be like an epistemic social network where users can choose whose judgment they respect and have their aggregation algorithm give special weight to those users (as well as to users those users say they respect the judgment of).
Perhaps this would lead to some users using the system to support their own tribalism or whatever and have their personalized aggregation algorithm spit out poor judgments, but I think it’d allow users like those on LW to use the tool and become more informed as a result.
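To make the “epistemic social network” idea a bit more concrete, here is a minimal sketch of a trust-weighted aggregation. The types and weights (1.0 for directly trusted users, 0.5 for users they trust, 0.1 for everyone else) are made up purely for illustration:

```typescript
// Hypothetical types and weights; purely an illustrative sketch.
type UserId = string;

interface Vote {
  userId: UserId;
  believesTrue: boolean; // did this user mark the claim as true?
}

interface TrustNetwork {
  trusted: Set<UserId>;          // users the reader trusts directly
  trustedByTrusted: Set<UserId>; // users trusted by those trusted users
}

// Weighted share of "true" votes from one reader's point of view.
function trustWeightedBelief(votes: Vote[], net: TrustNetwork): number {
  let weightTrue = 0;
  let weightTotal = 0;
  for (const v of votes) {
    const w = net.trusted.has(v.userId)
      ? 1.0
      : net.trustedByTrusted.has(v.userId)
      ? 0.5
      : 0.1; // small default weight for everyone else
    weightTotal += w;
    if (v.believesTrue) weightTrue += w;
  }
  return weightTotal === 0 ? 0.5 : weightTrue / weightTotal; // 0.5 = no information
}
```

Each user could plug their own trust list into something like this, so two readers of the same article might see different aggregate judgments—which is the point, and also the tribalism risk mentioned above.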
Another solution could be to let every user specify whom they trust, and show the opinions of your friends more visibly than the opinion of randos. So you would get mostly good results if you import the list of rationalists; and everyone else, uhm, will use the tool to reinforce the bubble they are already in.
Yeah, exactly.
I think it’d be a valuable tool despite the challenges you mentioned.
I think the main challenge would be getting enough people to give the tool/extension enough input epistemic data; making the outputs based on that input data valuable enough to be informative to users seems to me like the lesser challenge.
And to solve this problem, I imagine the developers would have to come up with creative ways to make giving the tool epistemic data fast and low friction (though maybe not: e.g. is submitting Community Notes fast or low friction? (IDK, but) perhaps not necessarily, and maybe some users do it anyway because they value the exposure and impact their note may have if approved).
And perhaps it would also mean making sure that the way users provide the input data allows that data to be aggregated by some algorithm. E.g. it’s easier to aggregate submissions claiming a sentence is true or false, but what if a user just wants to flag a claim as misleading? Do you need a more creative way to capture that data if you want to be able to communicate to other users the manner in which it is misleading, rather than just a “misleading” tag?

I haven’t thought through these sorts of questions, but I strongly suspect that there is some MVP version of the extension that I, at the very least, would value as an end user and would also be happy to contribute to, even if only a few people I know would be seeing my data/notes when reading the same content as me after the fact. Though of course the more people who use the tool and see the data, the more willing I’d be to contribute, assuming some small time cost of contributing data. I already spend time leaving comments on things to point out mistakes, and I imagine such a tool would just reduce the friction of providing such feedback.
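Regarding the question above about capturing a “misleading” flag alongside plain true/false votes, one possible (entirely hypothetical) data shape is a tagged union, where structured verdicts aggregate numerically and misleading-notes carry free-text explanations that get surfaced rather than averaged:

```typescript
// Hypothetical submission shapes; just a sketch of how structured votes
// and free-form "misleading" notes could coexist in one system.
type Verdict = {
  kind: "verdict";
  claimId: string;
  userId: string;
  verdict: "true" | "false";
};

type MisleadingNote = {
  kind: "misleading";
  claimId: string;
  userId: string;
  explanation: string; // how/why the claim is misleading
};

type Submission = Verdict | MisleadingNote;

// Verdicts reduce to a fraction; misleading notes are kept as text
// so other users can see *how* something is claimed to be misleading.
function summarize(subs: Submission[]) {
  const verdicts = subs.filter((s): s is Verdict => s.kind === "verdict");
  const notes = subs.filter((s): s is MisleadingNote => s.kind === "misleading");
  const trueCount = verdicts.filter((v) => v.verdict === "true").length;
  return {
    fractionTrue: verdicts.length > 0 ? trueCount / verdicts.length : null,
    misleadingExplanations: notes.map((n) => n.explanation),
  };
}
```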
You can’t fact check everything you hear and read; you literally don’t have the time, energy, or knowledge needed.
I’ve long thought that it’s also true that an entrepreneur could build a tool that allows people to easily see whether virtually everything they read or see on the internet is true.
On LessWrong if a reader thinks something someone says in a post is false they can highlight the sentence and Disagree-react it. Then everyone else reading the post can see that the sentence is highlighted and see who said they disagreed with it. This is great for epistemics.
I envision a system (could be as simple as a browser extension) that allows users to frictionlessly report their feedback/beliefs when reading any content online, noting when things they read seem true or false or definitely false, etc. The system crowdsources all of this epistemic feedback and then uses the data to estimate whether things actually are true or false, and shares this insight with other users.
Then no longer will someone have to read a news article or post that 100 or more other people have already read and be left to their own devices to determine what parts are true or not.
Perhaps some users might not trust the main algorithm’s judgment and would prefer to choose a set of other users who they trust have good judgment, and have their personalized algorithm give these people’s epistemic feedback extra weight. Great, the system should have this feature.
Perhaps some users mark something as false and later other users come along and show that it is true. Then perhaps the first users should have an epistemic score that goes down as a consequence of their mistaken/bad epistemic feedback.
Perhaps the system should track how good of judgment users have over time to ascertain which users give reliable feedback and which users have bad epistemics and largely just contribute noise.
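As a rough illustration of that kind of reliability tracking, here is one sketch: each user’s score starts neutral and moves toward 1 when their feedback matches the eventual consensus verdict on a claim, and toward 0 when it doesn’t. The starting value and learning rate are invented for the example:

```typescript
// Hypothetical reliability tracking; constants are illustrative only.
interface Reviewer {
  userId: string;
  reliability: number; // in [0, 1], starts at 0.5 (no track record)
}

const LEARNING_RATE = 0.1; // how fast the score moves after each resolved claim

function updateReliability(
  reviewer: Reviewer,
  saidTrue: boolean,      // what this user originally claimed
  consensusTrue: boolean  // what the claim was later resolved to
): Reviewer {
  const target = saidTrue === consensusTrue ? 1 : 0;
  return {
    ...reviewer,
    reliability:
      reviewer.reliability + LEARNING_RATE * (target - reviewer.reliability),
  };
}
```

Contributors who mostly add noise would drift toward a low score and could then be down-weighted or ignored by whatever aggregation mechanism the user has chosen.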
There are a lot of features that could be added to such a system. But the point is that I’ve read far too many news articles and posts on the broader internet and had the experience of noting a mistake or inaccuracy or outright falsehood, and then moved on without sharing the insight with anyone due to there being no efficient way to do so.
Surely there are also many inaccuracies that I miss, and I’d benefit from being informed by others who did catch them, in a way that I could just believe as a non-expert on the claim.
First, environment: if you want to believe true things, try not to spend too much time around people who are going to sneeze false information or badly reasoned arguments into your face. You can’t fact check everything you hear and read; you literally don’t have the time, energy, or knowledge needed. Cultivate a social network that cares about true things.
This is good advice, but I really wish (and think it possible) that some competent entrepreneurs would make it much less needed by creating epistemic tools that enhance the ability of anyone to discern what’s true out in the wild, where people do commonly sneeze false information in your face.
What’s more, I think no private company should be in a position to impose this kind of risk on every living human, and I support efforts to make sure that no company ever is.
I don’t see your name on the Statement on Superintelligence when I search for it. Assuming you didn’t sign it, why not? Do you disagree with it?
It seems like an effort to make sure that no company is in the position to impose this kind of risk on every living human:
We call for a prohibition on the development of superintelligence, not lifted before there is
1. broad scientific consensus that it will be done safely and controllably, and
2. strong public buy-in.
(Several Anthropic, OpenAI, and Google DeepMind employees signed.)
Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.
Immediately after the parable, on page 82:
Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. There are just a lot of other ways to be; there are a lot of other directions one could steer. Much like predicting that your next lottery ticket won’t be a winning one, this is an easy call.
Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. We aren’t saying this because we get a kick out of being bleak. It’s just that those powerful machine intelligences will not be born with preferences much like ours.
This is just a classic “counting argument” against alignment efforts being successful, right?
I recall Alex Turner (TurnTrout) arguing that (at least some) counting arguments (that are often made) are wrong (Many arguments for AI x-risk are wrong) and quoting Nora Belrose and Quintin Pope arguing the same (Counting arguments provide no evidence for AI doom). Some people in the comments, such as Evan Hubinger, seem to disagree, but as a layperson the discussion became too technical for me to understand.
In any case, the version of the counting argument in the book seems simple enough that as a layperson I can tell that it’s wrong. To me it seems like it clearly proves too much.
Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling the current method will likely not choose to build a future full of happy, free people because there are many more possible preferences that an ASI could have than the narrow subset of preferences that would lead to it building a future of happy, free people, then I think the argument is wrong.
It seems like this counting observation is a reason to think (so maybe I think the “no evidence” in the above linked post title is too strong) that the preferences an ASI ends up having might not be the preferences its creators try training into it, because the target preferences are indeed a narrow target and narrow targets are easier to miss than broad targets. But surely this counting observation is not sufficient to conclude that ASI creators will fail to hit their narrow target. It seems like you would need more reasons to conclude that.
IABIED Misc. Discussion Thread
Agreed that current models fail badly at alignment in many senses.
I still feel like the bet that OP offered Collier (in response to her stating that currently available techniques do a reasonably good job of making potentially alien and incomprehensible jealous ex-girlfriends like “Sydney” very rare) was inappropriate, as the bet was clearly about a different claim than her claim about the frequency of Sydney-like behavior.
A more appropriate response from OP would have been to say that while current techniques may have successfully reduced the frequency of Sydney-like behavior, they’re still failing badly in other respects, such as your observation with Claude Code.
But the way you are reading it seems to mean her “strawmann[ed]” point is irrelevant to the claim she made!
I agree.
(I only skimmed your review / quickly read about half of it. I agree with some of your criticisms of Collier’s review and disagree with others. I don’t have an overall take.)
One criticism of Collier’s review you appeared not to make that I would make is the following.
Collier wrote:
By far the most compelling argument that extraordinarily advanced AIs might exist in the future is that pretty advanced AIs exist right now, and they’re getting more advanced all the time. One can’t write a book arguing for the danger of superintelligence without mentioning this fact.
I disagree. I think it was clear decades before the pretty advanced AIs of today existed that extraordinarily advanced AIs might exist (and indeed probably would exist) eventually. As such, the most compelling argument that extraordinarily advanced AIs might or probably will exist in the future is not that pretty advanced AIs exist today, but the same argument one could have made (and some did make) decades ago.
One version of the argument is that the limits of how advanced AI could be in principle seem extraordinarily advanced (human brains are an existence proof, and human brains have known limitations relative to machines) and it seems unlikely that AI progress would permanently stall before getting to a point where there are extraordinarily advanced AIs.
E.g. I.J. Good foresaw superintelligent machines, and I don’t think he was just getting lucky to imagine that they might or probably would come to exist at some point. I think he had access to compelling reasons.
The existence of pretty advanced AIs today is some evidence and allows us to be a bit more confident that extraordinarily advanced AIs will eventually be built, but their existence is not the most compelling reason to expect significantly more capable AIs to be created eventually.
Bernie Sanders quoted the March 2023 Pause Giant AI Experiments Open Letter’s language “governments should step in and institute a moratorium” in a video today as justification for his legislation calling for a moratorium on the construction of new data centers, even though a moratorium on new data centers is not the kind of moratorium that the letter called for.
Bernie quotes the pause letter at 7:12: