What does economics-as-moral-foundation mean?
He mainly used analogies from IABED. Off the top of my head I recall him talking about:
predicting where the molecules go when you heat an ice cube
LLMs are grown; the companies aren’t building crops, they are building farm equipment that grows crops (I don’t remember this one from IABED)
we know ASI could build a self-replicating solar-powered factory because those already exist (i.e. grass)
leaded gasoline as a case study of scientists/companies making something bad and being in denial about it, even to their own detriment
many people thought nuclear war was inevitable, but it didn’t happen, largely because the people in charge would be personally harmed by it
I’m talking about my perception of the standards for a quick take vs. a post. I don’t know if my perception is accurate.
My perception is that it’s not exactly about goodness; it’s more that a post must conform to certain standards*. It’s like how a scientific paper must meet certain standards to get published in a peer-reviewed journal, even though a non-publishable paper could still present novel and valuable scientific findings.
*and, even though I’ve been reading LW since 2012, I’m still not clear on what those standards are or how to meet them
On a meta level, I think this post is a paragon of how to reason about the cost-effectiveness of a highly uncertain decision, and I would love to see more posts like this one.
I don’t know Bores personally. I looked through some of his communications and social media, and most of it seemed reasonable (I noticed his Twitter has an unusually small amount of mud-slinging). But I did see one thread with some troubling comments:
This bill [SB 53] recognizes that in order to win the AI race, our AI needs to be both safe and trustworthy.
In this case, pro-safety is the pro-innovation position.
[...]
As a New Yorker, I have to point out that SB53 includes a cloud compute cluster & @GavinNewsom said in his signing memo “The future happens [in CA] first”
...but @KathyHochul established EmpireAI in April 2024. So, thanks to our Gov’s vision, the future actually happens in NY 😉
Why I find this troubling:
Bores seems to want to race to build AI. Racing shortens timelines and decreases safety.
“pro-safety is the pro-innovation position” seems false? If AI companies maximize profit by being safe, then they’d do it without regulation, so why would we need regulation? If they don’t maximize profit by being safe, then pro-safety is not (maximally) pro-innovation.
I think our best hope for survival is that governments become sufficiently aware of the danger of AI that they agree to ban frontier AI development until we can figure out how to make it safe. If Bores is indeed pro-innovation on AI, then he would presumably oppose such a ban. My guess is the average Democrat would be basically fine with banning frontier AI if the political winds shifted that way, but Bores would have a more strongly-held stance, in which case he would be worse than the average Democrat (but still probably better than the average Republican).
He calls out New York directing funding to a new AI research lab as if that’s a good thing, which I don’t think it is. (I don’t actually know what EmpireAI is doing; I looked at their website, but it doesn’t really say anything beyond that they only fund “responsible” research, and I really don’t trust them to know what qualifies as responsible.)
Politicians are often pressured to say those sorts of things, so perhaps he would still support an AI pause if it became politically feasible. So these comments aren’t overwhelmingly troubling. But they’re troubling.
If those quotes accurately reflect his stance on AI innovation and arms races, then he might still be better than the average Democrat if the increased chance of getting weak-to-moderate AI safety regulations outweighs the decreased chance of getting strong regulations, but it’s unclear to me.
I will note that this was the only worrying comment I saw from Bores, although I didn’t find many comments on AI safety.
I think the strongest case for AI stocks being overpriced is to ignore any specific facts about how AI works and take the outside view on historical market behavior. I don’t see a good argument being made in the quotes above so I will try to make a version of it.
I’m going from memory instead of looking up sources; I’m pretty sure I’m wrong about the exact details of the claims below, but I believe they’re approximately true.
The Mag 7 have a P/E of 32. Historically, when companies had a P/E of 32, their future returns were, on average, much worse than the market average (my guess would be 0–3%; see the rough back-of-envelope after this list).
A study looking at the performance of the #1 market cap company found that the top company almost always went on to underperform, which is an argument against buying Nvidia in particular.
Nvidia currently makes up a larger share of total world market cap than any single company ever has. When things go outside the normal historical range, that generally suggests the price is unsustainable.
Historically, the market has systematically over-extrapolated earnings/revenue growth. Companies with excellent earnings growth for years 1–3 tend to have merely above-average earnings growth for years 4–6, but they’re usually priced as if they’re going to continue to have excellent earnings growth. Although that’s just an average, and companies with very high market expectations still exceed expectations something like 40% of the time.
Stocks and industries tend to exhibit 3–5 year mean reversion, i.e. stocks that perform particularly well for 3–5 years on average underperform the market over the following year.
(These are five different perspectives on the same general market phenomenon, so they’re not really five independent pieces of evidence.)
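As a crude sanity check on that 0–3% guess, here’s a back-of-envelope sketch (my own, not from the studies I’m half-remembering): it treats the earnings yield, i.e. 1/PE, as a rough proxy for expected long-run real return, and it ignores growth, buybacks, and changes in the multiple.

```python
# Crude sanity check: earnings yield (1 / PE) as a rough proxy for expected
# long-run real return. Ignores growth, buybacks, and multiple changes.
mag7_pe = 32              # approximate aggregate P/E of the Mag 7 (from memory)
long_run_market_pe = 16   # very rough long-run average market P/E

print(f"Earnings yield at P/E {mag7_pe}: {1 / mag7_pe:.1%}")                        # ~3.1%
print(f"Earnings yield at P/E {long_run_market_pe}: {1 / long_run_market_pe:.1%}")  # ~6.2%
```

A ~3% earnings yield is in the same ballpark as the 0–3% guess, although this is a valuation argument rather than a historical-returns study, so it’s weak evidence at best.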
On the outside view, I think there’s pretty good reason to believe that AI stocks are overpriced. However, on the inside view, the market sort of still doesn’t seem like it appreciates how big a deal AGI could be. So on balance I’m pretty uncertain.
Importantly, AFAICT some Horizon fellows are actively working against x-risk reduction (pulling the rope backwards, not sideways). So Horizon’s sign of impact is unclear to me. For a lot of people, “tech policy going well” means “regulations that don’t impede tech companies’ growth”.
As in, Horizon fellows / people who work at Horizon?
What leads you to believe this?
FWIW this is also my impression but I’m going off weak evidence (I wrote about my evidence here), and Horizon is pretty opaque so it’s hard to tell. A couple weeks ago I tried reaching out to them to talk about it but they haven’t responded.
I think so, yeah. I think my probability of the next model being catastrophically dangerous is a bit higher than it was a year ago, mainly because of the IMO gold medal result and similar improvements in models’ ability to reason on hard problems. An argument in the other direction is that the more data points you have along a capabilities curve, the more confident you can be that your model of the curve is accurate, although on balance I think this is probably outweighed by the fact that we are now closer to AGI than we were a year ago.
The next-gen LLM might pose an existential threat
I’m pretty sure that the next generation of LLMs will be safe. But the risk is still high enough to make me uncomfortable.
How sure are we that scaling laws are correct? Researchers have drawn curves predicting how AI capabilities scale with how much compute and data go into training them. If you extrapolate those curves, it looks like the next generation of LLMs won’t be wildly more powerful than the current one. But maybe there’s a weird bump in the curve that happens in between GPT-5 and GPT-6 (or between Claude 4.5 and Claude 5), and LLMs suddenly become much more capable in a way that scaling laws didn’t predict. I don’t think we can be more than 99.9% confident that there’s not.
How sure are we that current-gen LLMs aren’t sandbagging (that is, deliberately hiding their true skill level)? I think they’re still dumb enough that their sandbagging can be caught, and indeed they have been caught sandbagging on some tests. I don’t think LLMs are hiding their true capabilities in general, and our understanding of AI capabilities is probably pretty accurate. But I don’t think we can be more than 99.9% confident about that.
How sure are we that the extrapolated capability level of the next-gen LLM isn’t enough to take over the world? It probably isn’t, but we don’t really know what level of capability is required for something like that. I don’t think we can be more than 99.9% confident.
Perhaps we can be >99.99% confident that the next-gen LLM’s extrapolated capability still falls short of the smartest human. But an LLM has certain advantages over humans: it can work faster (at least on many sorts of tasks), it can copy itself, and it can operate computers in a way that humans can’t.
Alternatively, GPT-6/Claude 5 might not be able to take over the world, but it might be smart enough to recursively self-improve, and that might happen too quickly for us to do anything about.
How sure are we that we aren’t wrong about something else? I thought of three ways we could be disastrously wrong:
We could be wrong about scaling laws;
We could be wrong that LLMs aren’t sandbagging;
We could be wrong about what capabilities are required for AI to take over.
But we could be wrong about some entirely different thing that I didn’t even think of. I’m not more than 99.9% confident that my list is comprehensive.
On the whole, I don’t think we can say there’s less than a 0.4% chance that the next-gen LLM forces us down a path that inevitably ends in everyone dying.
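For concreteness, that 0.4% is roughly what you get by combining the four ~99.9% confidences above, treating them as independent (which they aren’t exactly, but it’s close enough for a Fermi estimate):

```python
# Combine the four ~99.9% confidences above, treated as (roughly) independent.
# Each value is the probability that we are NOT wrong about that particular thing.
confidences = [
    0.999,  # scaling laws hold (no surprise capability jump next generation)
    0.999,  # current LLMs aren't sandbagging
    0.999,  # extrapolated next-gen capability isn't enough to take over the world
    0.999,  # my list of ways to be wrong is comprehensive
]

p_everything_fine = 1.0
for p in confidences:
    p_everything_fine *= p

print(f"P(wrong about at least one) ≈ {1 - p_everything_fine:.2%}")  # ~0.40%
```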
As I understand, this is how scientific bodies’ position statements get written. Scientists do not universally agree about the facts in their field, but they iterate on the statement until none of the signatories have any major objections.
I have a strong intuition that this isn’t how it works:
When I have a positive experience, it is readily apparent to me that the experience is positive, and no amount of argument can convince me that actually I didn’t enjoy myself.
Suppose I did something that I quite enjoyed, and then Omega came up to me and said “actually somebody else last week (or a simulation of you, or whatever) already experienced those exact same qualia, so your qualia weren’t that valuable.” I’d say, sorry Omega, that is wrong, my experience was good regardless of whether it had already happened before. I know it was good because I directly experienced its goodness.
Perhaps I’m misunderstanding your objection but I think the issue is that Claude is hosted on AWS servers (among other places), which means Amazon could steal Claude’s model weights if it wanted to, and ASL-3 states that Claude needs to be secure against theft by other companies (including Amazon).
I don’t know for sure that Zach’s assertion is true, but I’m reasonably confident that a dedicated Amazon security team could steal the contents of any AWS server if they really wanted to.
The great thing is outside university nobody cares about how fast I can apply the gauss algorithm. It’s just important that I know when to use it.
This particular fact sounds right but I think the generalization is often wrong. At my first software development job, I learned more slowly than my peers, and I took longer than usual to get promoted from entry-level to mid-level. This had a real material impact on my earnings and therefore how much money I could donate. It would have been better for the world if I had been able to learn faster.
But I still basically agree with the lesson, as I understand it: trying to go fast is overrated. I don’t think “try to go fast” would’ve helped me at all. (In fact I often was trying to go fast, and it didn’t help.)
Yeah I agree that it’s not a good plan. I just think that if you’re proposing your own plan, your plan should at least mention the “standard” plan and why you prefer to do something different. Like give some commentary on why you don’t think alignment bootstrapping is a solution. (And I would probably agree with your commentary.)
First, the obvious: baryonic matter itself is a rounding error
This is a minor comment but I think it would be useful to define terms like “baryonic matter”. I’m much more well-versed in particle physics than the average person (I have taken one (1) class on particle physics, which I think puts me in a pretty high percentile on particle physics knowledge) but I don’t remember what “baryonic matter” means. It’s also non-trivial to find out what it means: Wikipedia says a baryon is defined as having an odd number of valence quarks, but I have no intuition for what that means either.
From context, I think what you mean by “baryonic matter” is “matter that forms atoms”, and you mean to exclude dark matter, black holes, and force-carrier particles (photons etc.).
This plan is pretty abstract (which is necessary because it’s short) but in some ways I think it’s better than any of the AI companies’ published 400-page plans. From what I’ve seen, AI companies don’t care enough about trying to break their own evals, and they don’t care enough about theory work.
Maybe this is too in the weeds but I’m skeptical that we can create robust alignment evals using anything resembling current methods. A superintelligent AI will be better at breaking evals than humans, so I expect there is a big gap between “our top researchers have tried and failed to find any loopholes in our alignment evals” and “a superintelligence will not be able to find any loopholes”.
AI companies would probably answer that they only need to align weaker models and then bootstrap to an aligned superintelligence. You didn’t talk about that in your meta-plan, though; I think it would be good to address it, since it’s what every AI company (with an alignment team) currently plans on doing.
I think you could approximately define philosophy as “the set of problems that are left over after you take all the problems that can be formally studied using known methods and put them into their own fields.” Once a problem becomes well-understood, it ceases to be considered philosophy. For example, logic, physics, and (more recently) neuroscience used to be philosophy, but now they’re not, because we know how to formally study them.
So I believe Wei Dai is right that philosophy is exceptionally difficult—and this is true almost by definition, because if we know how to make progress on a problem, then we don’t call it “philosophy”.
For example, I don’t think it makes sense to say that philosophy of science is a type of science, because it exists outside of science. Philosophy of science is about laying the foundations of science, and you can’t do that using science itself.
I think the most important philosophical problems with respect to AI are ethics and metaethics because those are essential for deciding what an ASI should do, but I don’t think we have a good enough understanding of ethics/metaethics to know how to get meaningful work on them out of AI assistants.