MichaelDickens

Karma: 1,105

MichaelDickens Apr 13, 2025, 6:04 PM
12 points
0
on: MichaelDickens’s Shortform
Is Claude “more aligned” than Llama?

Anthropic seems to be the AI company that cares the most about AI risk, and Meta cares the least. If Anthropic is doing more alignment research than Meta, do the results of that research visibly show up in the behavior of Claude vs. Llama?

I am not sure how you would test this. The first thing that comes to mind is to test how easily different LLMs can be tricked into doing things they were trained not to do, but I don’t know if that’s a great example of an “alignment failure”. You could test model deception but you’d need some objective standard to compare different models on.

And I am not sure how much you should even expect the results of alignment research to show up in present-day LLMs.

MichaelDickens Apr 2, 2025, 11:50 PM
1 point
0
in reply to: Zach Stein-Perlman’s comment on: lc’s Shortform
Hmm I wonder if this is why so many April Fools posts have >200 upvotes. April Fools Day in cahoots with itself?

MichaelDickens Mar 31, 2025, 8:29 PM
1 point
−2
in reply to: Buck’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?
isn’t your squiggle model talking about whether racing is good, rather than whether unilaterally pausing is good?

Yes the model is more about racing than about pausing but I thought it was applicable here. My thinking was that there is a spectrum of development speed with “completely pause” on one end and “race as fast as possible” on the other. Pushing more toward the “pause” side of the spectrum has the ~opposite effect as pushing toward the “race” side.

I wish you’d try modeling this with more granularity than “is alignment hard” or whatever
1. I’ve never seen anyone else try to quantitatively model it. As far as I know, my model is the most granular quantitative model ever made. Which isn’t to say it’s particularly granular (I spent less than an hour on it) but this feels like an unfair criticism.
2. In general I am not a fan of criticisms of the form “this model is too simple”. All models are too simple. What, specifically, is wrong with it?
I had a quick look at the linked post and it seems to be making some implicit assumptions, such as
1. the plan of “use AI to make AI safe” has a ~100% chance of working (the post explicitly says this is false, but then proceeds as if it’s true)
2. there is a ~100% chance of slow takeoff
3. if you unilaterally pause, this doesn’t increase the probability that anyone else pauses, doesn’t make it easier to get regulations passed, etc.
I would like to see some quantification of the from “we think there is a 30% chance that we can bootstrap AI alignment using AI; a unilateral pause will only increase the probability of a global pause by 3 percentage points; and there’s only a 50% chance that the 2nd-leading company will attempt to align AI in a way we’d find satisfactory, therefore we think the least-risky plan is to stay at the front of the race and then bootstrap AI alignment.” (Or a more detailed version of that.)

MichaelDickens Mar 31, 2025, 3:44 PM
6 points
0
in reply to: Buck’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?
I think it would probably be bad for the US to unilaterally force all US AI developers to pause if they didn’t simultaneously somehow slow down non-US development.

It seems to me that to believe this, you have to believe all of these four things are true:
1. Solving AI alignment is basically easy
2. Non-US frontier AI developers are not interested in safety
3. Non-US frontier AI developers will quickly catch up to the US
4. If US developers slow down, then non-US developers are very unlikely to also slow down—either voluntarily, or because the US strong-arms them into signing a non-proliferation treaty, or whatever
I think #3 is sort-of true and the others are probably false, so the probability of all four being simultaneously true is quite low.

(Statements I’ve seen from Chinese developers lead me to believe that they are less interested in racing and more concerned about safety.)

I made a quick Squiggle model on racing vs. slowing down. Based on my first-guess parameters, it suggests that racing to build AI destroys ~half the expected value of the future compared to not racing. Parameter values are rough, of course.

MichaelDickens Mar 31, 2025, 3:23 PM
1 point
0
in reply to: Davidmanheim’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?

That’s kind-of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!

The global stockpile of nuclear weapons is down 6x since its peak in 1986. Hard to attribute causality but if the anti-nuclear movement played a part in that, then I’d say it was net positive.

(My guess is it’s more attributable to the collapse of the Soviet Union than to anything else, but the anti-nuclear movement probably still played some nonzero role)

MichaelDickens Mar 31, 2025, 3:17 PM
4 points
0
in reply to: Davidmanheim’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?
Yeah I actually agree with that, I don’t think it was sufficient, I just think it was pretty good. I wrote the comment too quickly without thinking about my wording.

MichaelDickens Mar 30, 2025, 8:13 PM
24 points
3
on: Why do many people who care about AI Safety not clearly endorse PauseAI?
I feel kind of silly about supporting PauseAI. Doing ML research, or writing long fancy policy reports feels high status. Public protests feel low status. I would rather not be seen publicly advocating for doing something low-status. I suspect a good number of other people feel the same way.

(I do in fact support PauseAI US, and I have defended it publicly because I think it’s important to do so, but it makes me feel silly whenever I do.)

That’s not the only reason why people don’t endorse PauseAI, but I think it’s an important reason that should be mentioned.

MichaelDickens Mar 30, 2025, 7:55 PM
5 points
−4
in reply to: 1a3orn’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?

Well—I’m gonna speak broadly—if you look at the history of PauseAI, they are marked by belief that the measures proposed by others are insufficient for Actually Stopping AI—for instance the kind of policy measures proposed by people working at AI companies isn’t enough; that the kind of measures proposed by people funded by OpenPhil are often not enough; and so on.

They are correct as far as I can tell. Can you identify a policy measure proposed by an AI company or an OpenPhil-funded org that you think would be sufficient to stop unsafe AI development?

I think there is indeed exactly one such policy measure, which is SB 1047, supported by Center for AI Safety which is OpenPhil-funded (IIRC), which most big AI companies lobbied against, and Anthropic opposed the original stronger version and got it reduced to a weaker and probably less-safe version.

When I wrote where I was donating in 2024 I went through a bunch of orgs’ policy proposals and explained why they appear deeply inadequate. Some specific relevant parts: 1, 2, 3, 4

Edit: Adding some color so you don’t have to click through– when I say the proposals I reviewed were inadequate, I mean they said things like (paraphrasing) “safety should be done on a completely voluntary basis with no government regulations” and “companies should have safety officers but those officers should not have final say on anything”, and would simply not address x-risk at all, or would make harmful proposals like “the US Department of Defense should integrate more AI into its weapon systems” or “we need to stop worrying about x-risk because it’s distracting from the real issues”.

MichaelDickens Mar 30, 2025, 7:47 PM
5 points
−1
in reply to: 1a3orn’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?
If you look at the kind of claims that PauseAI makes in their risks page, you might believe that some of them seem exaggerated, or that PauseAI is simply throwing all the negative things they can find about AI into big list to make it see bad. If you think that credibility is important to the effort to pause AI, then PauseAI might seem very careless about truth in a way that could backfire.

A couple notes on this:
- AFAICT PauseAI US does not do the thing you describe.
- I’ve looked at a good amount of research on protest effectiveness. There are many observational studies showing that nonviolent protests are associated with preferred policy changes / voting patterns, and ~four natural experiments. If protests backfired for fairly minor reasons like “their website makes some hard-to-defend claims” (contrasted with major reasons like “the protesters are setting buildings on fire”), I think that would show up in the literature, and it doesn’t.

MichaelDickens Mar 30, 2025, 7:40 PM
4 points
−2
in reply to: Zach Stein-Perlman’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?

B. “Pausing AI” is indeed more popular than PauseAI, but it’s not clearly possible to make a more popular version of PauseAI that actually does anything; any such organization will have strategy/priorities/asks/comms that alienate many of the people who think “yeah I support pausing AI.”

This strikes me as a very strange claim. You’re essentially saying, even if a general policy is widely supported, it’s practically impossible to implement any specific version of that policy? Why would that be true?

For example I think a better alternative to “nobody fund PauseAI, and nobody make an alternative version they like better” would be “there are 10+ orgs all trying to pause AI and they all have somewhat different goals but they’re all generally pushing in the direction of pausing AI”. I think in the latter scenario you are reasonably likely to get some decent policies put into place even if they’re not my favorite.

MichaelDickens Mar 30, 2025, 2:27 PM
1 point
0
in reply to: Zygi Straznickas’s comment on: Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
I don’t think you could refute it. I believe you could construct a binary polynomial function that gives the correct answer to every example.

For example it is difficult to reconcile the cases of 3, 12, and 19 using a reasonable-looking function, but you could solve all three cases by defining E E as the left-associative binary operation
```
f(x, y) = -1/9 x^2 + 32/9 x - 22/9 + y
```

MichaelDickens Mar 28, 2025, 2:26 AM
3 points
2
in reply to: Mis-Understandings’s comment on: AI #109: Google Fails Marketing Forever
You could technically say Google is a marketing company, but Google’s ability to sell search ads doesn’t depend on being good at marketing in the traditional sense. It’s not like Google is writing ads themselves and selling the ad copy to companies.

MichaelDickens Mar 28, 2025, 2:25 AM
2 points
1
in reply to: Linch’s comment on: Will Jesus Christ return in an election year?
I believe the correct way to do this, at least in theory, is to simply have bets denominated in the risk-free rate—and if anyone wants more risk, they can use leverage to simultaneously invest in equities and prediction markets.

Right now I don’t know if it’s possible to use margin loans to invest in prediction markets.

MichaelDickens Mar 26, 2025, 10:11 PM
1 point
1
in reply to: MichaelDickens’s comment on: AI “Deep Research” Tools Reviewed
I looked through ChatGPT again and I figured out that I did in fact do it wrong. I found Deep Research by going to the “Explore GPTs” button in the top right, which AFAICT searches through custom modules made by 3rd parties. The OpenAI-brand Deep Research is accessed by clicking the “Deep research” button below the chat input text box.

MichaelDickens Mar 26, 2025, 5:51 PM
3 points
0
in reply to: Daniel Kokotajlo’s comment on: On (Not) Feeling the AGI
I don’t really get the point in releasing a report that explicitly assumes x-risk doesn’t happen. Seems to me that x-risk is the only outcome worth thinking about given the current state of the AI safety field (i.e. given how little funding goes to x-risk). Extinction is so catastrophically worse than any other outcome* that more “normal” problems aren’t worth spending time on.

I don’t mean this as a strong criticism of Epoch, more that I just don’t understand their worldview at all.

*except S-risks but Epoch isn’t doing anything related to those AFAIK

MichaelDickens Mar 25, 2025, 10:16 PM
11 points
3
in reply to: Linch’s comment on: Will Jesus Christ return in an election year?

for example, by having bets denominated in S&P 500 or other stock portfolios rather than $s

Bets should be denominated in the risk-free rate. Prediction markets should invest traders’ money into T-bills and pay back the winnings plus interest.

I believe that should be a good enough incentive to make prediction markets a good investment if you can find positive-EV bets that aren’t perfectly correlated with equities (or other risky assets).

(For Polymarket the situation is a bit more complicated because it uses crypto.)

MichaelDickens Mar 25, 2025, 7:35 PM
1 point
−1
on: AI “Deep Research” Tools Reviewed
Thanks, this is helpful! After reading this post I bought ChatGPT Plus and tried a question on Deep Research:

Please find literature reviews / meta-analyses on the best intensity at which to train HIIT (i.e. maximum sustainable speed vs. leaving some in the tank)

I got much worse results than you did:
- ChatGPT misunderstood my question. Its response answered the question “is HIIT better than MICT for improving fitness”.
- Even allowing that we’re talking about HIIT vs. MICT: I was previously aware of 3 meta-analyses on that question. ChatGPT cited none of those 3 and instead cited 7 other studies, 1 of which was hallucinated, and 3 of which were individual studies not meta-analyses.
- It made some claims but did not say which claims came from which sources, and there are some claims that look like they couldn’t have been in any of the sources (but I didn’t go through all of them).
In fact my results are so much worse that I suspected I did something wrong.

Link to my chat: https://chatgpt.com/share/67e30421-8ef4-8011-973f-2b39f0ae58a4

Last week I asked something similar on Perplexity (I don’t have the chat log saved) and it correctly understood what I wanted, and it reported that there were no studies that answered my question. I believe Perplexity is correct because I also could not find any relevant studies on Google Scholar.

MichaelDickens Mar 24, 2025, 7:43 PM
2 points
0
on: Will Jesus Christ return in an election year?

[Time Value of Money] The Yes people are betting that, later this year, their counterparties (the No betters) will want cash (to bet on other markets), and so will sell out of their No positions at a higher price.

How does this strategy compare to shorting bonds? Both have the same payoff structure (they make money if the discount rate goes up) but it’s not clear to me which is a better deal. I suppose it depends on whether you expect Polymarket investors to have especially high demand for cash.

MichaelDickens Mar 22, 2025, 2:50 PM
2 points
0
in reply to: Mo Putera’s comment on: Mo Putera’s Shortform
I’m glad to hear that! I often don’t hear much response to my essays so it’s good to know you’ve read some of them :)

MichaelDickens Mar 21, 2025, 3:45 PM
4 points
1
in reply to: Mo Putera’s comment on: Mo Putera’s Shortform
I don’t have a mistakes page but last year I wrote a one-off post of things I’ve changed my mind on.