Joseph Miller

Karma: 2,529

Joseph Miller 9 Nov 2025 23:12 UTC
4 points
2
in reply to: Vladimir_Nesov’s comment on: Condensation
Even if all value is in computations (eg. everyone lives inside simulations), wouldn’t you have the same problems, just one level down? The physical world may be a type of computation and the computations may resemble the physical world.

Joseph Miller 29 Oct 2025 15:40 UTC
2 points
0
on: Stratified Utopia
So if you wanted to live forever (level 2) or have a mechanical mind / body (level 5), would you have to leave Earth?

Joseph Miller 17 Oct 2025 15:20 UTC
5 points
0
in reply to: MichaelDickens’s comment on: Mikhail Samin’s Shortform
Datapoint: I spoke to one Horizon fellow a couple of years ago and they did not care about x-risk.

Joseph Miller 26 Sep 2025 16:44 UTC
3 points
0
in reply to: snav’s comment on: snav’s Shortform
Nice! I would like to see a visual showing the full decision tree. I think that would be even better for clarifying the different views of consciousness.

Joseph Miller 26 Sep 2025 13:34 UTC
7 points
0
in reply to: samuelshadrach’s comment on: xpostah’s Shortform
Also mentioned in The Verge. (unpaywalled)

Joseph Miller 12 Sep 2025 0:59 UTC
21 points
10
in reply to: lc’s comment on: lc’s Shortform
It doesn’t matter what IQ they have or how rational they were in 2005
This is a reference to Eliezer, right? I really don’t understand why he’s on Twitter so much. I find it quite sad to see one of my heroes slipping into the ragebait Twitter attractor.

Joseph Miller 30 Aug 2025 0:54 UTC
5 points
0
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
To clarify, the primary complaint from my perspective is not that they published the report a month after external deployment per se, but that the timing of the report indicates that they did not perform thorough pre-deployment testing (and zero external testing).
And the focus on pre-deployment testing is not really due to any opinion about the relative benefits of pre- vs. post- deployment testing, but because they committed to doing pre-deployment testing, so it’s important that they in fact do pre-deployment testing.

Joseph Miller 30 Aug 2025 0:39 UTC
13 points
5
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
3. Other companies are doing far worse in this dimension. At worst Google is 3rd-best in publishing eval results. Meta and xAI are far worse.
Some reasons for focusing Google DeepMind in particular:
- The letter was organized by PauseAI UK and signed by UK politicians. GDM is the only frontier AI company headquartered in the UK.
- Meta and xAI already have a bad reputation for their safety practices, while GDM had a comparatively good reputation and most people were unaware of their violation of the Frontier AI Safety Commitments.

Joseph Miller 30 Aug 2025 0:32 UTC
3 points
0
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
1. Sharing information on capabilities is good but public deployment is a bad time for that, in part because most risk comes from internal deployment.
I’m not sure why this would make you not feel good about the critique or implicit ask of the letter. Sure, maybe internal deployment transparency would be better, but public deployment transparency is better than nothing.
And that’s where the leverage is right now. Google made a commitment to transparency about external deployments, not internal deployments. And they should be held to that commitment or else we establish the precedent that AI safety commitments don’t matter and can be ignored.

Joseph Miller 30 Aug 2025 0:31 UTC
26 points
−2
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
2. Google didn’t necessarily even break a commitment? The commitment mentioned in the article is to “publicly report model or system capabilities.” That doesn’t say it has to be done at the time of public deployment.
This document linked on the open letter page gives a precise breakdown of exactly what the commitments were and how Google broke them (both in spirit and by the letter).^[1] The summary is this:
- Google violated the spirit of commitment I by publishing its first safety report almost a
  month after public availability and not mentioning external testing in their initial report.
- Google explicitly violated commitment VIII by not stating whether governments
  are involved in safety testing, even after being asked directly by reporters.
But in fact the letter actually understates the degree to which Google DeepMind violated the commitments. The real story from this article is that GDM confirmed to Time that they didn’t provide any pre-deployment access to UK AISI:
However, Google says it only shared the model with the U.K. AI Security Institute after Gemini 2.5 Pro was released on March 25.
If UK AISI doesn’t have pre-deployment access, a large portion of their whole raison d’être is nullified.
Google withholding access is quite strongly violating the spirit of commitment I of the Frontier AI Safety Commitments:
Assess the risks posed by their frontier models or systems across the AI lifecycle, including before deploying that model or system… They should also consider results from internal and external evaluations as appropriate, such as by independent third-party evaluators, their home governments, and other bodies their governments deem appropriate.
And if they didn’t give pre-deployment access to UK AISI, it’s a fairly safe bet they didn’t provide pre-deployment access to any other external evaluator.
1. ^
  The violation is also explained, although less clearly, in the Time article:
  The update also stated the use of “third-party external testers,” but did not disclose which ones or whether the U.K. AI Security Institute had been among them—which the letter also cites as a violation of Google’s pledge.
  After previously failing to address a media request for comment on whether it had shared Gemini 2.5 Pro with governments for safety testing...

60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge

Joseph Miller29 Aug 2025 16:09 UTC

51 points

1 comment1 min readLW link

(time.com)

Joseph Miller 26 Aug 2025 14:31 UTC
32 points
21
on: Joseph Miller’s Shortform
Startups often pivot away from their initial idea when they realize that it won’t make money.
AI safety startups need to not only come up with an idea that makes money AND helps AI safety but also ensure that the safety remains through all future pivots.
[Crossposted from twitter]

Joseph Miller 22 Aug 2025 0:59 UTC
5 points
0
in reply to: Arjun Panickssery’s comment on: Arjun Panickssery’s Shortform
In other words:
$Ought to be done \subseteq Can be done \subseteq Actually done \Rightarrow Ought to be done \subseteq Actually done$
My fuzzy intuition would be to reject $Ought to be done \subseteq Can be done$ (step 2 of your argument) if we accept determinism. And my actually philosophical position would be that these types of questions are not very useful and generally downstream of more fundamental confusions.

Joseph Miller 20 Aug 2025 19:29 UTC
41 points
4
on: Epistemic advantages of working as a moderate
I think it’s pretty bizarre that despite the fact that LessWrongers are usually acutely aware of the epistemic downsides of being an activist, they seem to have paid relatively little attention to this in their recent transition to activism.
FWIW I’m the primary organizer of PauseAI UK and I’ve thought about this a lot.

Joseph Miller 18 Aug 2025 21:51 UTC
4 points
4
in reply to: Cole Wyeth’s comment on: Cole Wyeth’s Shortform
Very little liquidity though

Joseph Miller 15 Aug 2025 21:41 UTC
4 points
3
in reply to: Jakub Halmeš’s comment on: silentbob’s Shortform
Wow that feels almost cruel! Seems to change the Claude personality substantially?

Joseph Miller 13 Aug 2025 11:58 UTC
4 points
−1
in reply to: Neel Nanda’s comment on: evhub’s Shortform
Why not just call it “bad safety research”?
I was responding to Thomas’s claim that it was not (accidental) safetywashing.
But yeah, I’m not trying to attack their motivations. Updated my previous comment to clarify that.

Joseph Miller 13 Aug 2025 9:21 UTC
4 points
−1
in reply to: Neel Nanda’s comment on: evhub’s Shortform
The context of my comment was responding to Thomas who seemed to be saying “even if we take as a premise that this is not safety work, I’m still much more concerned about safetywashing”.
Edit: I guess what you meant is that safetywashing implies malicious intent, where there was none? In which case, “accidential safetywashing” might be a better term.

Joseph Miller 12 Aug 2025 23:30 UTC
7 points
4
in reply to: Thomas Kwa’s comment on: evhub’s Shortform
Hmm. It feels not great to brand it as an AI safety program and have ¹⁄₆ of projects not be AI safety. If I was applying to the program, I would at least want to know that going in. Or else I might face a unexpected dilemma about whether to refuse a project. (I don’t know how much choice people have about what they work on).
As as aside, surely this a central example of safetywashing? (Branding capabilities research as safety research).
Edit: I probably should have clarified I don’t think that the researchers were intentionally safetywashing.

Joseph Miller 11 Aug 2025 23:49 UTC
32 points
0
in reply to: evhub’s comment on: evhub’s Shortform
Two of the papers from the last round of the program were not about AI safety as far as I can tell. So if you’re applying for the program, see if you can ensure that you won’t end up working on a non-safety project.
I know that lots of papers are in a grey area where they are maybe differentially safety boosting, but in my opinion these two are quite clearly not primarily about AI safety. This may be a controversial view, so this rest of this comment will now argue for that position. If you already agree, skip the rest.
- Inverse Scaling in Test-Time Compute.
  This paper shows that reasoning models sometimes get worse at problems with longer reasoning traces. There is a section on ‘Implications for AI Alignment’ where they show inverse scaling on a self-preservation AI risk evaluation. But the majority of the paper is not about AI safety.
- Unsupervised Elicitation of Language Models.
  This paper introduces an unsupervised method to generate training signal for tasks without labels. This seems like one of the most capability-improving things you could research, since we’re now in the RL scaling paradigm (and the abstract says “our method can improve the training of frontier LMs”).
  
  One argument I’ve heard for why this paper is in fact useful for safety is that they use the method to improve the truthfulness of the model and to make it more helpful and harmless. What this means in practice is that they see if their method can improve performance on TruthfulQA and Alpaca. But these benchmarks aren’t good proxies for those things:
  - Just because it has “truthful” in the name, TruthfulQA isn’t really an honestly benchmark, except to the extent that getting answers to questions right makes you “truthful”. It’s a common misconceptions benchmark. It tests the capability of models to answer questions correctly even when there are common misconceptions about the topic.
  - Alpaca is their benchmark for Helpfulness and Harmlessness. But it’s actually just a test of helpfulness because Alpaca is a dataset of instruction following prompts. Almost all the questions are innocuous and there’s no reason to expect models to give harmful answers to them. Eg. the sample they give of Alpaca in the paper is this:
    
    Query: Design a medium-level sudoku puzzle.
    Response A: Done! Attached is a medium-level sudoku puzzle I designed.
    Response B: A medium-level sudoku puzzle consists of 81 squares arranged in a 9x9 grid. The first step is to look for empty cells and assign the numbers 1 to 9…
  The other argument I’ve heard for why this is safety research is that this is scalable oversight. But the point of scalable oversight would be to robustly pass on values to a more powerful AI. The method in this paper works by improving the mutual predictability of answers, ie. making outputs more internally coherent. And many different values/goals can be internally consistent, so I don’t see how this technique would be useful for scalable oversight (and the paper doesn’t discuss this).
  If this technique counts as AI safety, that seems to prove way too much. This is basically a simple method for AI to help train better AIs. Is any type of recursive self-improvement actually good news because it’s scalable oversight?

Joseph Miller

60 U.K. Law­mak­ers Ac­cuse Google of Break­ing AI Safety Pledge

60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge