John Steidley

Karma: 229

John Steidley 9 Feb 2026 17:45 UTC
1 point
0
in reply to: Eli Tyre’s comment on: Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
“before the scaling hypothesis was really formulated” When do you think that was? Deep Blue obviously used a lot of scaling. So did AlexNet.

John Steidley 7 Jan 2026 16:19 UTC
1 point
0
in reply to: AlexMennen’s comment on: Help keep AI under human control: Palisade Research 2026 fundraiser
Yes, matching runs until the end of March 2026.
All donations made to Palisade in this period will get matched (until funding runs out). However, every.org also has a concept of “a fundraiser”, which is where the “raised offline” number comes from. That number just represents other donations made during the matching period. Funds donated in other ways will still get matched, even if they show up as “raised offline”.

John Steidley 7 Jan 2026 8:06 UTC
3 points
−1
in reply to: Boaz Barak’s comment on: Turning 20 in the probable pre-apocalypse
I would appreciate it if you quantified “extremely likely” here. Your downstream post has gotten a lot of attention, but I first encountered it after reading this comment. To a significant extent, my reaction to your post, and especially to its title, is colored by this prediction.

John Steidley 1 Oct 2025 1:39 UTC
17 points
13
in reply to: 1a3orn’s comment on: Why Corrigibility is Hard and Important (i.e. “Whence the high MIRI confidence in alignment difficulty?”)
(I work at Palisade)

I claim that your summary of the situation between Neel’s work and Palisade’s work is badly oversimplified. For example, Neel’s explanation quoted here doesn’t fully explain why the models sometimes subvert shutdown even after lots of explicit instructions regarding the priority of the instructions. Nor does it explain the finding that moving instructions from the user prompt to the developer prompt actually /increases/ the behavior.

Further, the CoT that Neel quotes has a bit in it about “and these problems are so simple”, but Palisade also tested whether using harder problems (from AIME, iirc) had any effect on the propensity here, and we found almost no impact. So, it’s really not as simple as just reading the CoT and taking the model’s justifications for its actions at face value (as Neel, to his credit, notes!).

Here’s a twitter thread about this involving Jeffrey and Rohin: https://x.com/rohinmshah/status/1968089618387198406

Here’s our full paper that goes into a lot of these variations: https://arxiv.org/abs/2509.14260

John Steidley 6 Jul 2024 7:28 UTC
19 points
17
in reply to: Bird Concept’s comment on: 80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)
Because it’s obviously annoying and burning the commons. Imagine if I made a bot that posted the same comment on every post on LessWrong. Surely that wouldn’t be acceptable behavior.

John Steidley 18 Feb 2024 18:15 UTC
8 points
13
on: Intuition for 1 + 2 + 3 + … = −1/12
The finish was quite a jump for me. I guess I could go and try to stare at your parenthesis and figure it out myself, but mostly I feel somewhat abandoned at that step. I was excited when I found 1 + 2 + 4 + 8 + … = −1 to be making sense, but that excitement doesn’t quite feel sufficient for me to want to decode the relationships between the terms in those two(?) patterns and all the relevant values.

John Steidley 3 Aug 2023 10:54 UTC
1 point
0
on: “Is There Anything That’s Worth More”
Zack, the second line of your quoted lyrics should be “I guess *we already...”

John Steidley 20 Apr 2023 0:32 UTC
1 point
0
in reply to: Eli Tyre’s comment on: 3 Levels of Rationality Verification
I’m currently one of the four members of the core team at CFAR (though the newest addition by far). I also co-ran the Prague Workshop Series in the fall of 2022. I’ve been significantly involved with CFAR since its most recent instructor training program in 2019.

I second what Eli Tyre says here. The closest thing to “rationality verification” that CFAR did in my experience was the 2019 instructor training program, which was careful to point out it wasn’t verifying rationality broadly, just certifying the ability to teach one specific class.

John Steidley 11 Oct 2021 20:45 UTC
1 point
0
in reply to: Randomized, Controlled’s comment on: NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG
I wasn’t replying to Quintin

John Steidley 11 Oct 2021 18:36 UTC
1 point
0
in reply to: TheSupremeAI’s comment on: NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG
I can’t tell what you mean. Can you elaborate?

John Steidley 11 Nov 2020 9:10 UTC
1 point
0
in reply to: Oscar_Cunningham’s comment on: Did anybody calculate the Briers score for per-state election forecasts?
I think this comment would be better placed as a reply to the post that I’m linking. Perhaps you should put it there?

John Steidley 10 Nov 2020 18:37 UTC
7 points
0
on: Did anybody calculate the Briers score for per-state election forecasts?
https://www.lesswrong.com/posts/muEjyyYbSMx23e2ga/scoring-2020-u-s-presidential-election-predictions

John Steidley 4 Nov 2020 22:45 UTC
24 points
0
on: Gifts Which Money Cannot Buy
My summary: Give gifts using the parts of your world-model that are strongest. Usually the answer isn’t going to end up being based on your understanding of their hobby.

John Steidley 2 Oct 2020 1:32 UTC
4 points
0
in reply to: habryka’s comment on: A simple device for indoor air management
Window AC units don’t actually pull air from outside.

https://homeairguides.com/how-does-a-window-air-conditioner-work/

John Steidley 2 Oct 2020 1:28 UTC
4 points
0
on: A simple device for indoor air management
Hey, I’ve been looking into air quality quite a bit recently. I have several questions.

What air quality sensor are you using? How are you getting outdoor data?

I suspect some of the confusion in the results may be due to circulation within the home and monitor placement. Have you thought much about circulation?

Additionally, it looks like indoor PM2.5 is tracking outdoor PM2.5. Have you thought much about other sources of ventilation?

John Steidley 1 Sep 2020 4:20 UTC
LW: 3 AF: 2
0
AF
in reply to: oceaninthemiddleofanisland’s comment on: interpreting GPT: the logit lens
It doesn’t sound hard at all. The things Gwern is describing are the same sort of thing that people do for interpretability where they, e.g., find an image that maximizes the probability of the network predicting a target class.
Of course, you need access to the model, so only OpenAI could do it for GPT-3 right now.
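
A minimal sketch of that kind of input optimization, assuming a toy linear softmax classifier in place of a real network (the weights `W`, the box constraint on the input, and the helper names are all made up for illustration; nothing here touches GPT-3 or any real model). It runs gradient ascent on the input to raise the log-probability of a chosen target class:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(W, b, x):
    # Toy "network": one linear layer followed by softmax (an assumption).
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def maximize_class_probability(W, b, target, steps=300, lr=0.05):
    # Start from an all-zero input and climb the gradient of log p[target].
    x = [0.0] * len(W[0])
    for _ in range(steps):
        p = class_probs(W, b, x)
        for j in range(len(x)):
            # d/dx_j log p[target] = W[target][j] - sum_k p[k] * W[k][j]
            expected = sum(p[k] * W[k][j] for k in range(len(W)))
            grad_j = W[target][j] - expected
            # Keep each input coordinate in [-1, 1], like a bounded pixel value.
            x[j] = min(1.0, max(-1.0, x[j] + lr * grad_j))
    return x

# Hypothetical 3-class, 3-feature model.
W = [[1.0, -0.5, 0.2],
     [-0.3, 0.8, -0.6],
     [0.1, 0.4, 0.9]]
b = [0.0, 0.0, 0.0]

start_prob = class_probs(W, b, [0.0, 0.0, 0.0])[2]
x_opt = maximize_class_probability(W, b, target=2)
final_prob = class_probs(W, b, x_opt)[2]
print(f"target-class probability: {start_prob:.3f} -> {final_prob:.3f}")
```

With a real network the gradient would come from backpropagation rather than a closed-form expression, which is why access to the model weights is the limiting factor.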

John Steidley 25 May 2020 20:20 UTC
5 points
0
in reply to: TurnTrout’s comment on: TurnTrout’s shortform feed
I was thinking along similar lines!
From my notes from 2019-11-24: “Deontology is like the learned policy of bounded rationality of consequentialism”

John Steidley 23 May 2020 23:22 UTC
1 point
0
in reply to: Radio Bob’s comment on: Open & Welcome Thread—December 2019
Welcome!