Ian McKenzie

Karma: 429

Ian McKenzie 8 May 2026 1:40 UTC
3 points
0
in reply to: Gordon Seidoh Worley’s comment on: How Go Players Disempower Themselves to AI

It’s pretty normal, historically, to ask a more senior engineer to figure out how to solve a problem, and then a more junior engineer implements it.

But here, the AI also plays the role of the junior engineer – you don’t actually have to engage with the details of the chosen solution, and by doing so absorb the understanding of why it’s the best approach. You can just say “yeah do that”.

Ian McKenzie 13 Apr 2026 5:05 UTC
10 points
0
on: Ian McKenzie’s Shortform
I find it hard to imagine just how different my entire experience of the world and my place in it would be if I had a different worldview. I’ve never really believed in God, definitely not seriously, but surely everything would feel different if I thought it was all for the best, or part of a great plan, or only temporary, on my way to something more real and important. As opposed to believing that it’s all arbitrary and there is no reason behind existence, no design.

Or if I thought that the best days of humanity were long in the past, and that we’d never again reach the heights achieved by our ancestors. Or if I were in the Middle Ages and thought the Greeks were the best we’d ever do. As opposed to living in the middle of an exponential increase in GDP, where progress is commonplace.

Or if I thought that I’d live a normal life, where planning for retirement made sense and I could even answer the question of where I want to see myself in five years. As opposed to expecting a singularity to either destroy or transform the world before I’m 40.

How much do these background beliefs control what we can think, or what we can feel?

Defending Habit Streaks

Ian McKenzie 6 Apr 2026 4:34 UTC

8 points

0 comments, 3 min read

Ian McKenzie 23 Mar 2026 0:22 UTC
2 points
0
on: Ian McKenzie’s Shortform
It seems like the switch where high-status brands went from making high-quality products to featuring their name and logo prominently was good for both producer and consumer: the producer gets free advertising, and the consumer more clearly signals their wealth. I don’t know the history of this, but I’m interested in how it happened – it seems like there’s a tough coordination problem, where if one brand/consumer switches then they just look tacky.

(Not all brands have switched, and maybe there’s a new money/old money difference.)

Ian McKenzie 4 Mar 2026 17:50 UTC
2 points
0
in reply to: habryka’s comment on: Responsible Scaling Policy v3
Drake is right, sorry for the confusion. We were not intentionally misleading – we missed a footnote on the announcement when putting together the initial tweet thread that narrowed the claim to just bio rather than CBRN, as discussed in the rest of the announcement. We did later find a vulnerability that allowed us to bypass the filters in the bio setting, reported it, and it was patched. I think that follow-up work took more on the order of 40 person-hours, but was a general method that could extract information in a range of settings. I don’t know how likely it is that there are further such vulnerabilities.

Even if single-query jailbreaking was O(10) hours though, having to send many queries to discover that jailbreak makes it much easier to catch through monitoring.

Ian McKenzie 22 Feb 2026 0:45 UTC
12 points
4
on: Ian McKenzie’s Shortform
Some infectious disease graphs I would like to see but haven’t been able to find:
- Effectiveness of covid and flu vaccines vs time since injection.
- Contagiousness with cold/flu/covid vs time since infection.
- Decrease in infection probability vs distance from infected person.

Ian McKenzie 16 Feb 2026 2:53 UTC
1 point
0
in reply to: Raemon’s comment on: Ian McKenzie’s Shortform

“probably most patterns that evolved naturally don’t successfully navigate superintelligence well and I’m not sure it’s the right standard for them.”

I’m not sure what you mean by standard, but navigating superintelligence well is something I care a lot about. So it seems like a reasonable thing to criticize a system for, and it would be great if we found a pattern that did navigate it well (even if finding or switching to another one is very hard).

Ian McKenzie 3 Feb 2026 3:57 UTC
6 points
0
in reply to: AlphaAndOmega’s comment on: Ian McKenzie’s Shortform
Under this view you can totally have intermediary metrics, they just look more like “how much does your society avoid tragedies of the commons” rather than “what is the median quality of life”.

To be clear, this post was not intended as a subtle endorsement of communism. I agree with MondSemmel’s point that basically any system which produced slower economic growth would probably do better under this view, if only because AI development is slower.

Ian McKenzie 2 Feb 2026 0:02 UTC
57 points
11
on: Ian McKenzie’s Shortform
Maybe the most important test for a political or economic system is whether it self-destructs. This is in contrast to whether it produces good intermediate outcomes. In particular, if free-market capitalism leads to an uncontrolled intelligence explosion, then it doesn’t matter if it produced better living standards than alternative systems for ~200 years – it still failed at the most important test.

A couple of other ways to put it:
- Would the US economic/political system pass the Great Filter?
- Would Norway do an intelligence explosion?

Under this view, political/economic systems that produce less growth but don’t create the incentives for unbounded competition are preferred. Sadly, for Molochian reasons this seems hard to pull off.

Ian McKenzie 25 Jan 2026 19:30 UTC
7 points
6
on: To be well-calibrated is to be punctual
I like this idea. As I tried to be more organized and less late to things, I implicitly did something like this, and this is a nice framing of that process.

I do think the asymmetry of the consequences distorts the updates I make a little though – since I am trying hard not to be late, I sometimes leave an unreasonable amount of buffer. I was once 45 minutes early to an appointment because I was taking public transport to an unfamiliar part of the city. I find it harder to make an update based on being early, because I don’t know the variance – if I’m late (and I was trying hard not to be), then I clearly underestimated the worst case, but if I’m early then I could have just got lucky.

Ian McKenzie 18 Jan 2026 18:42 UTC
1 point
0
on: If AI alignment is only as hard as building the steam engine, then we likely still die
I think there are multiple ways of interpreting “alignment is as difficult as X”. There’s “the safety issues in building AGI are similar to the safety issues in building X”, but there’s also “solving the safety issues in building AGI takes the same level of total effort as building X”.

I interpreted Chris Olah’s graph as the latter – that the ‘steam engine world’ is a world where solving AI safety takes as much total effort as building the steam engine, agnostic of how that effort is spent. NOT that in those worlds, you solve AI safety issues in the same way that you solve steam engine safety issues.

Put another way, I was imagining the graph as primarily quantitative – you could crudely replace the x-axis with “# person-hours”.

Ian McKenzie 11 Jan 2026 21:00 UTC
24 points
11
on: Ian McKenzie’s Shortform
John Wentworth says:

You will never find a $100 bill on the floor of Grand Central Station at rush hour, because someone would have picked it up already.

Are you really less likely to find $100 in Grand Central Station than anywhere else? It’s true that there are many more people who could find it before you, but there are also many more people who could drop $100. If you imagine a 1D version, where everyone walks through either Grand Central Station or a quiet alley along the same line, one after the other, then it seems like you should be equally likely to find $100 in either case: you find it exactly when the person directly in front of you in the line drops it.
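
A quick simulation of this 1D picture, as a rough sketch. The assumptions here are mine rather than anything from the original quote: each walker independently drops a bill with probability `drop_prob`, every walker picks up any bill they pass, and `find_rate`, `num_walkers`, and `trials` are hypothetical names chosen for illustration.

```python
import random


def find_rate(num_walkers, drop_prob, trials, seed=0):
    """Fraction of trials in which the last walker in the line finds a bill.

    Assumed toy model: walkers pass the same spot one after another,
    each picks up any bill lying there, then may drop one of their own.
    """
    rng = random.Random(seed)
    found = 0
    for _ in range(trials):
        bill_on_floor = False
        # Simulate everyone who walks ahead of you.
        for _ in range(num_walkers - 1):
            # This walker picks up whatever the previous walker dropped...
            bill_on_floor = False
            # ...and with probability drop_prob drops a bill behind them.
            if rng.random() < drop_prob:
                bill_on_floor = True
        # You are the last walker: you find a bill only if one is still there.
        found += bill_on_floor
    return found / trials


# "Grand Central" has many walkers ahead of you, the "alley" has few.
busy = find_rate(num_walkers=200, drop_prob=0.05, trials=20000)
quiet = find_rate(num_walkers=5, drop_prob=0.05, trials=20000)
print(f"busy: {busy:.3f}, quiet: {quiet:.3f}")
```

Under these assumptions both rates should come out close to `drop_prob`, however many walkers are ahead of you. The model ignores anything that breaks the single-file picture, such as bills being easier to miss in a crowd.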

Ian McKenzie’s Shortform

Ian McKenzie 11 Jan 2026 21:00 UTC

4 points

25 comments, 1 min read

Ian McKenzie 2 Jan 2026 22:01 UTC
4 points
2
in reply to: Ben Pace, the Vacationing Vagabond’s comment on: The Company Man

There are many ideas in here that I’ve heard said offhand but never really dived into, and there’s something very informative and satisfying about seeing them painted in detail. Similar to the difference between a one-sentence description of a painting, and the actual painting.

It’s satire though: it conveys some vibes, but the detail that’s being painted is not an accurate portrayal of the actual detail that exists. Put another way, I think it would be a mistake if you encountered someone in real life and thought “ah yes, that’s ‘the person who feels they have had a meditation/emotional insight and this has gotten them over the intellectual hump of not building killing machines’ from The Company Man, I know how they think”.

Ian McKenzie 16 Dec 2025 21:17 UTC
5 points
0
on: How Colds Spread
I wonder what the driving factor of transmission is before symptoms emerge. If everyone was very careful about following good practices once they were obviously sick (e.g. wearing masks, sanitizing their hands after blowing their nose or any contact with their face), what would we have to do to prevent the spread? Do fomites become more important if you aren’t coughing or sneezing yet?

Ian McKenzie 11 Jul 2025 2:04 UTC
2 points
0
on: The bitter lesson of misuse detection
Our paper on defense in depth (STACK) found similar results – similarly-sized models with a few-shot prompt significantly outperformed the specialized guard models, even when adjusting for FPR on benign queries.

Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations

smallsilo, Ian McKenzie, Oskar Hollinsworth, Tom Tseng, Xander Davies, scasper, Aaron Tucker, Robert Kirk and Adam Gleave

4 Jul 2025 0:07 UTC

13 points

1 comment, 4 min read

(far.ai)

Does robustness improve with scale?

ChengCheng, niki.h, Ian McKenzie, Oskar Hollinsworth, Tom Tseng and AdamGleave

25 Jul 2024 20:55 UTC

14 points

0 comments, 1 min read

(far.ai)

Ian McKenzie 8 Aug 2023 2:07 UTC
5 points
0
in reply to: Honglu Fan’s comment on: Password-locked models: a stress case for capabilities evaluation
One thing is that even given access to the model weights and the code behind the API, you could not tell if the model was password-locked, whereas you would see the hardcoded verifier. Thus if a lab wanted to hide capabilities they could delete the training data and you would have no way of knowing.

Ian McKenzie 21 Feb 2023 0:12 UTC
3 points
0
in reply to: Gunnar_Zarncke’s comment on: Russell Conjugations list & voting thread
The Wikipedia article has a typo in one of these: it should say “I am sparkling; you are unusually talkative; he is drunk.” (as in the source)
