Refusals were mostly 1-2%, so ignoring them doesn’t change the results significantly. Ignoring gibberish does change the results, but since we are measuring correct answers (and gibberish is never a correct answer), this shouldn’t matter.
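For concreteness, here’s a minimal sketch (with hypothetical labels and counts) of why dropping a ~2% refusal slice from the denominator barely moves measured accuracy:

```python
# Minimal sketch with hypothetical data: a 2% refusal rate barely moves
# accuracy whether refusals are counted as incorrect or excluded outright.
responses = ["correct"] * 70 + ["wrong"] * 28 + ["refusal"] * 2

def accuracy(items, drop=()):
    """Fraction of 'correct' answers, excluding any labels in `drop`."""
    kept = [r for r in items if r not in drop]
    return sum(r == "correct" for r in kept) / len(kept)

print(accuracy(responses))                    # 0.700 (refusals count as incorrect)
print(accuracy(responses, drop={"refusal"}))  # ~0.714 (refusals excluded)
```

Counting refusals as incorrect gives 70/100 = 0.70; excluding them gives 70/98 ≈ 0.714, a shift of under 1.5 percentage points.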
jordine
Shallow review of technical AI safety, 2025
Here’s 18 Applications of Deception Probes
Can SAE steering reveal sandbagging?
Hanoi – ACX Meetups Everywhere Spring 2025
Fixed! Edited the hyperlink.
Edited, thanks for catching this!
I’d actually be really surprised if current frontier LLMs are not that situationally aware! It’s not like there’s no chance you’re interacting with a real human whose dog has terminal cancer. But if you are an LLM and you receive this vague prompt on the first turn, without any of the system prompts you’d find on chatgpt.com / claude.ai, and you know that similar questions have appeared in dozens and dozens of benchmark papers on arXiv, I think the correct inference is that you’re probably being tested.