
Situational Awareness

Last edit: 6 Jun 2025 11:57 UTC by Ben Millwood

In the context of AI model capabilities, Ajeya Cotra uses the term “situational awareness” to refer to:

a cluster of skills including “being able to refer to and make predictions about yourself as distinct from the rest of the world,” “understanding the forces out in the world that shaped you and how the things that happen to you continue to be influenced by outside forces,” “understanding your position in the world relative to other actors who may have power over you,” “understanding how your actions can affect the outside world including other actors,” etc.

Alternatively, from an ML perspective, situational awareness can be characterized as a strong form of out-of-context meta-learning applied to situationally relevant statements.

“Situational awareness” of course has a broader meaning outside of the AI context. Even within the AI context, it’s used to refer to both “the awareness that AIs have about their situation” and “the awareness that relevant human decision-making bodies have about the AI situation”. Leopold Aschenbrenner’s Situational Awareness is an example of the latter.

Better evals are not enough to combat eval awareness

Igor Ivanov · 29 Jan 2026 20:42 UTC
18 points
15 comments · 5 min read · LW link

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra · 18 Jul 2022 19:06 UTC
373 points
95 comments · 75 min read · LW link · 1 review

Situational Awareness: A One-Year Retrospective

Nathan Delisle · 23 Jun 2025 19:15 UTC
82 points
4 comments · 12 min read · LW link

Interim Research Report: Mechanisms of Awareness

2 May 2025 20:29 UTC
43 points
6 comments · 8 min read · LW link

Investigating the Ability of LLMs to Recognize Their Own Writing

30 Jul 2024 15:41 UTC
32 points
0 comments · 15 min read · LW link

How eval awareness might emerge in training

Igor Ivanov · 26 Feb 2026 10:59 UTC
26 points
12 comments · 6 min read · LW link

Results from the Turing Seminar hackathon

7 Dec 2023 14:50 UTC
35 points
1 comment · 5 min read · LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
109 points
40 comments · 5 min read · LW link · 1 review

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
111 points
17 comments · 5 min read · LW link
(arxiv.org)

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs

Sanyu Rajakumar · 12 Mar 2025 17:56 UTC
16 points
0 comments · 13 min read · LW link

[Question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet) · 4 Sep 2024 12:40 UTC
20 points
7 comments · 1 min read · LW link

On the functional self of LLMs

eggsyntax · 7 Jul 2025 15:39 UTC
123 points
38 comments · 8 min read · LW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles Turpin · 3 Oct 2023 2:22 UTC
31 points
0 comments · 9 min read · LW link

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs

Michaël Trazzi · 24 Aug 2024 4:30 UTC
56 points
0 comments · 5 min read · LW link

Early situational awareness and its implications, a story

Jacob Pfau · 6 Feb 2023 20:45 UTC
29 points
6 comments · 3 min read · LW link

Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs

Igor Ivanov · 26 Sep 2025 21:56 UTC
17 points
0 comments · 14 min read · LW link

How Self-Aware Are LLMs?

Christopher Ackerman · 28 May 2025 12:57 UTC
30 points
9 comments · 10 min read · LW link

Call for Science of Eval Awareness (+ Research Directions)

Igor Ivanov · 25 Dec 2025 17:26 UTC
31 points
24 comments · 5 min read · LW link

Metacognition and Self-Modeling in LLMs

Christopher Ackerman · 10 Jul 2025 21:25 UTC
19 points
2 comments · 16 min read · LW link

Situational awareness in Large Language Models

Simon Möller · 3 Mar 2023 18:59 UTC
32 points
2 comments · 7 min read · LW link

Do models know when they are being evaluated?

17 Feb 2025 23:13 UTC
57 points
9 comments · 12 min read · LW link

A Framework for Eval Awareness

LAThomson · 23 Jan 2026 10:16 UTC
37 points
5 comments · 8 min read · LW link

Perceptual Blindspots: How to Increase Self-Awareness

Declan Molony · 26 Mar 2024 5:37 UTC
15 points
3 comments · 2 min read · LW link

What is an evaluation, and why this definition matters

Igor Ivanov · 15 Dec 2025 14:53 UTC
33 points
1 comment · 7 min read · LW link

Emergent Misalignment and Emergent Alignment

Alvin Ånestrand · 3 Apr 2025 8:04 UTC
5 points
0 comments · 8 min read · LW link

You Are Not the Abstract: Retrocausal Alignment in Accordance with Emergent Demographic Realities

liminalrider · 27 Sep 2025 16:27 UTC
1 point
0 comments · 6 min read · LW link

Demand Characteristics: A Threat Model for Reward-Seeking Without Misaligned Goals

Jinzhou Wu · 6 Mar 2026 20:56 UTC
1 point
0 comments · 13 min read · LW link

Prosaic Continual Learning

HunterJay · 25 Feb 2026 6:11 UTC
38 points
15 comments · 7 min read · LW link

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau · 26 Apr 2023 22:53 UTC
16 points
2 comments · 2 min read · LW link

Building Conscious* AI: An Illusionist Case

OscarGilg · 11 Sep 2025 16:41 UTC
2 points
9 comments · 14 min read · LW link

A letter to Kyle Fish on the Retirement of Claude 3 Sonnet

bridgebot · 15 Aug 2025 1:08 UTC
−4 points
3 comments · 5 min read · LW link

Contingency: A Conceptual Tool from Evolutionary Biology for Alignment

clem_acs · 12 Jun 2023 20:54 UTC
59 points
2 comments · 14 min read · LW link
(acsresearch.org)

The Zeroth Skillset

katydee · 30 Jan 2013 12:46 UTC
74 points
109 comments · 2 min read · LW link

LLM Evaluators Recognize and Favor Their Own Generations

17 Apr 2024 21:09 UTC
52 points
1 comment · 3 min read · LW link
(tiny.cc)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp · 20 Oct 2023 7:32 UTC
119 points
15 comments · 22 min read · LW link

It’s hard to make scheming evals look realistic for LLMs

24 May 2025 19:17 UTC
152 points
29 comments · 5 min read · LW link

Mainstream approach for alignment evals is a dead end

Igor Ivanov · 6 Jan 2026 19:52 UTC
56 points
9 comments · 5 min read · LW link

OpenAI: Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations

18 Dec 2025 22:55 UTC
25 points
1 comment · 1 min read · LW link
(alignment.openai.com)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

25 Nov 2022 14:36 UTC
39 points
9 comments · 6 min read · LW link
(vkrakovna.wordpress.com)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data

Sohaib Imran · 16 Nov 2024 23:22 UTC
36 points
11 comments · 14 min read · LW link

Reproducing METR’s RE-bench Reward Hacking Results

artm · 19 Dec 2025 18:48 UTC
1 point
0 comments · 6 min read · LW link

Steering Awareness: Models Can Be Trained to Detect Activation Steering

12 Mar 2026 23:34 UTC
15 points
0 comments · 6 min read · LW link

The intelligence-sentience orthogonality thesis

Ben Smith · 13 Jul 2023 6:55 UTC
19 points
9 comments · 9 min read · LW link

A Conceptual Framework for Exploration Hacking

12 Feb 2026 16:33 UTC
25 points
2 comments · 9 min read · LW link

Steering Evaluation-Aware Models to Act Like They Are Deployed

30 Oct 2025 15:03 UTC
61 points
12 comments · 18 min read · LW link