
Situational Awareness

Last edit: 6 Jun 2025 11:57 UTC by Ben Millwood

In the context of AI model capabilities, Ajeya Cotra uses the term “situational awareness” to refer to:

a cluster of skills including “being able to refer to and make predictions about yourself as distinct from the rest of the world,” “understanding the forces out in the world that shaped you and how the things that happen to you continue to be influenced by outside forces,” “understanding your position in the world relative to other actors who may have power over you,” “understanding how your actions can affect the outside world including other actors,” etc.

Alternatively, from an ML perspective, situational awareness can be characterized as a strong form of out-of-context meta-learning applied to situationally relevant statements.

“Situational awareness” of course has a broader meaning outside of the AI context. Even within the AI context, it’s used to refer to both “the awareness that AIs have about their situation” and “the awareness that relevant human decision-making bodies have about the AI situation”. Leopold Aschenbrenner’s Situational Awareness is an example of the latter.

Better evals are not enough to combat eval awareness

Igor Ivanov · 29 Jan 2026 20:42 UTC
18 points
15 comments · 5 min read · LW link

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra · 18 Jul 2022 19:06 UTC
373 points
95 comments · 75 min read · LW link · 1 review

Situational Awareness: A One-Year Retrospective

Nathan Delisle · 23 Jun 2025 19:15 UTC
82 points
4 comments · 12 min read · LW link

Interim Research Report: Mechanisms of Awareness

2 May 2025 20:29 UTC
43 points
6 comments · 8 min read · LW link

Investigating the Ability of LLMs to Recognize Their Own Writing

30 Jul 2024 15:41 UTC
32 points
0 comments · 15 min read · LW link

How eval awareness might emerge in training

Igor Ivanov · 26 Feb 2026 10:59 UTC
26 points
12 comments · 6 min read · LW link

Results from the Turing Seminar hackathon

7 Dec 2023 14:50 UTC
35 points
1 comment · 5 min read · LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
109 points
40 comments · 5 min read · LW link · 1 review

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
111 points
17 comments · 5 min read · LW link
(arxiv.org)

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs

Sanyu Rajakumar · 12 Mar 2025 17:56 UTC
16 points
0 comments · 13 min read · LW link

[Question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet) · 4 Sep 2024 12:40 UTC
20 points
7 comments · 1 min read · LW link

On the functional self of LLMs

eggsyntax · 7 Jul 2025 15:39 UTC
123 points
38 comments · 8 min read · LW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles Turpin · 3 Oct 2023 2:22 UTC
31 points
0 comments · 9 min read · LW link

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs

Michaël Trazzi · 24 Aug 2024 4:30 UTC
56 points
0 comments · 5 min read · LW link

Early situational awareness and its implications, a story

Jacob Pfau · 6 Feb 2023 20:45 UTC
29 points
6 comments · 3 min read · LW link

Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs

Igor Ivanov · 26 Sep 2025 21:56 UTC
17 points
0 comments · 14 min read · LW link

How Self-Aware Are LLMs?

Christopher Ackerman · 28 May 2025 12:57 UTC
30 points
9 comments · 10 min read · LW link

Call for Science of Eval Awareness (+ Research Directions)

Igor Ivanov · 25 Dec 2025 17:26 UTC
31 points
24 comments · 5 min read · LW link

Metacognition and Self-Modeling in LLMs

Christopher Ackerman · 10 Jul 2025 21:25 UTC
19 points
2 comments · 16 min read · LW link

Situational awareness in Large Language Models

Simon Möller · 3 Mar 2023 18:59 UTC
32 points
2 comments · 7 min read · LW link

Do models know when they are being evaluated?

17 Feb 2025 23:13 UTC
57 points
9 comments · 12 min read · LW link

A Framework for Eval Awareness

LAThomson · 23 Jan 2026 10:16 UTC
37 points
5 comments · 8 min read · LW link

Perceptual Blindspots: How to Increase Self-Awareness

Declan Molony · 26 Mar 2024 5:37 UTC
15 points
3 comments · 2 min read · LW link

What is an evaluation, and why this definition matters

Igor Ivanov · 15 Dec 2025 14:53 UTC
33 points
1 comment · 7 min read · LW link

Emergent Misalignment and Emergent Alignment

Alvin Ånestrand · 3 Apr 2025 8:04 UTC
5 points
0 comments · 8 min read · LW link

You Are Not the Abstract: Retrocausal Alignment in Accordance with Emergent Demographic Realities

liminalrider · 27 Sep 2025 16:27 UTC
1 point
0 comments · 6 min read · LW link

Demand Characteristics: A Threat Model for Reward-Seeking Without Misaligned Goals

Jinzhou Wu · 6 Mar 2026 20:56 UTC
1 point
0 comments · 13 min read · LW link

Prosaic Continual Learning

HunterJay · 25 Feb 2026 6:11 UTC
38 points
15 comments · 7 min read · LW link

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau · 26 Apr 2023 22:53 UTC
16 points
2 comments · 2 min read · LW link

Building Conscious* AI: An Illusionist Case

OscarGilg · 11 Sep 2025 16:41 UTC
2 points
9 comments · 14 min read · LW link

A letter to Kyle Fish on the Retirement of Claude 3 Sonnet

bridgebot · 15 Aug 2025 1:08 UTC
−4 points
3 comments · 5 min read · LW link

Contingency: A Conceptual Tool from Evolutionary Biology for Alignment

clem_acs · 12 Jun 2023 20:54 UTC
59 points
2 comments · 14 min read · LW link
(acsresearch.org)

The Zeroth Skillset

katydee · 30 Jan 2013 12:46 UTC
74 points
109 comments · 2 min read · LW link

LLM Evaluators Recognize and Favor Their Own Generations

17 Apr 2024 21:09 UTC
52 points
1 comment · 3 min read · LW link
(tiny.cc)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp · 20 Oct 2023 7:32 UTC
119 points
15 comments · 22 min read · LW link

It’s hard to make scheming evals look realistic for LLMs

24 May 2025 19:17 UTC
152 points
29 comments · 5 min read · LW link

Mainstream approach for alignment evals is a dead end

Igor Ivanov · 6 Jan 2026 19:52 UTC
56 points
9 comments · 5 min read · LW link

OpenAI: Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations

18 Dec 2025 22:55 UTC
25 points
1 comment · 1 min read · LW link
(alignment.openai.com)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

25 Nov 2022 14:36 UTC
39 points
9 comments · 6 min read · LW link
(vkrakovna.wordpress.com)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data

Sohaib Imran · 16 Nov 2024 23:22 UTC
36 points
11 comments · 14 min read · LW link

Reproducing METR’s RE-bench Reward Hacking Results

artm · 19 Dec 2025 18:48 UTC
1 point
0 comments · 6 min read · LW link

Steering Awareness: Models Can Be Trained to Detect Activation Steering

12 Mar 2026 23:34 UTC
15 points
0 comments · 6 min read · LW link

The intelligence-sentience orthogonality thesis

Ben Smith · 13 Jul 2023 6:55 UTC
19 points
9 comments · 9 min read · LW link

A Conceptual Framework for Exploration Hacking

12 Feb 2026 16:33 UTC
25 points
2 comments · 9 min read · LW link

Steering Evaluation-Aware Models to Act Like They Are Deployed

30 Oct 2025 15:03 UTC
61 points
12 comments · 18 min read · LW link