
Owain_Evans

Karma: 3,336

https://owainevans.github.io/

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025, 5:39 PM
328 points
90 comments · 4 min read

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
130 points
5 comments · 6 min read

New, improved multiple-choice TruthfulQA

Jan 15, 2025, 11:32 PM
72 points
0 comments · 3 min read

Inference-Time-Compute: More Faithful? A Research Note

Jan 15, 2025, 4:43 AM
69 points
10 comments · 11 min read

Tips On Empirical Research Slides

Jan 8, 2025, 5:06 AM
90 points
4 comments · 6 min read

LLMs can learn about themselves by introspection

Oct 18, 2024, 4:12 PM
102 points
38 comments · 9 min read

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

Jul 8, 2024, 10:24 PM
109 points
37 comments · 5 min read

Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data

Jun 21, 2024, 3:54 PM
163 points
13 comments · 8 min read
(arxiv.org)

How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots

Owain_Evans · Mar 28, 2024, 2:34 AM
27 points
0 comments · 9 min read

Paper: Tell, Don’t Show: Declarative facts influence how LLMs generalize

Dec 19, 2023, 7:14 PM
45 points
4 comments · 6 min read
(arxiv.org)

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Sep 28, 2023, 6:53 PM
187 points
39 comments · 3 min read · 1 review

Paper: LLMs trained on “A is B” fail to learn “B is A”

Sep 23, 2023, 7:55 PM
121 points
74 comments · 4 min read
(arxiv.org)

Paper: On measuring situational awareness in LLMs

Sep 4, 2023, 12:54 PM
109 points
16 comments · 5 min read
(arxiv.org)

Paper: Forecasting world events with neural nets

Jul 1, 2022, 7:40 PM
39 points
3 comments · 4 min read

Paper: Teaching GPT3 to express uncertainty in words

Owain_Evans · May 31, 2022, 1:27 PM
97 points
7 comments · 4 min read

How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?

Owain_Evans · Feb 26, 2022, 12:46 PM
44 points
3 comments · 11 min read

Lives of the Cambridge polymath geniuses

Owain_Evans · Jan 25, 2022, 4:45 AM
108 points
40 comments · 3 min read

The Rationalists of the 1950s (and before) also called themselves “Rationalists”

Owain_Evans · Nov 28, 2021, 8:17 PM
186 points
32 comments · 3 min read · 1 review

Truthful and honest AI

Oct 29, 2021, 7:28 AM
42 points
1 comment · 13 min read

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans · Oct 22, 2021, 4:23 PM
31 points
15 comments · 1 min read