Sam F. Brown

Karma: 355

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

13 Jun 2024 10:04 UTC
75 points
10 comments
2 min read
LW link
(arxiv.org)

Oxford Rationalish—June Pub

10 Jun 2024 11:44 UTC
1 point
0 comments
1 min read
LW link

OxRat ACX Meetups Everywhere—Spring 2024

16 Mar 2024 19:41 UTC
7 points
0 comments
1 min read
LW link

OxRat March Pub Social

10 Mar 2024 21:27 UTC
1 point
0 comments
1 min read
LW link

Oxford Rationalish—Dec Pub

8 Dec 2023 20:20 UTC
1 point
0 comments
1 min read
LW link

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

8 Nov 2023 11:37 UTC
49 points
0 comments
18 min read
LW link

Oxford Rationalish—Sept Pub

19 Sep 2023 10:03 UTC
4 points
0 comments
1 min read
LW link

OxRat ACX Meetups Everywhere 2023

30 Aug 2023 3:15 UTC
4 points
0 comments
1 min read
LW link

Oxford, UK – ACX Meetups Everywhere Fall 2023

25 Aug 2023 23:33 UTC
4 points
0 comments
1 min read
LW link

Oxford Rationalish—July Pub

15 Jul 2023 10:10 UTC
4 points
0 comments
1 min read
LW link

Oxford Rationalish—June Pub—Are Woo Non-Responders Defective?

17 Jun 2023 12:37 UTC
6 points
0 comments
1 min read
LW link

Oxford Rationalish—May Pub

16 May 2023 0:13 UTC
5 points
0 comments
1 min read
LW link

Oxford, UK – ACX Meetups Everywhere Spring 2023

10 Apr 2023 21:49 UTC
4 points
0 comments
1 min read
LW link

Oxford Rationalish—April Pub—mini ACX Meetups Everywhere

16 Mar 2023 13:08 UTC
6 points
0 comments
1 min read
LW link

Oxford Rationalish—March Pub

11 Mar 2023 16:15 UTC
4 points
0 comments
1 min read
LW link

How to find cool things in a new place

24 Jan 2023 11:20 UTC
12 points
0 comments
1 min read
LW link

Oxford Rationalish—February Pub

23 Jan 2023 13:42 UTC
4 points
0 comments
1 min read
LW link

Oxford Rationalish—January Pub

8 Jan 2023 19:24 UTC
3 points
0 comments
1 min read
LW link

Oxford Rationalish—December Pub

29 Nov 2022 16:38 UTC
3 points
0 comments
1 min read
LW link

Questions about Value Lock-in, Paternalism, and Empowerment

16 Nov 2022 15:33 UTC
13 points
2 comments
12 min read
LW link
(sambrown.eu)