RSS

Francis Rhys Ward

Karma: 276

Sim­ple dis­tri­bu­tion ap­prox­i­ma­tion: When sam­pled 100 times, can lan­guage mod­els yield 80% A and 20% B?

29 Jan 2024 0:24 UTC
39 points
5 comments4 min readLW link

Tall Tales at Differ­ent Scales: Eval­u­at­ing Scal­ing Trends For De­cep­tion In Lan­guage Models

8 Nov 2023 11:37 UTC
49 points
0 comments18 min readLW link

Re­ward Hack­ing from a Causal Perspective

21 Jul 2023 18:27 UTC
29 points
4 comments7 min readLW link

Agency from a causal perspective

30 Jun 2023 17:37 UTC
38 points
5 comments6 min readLW link

Causal­ity: A Brief Introduction

20 Jun 2023 15:01 UTC
48 points
18 comments6 min readLW link

In­tro­duc­tion to Towards Causal Foun­da­tions of Safe AGI

12 Jun 2023 17:55 UTC
67 points
6 comments4 min readLW link

For ev­ery choice of AGI difficulty, con­di­tion­ing on grad­ual take-off im­plies shorter timelines.

Francis Rhys Ward21 Apr 2022 7:44 UTC
31 points
13 comments3 min readLW link

On Agent In­cen­tives to Ma­nipu­late Hu­man Feed­back in Multi-Agent Re­ward Learn­ing Scenarios

Francis Rhys Ward3 Apr 2022 18:20 UTC
27 points
10 comments8 min readLW link