joshc

Karma: 450

joshuaclymer.com

List of strategies for mitigating deceptive alignment

joshc 2 Dec 2023 5:56 UTC
32 points
2 comments, 6 min read

New paper shows truthfulness & instruction-following don’t generalize by default

joshc 19 Nov 2023 19:27 UTC
58 points
0 comments, 4 min read

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc 15 Nov 2023 19:00 UTC
68 points
2 comments, 4 min read

Red teaming: challenges and research directions

joshc 10 May 2023 1:40 UTC
20 points
0 comments, 10 min read

Safety standards: a framework for AI regulation

joshc 1 May 2023 0:56 UTC
19 points
0 comments, 8 min read

Are short timelines actually bad?

joshc 5 Feb 2023 21:21 UTC
56 points
7 comments, 3 min read

[MLSN #7]: an example of an emergent internal optimizer

9 Jan 2023 19:39 UTC
28 points
0 comments, 6 min read