joshc | Karma: 563 | joshuaclymer.com
Posts
[Question] What is the best critique of AI existential risk arguments? (joshc, 30 Aug 2022 2:18 UTC) · 6 points · 11 comments · 1 min read · LW link
Prizes for ML Safety Benchmark Ideas (joshc, 28 Oct 2022 2:51 UTC) · 36 points · 4 comments · 1 min read · LW link
[MLSN #7]: an example of an emergent internal optimizer (joshc and Dan H, 9 Jan 2023 19:39 UTC) · 28 points · 0 comments · 6 min read · LW link
Are short timelines actually bad? (joshc, 5 Feb 2023 21:21 UTC) · 56 points · 7 comments · 3 min read · LW link
Safety standards: a framework for AI regulation (joshc, 1 May 2023 0:56 UTC) · 19 points · 0 comments · 8 min read · LW link
Red teaming: challenges and research directions (joshc, 10 May 2023 1:40 UTC) · 30 points · 1 comment · 10 min read · LW link
Testbed evals: evaluating AI safety even when it can’t be directly measured (joshc, 15 Nov 2023 19:00 UTC) · 70 points · 2 comments · 4 min read · LW link
New paper shows truthfulness & instruction-following don’t generalize by default (joshc, 19 Nov 2023 19:27 UTC) · 58 points · 0 comments · 4 min read · LW link
List of strategies for mitigating deceptive alignment (joshc, 2 Dec 2023 5:56 UTC) · 34 points · 2 comments · 6 min read · LW link
New report: Safety Cases for AI (joshc, 20 Mar 2024 16:45 UTC) · 90 points · 13 comments · 1 min read · LW link (twitter.com)