David Johnston

Karma: 672

MIRI’s “The Problem” hinges on diagnostic dilution

David Johnston

13 Aug 2025 6:25 UTC

21 points

23 comments · 6 min read · LW link

A brief theory of why we think things are good or bad

David Johnston

20 Oct 2024 20:31 UTC

7 points

10 comments · 4 min read · LW link

Mechanistic Anomaly Detection Research Update

Nora Belrose and David Johnston

6 Aug 2024 10:33 UTC

11 points

0 comments · 1 min read · LW link

(blog.eleuther.ai)

Opinion merging for AI control

David Johnston

4 May 2023 2:43 UTC

6 points

0 comments · 11 min read · LW link

[Question] Is it worth avoiding detailed discussions of expectations about agency levels of powerful AIs?

David Johnston

16 Mar 2023 3:06 UTC

11 points

6 comments · 2 min read · LW link

How likely are malign priors over objectives? [aborted WIP]

David Johnston

11 Nov 2022 5:36 UTC

−1 points

0 comments · 8 min read · LW link

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems

David Johnston

5 Nov 2022 13:19 UTC

8 points

4 comments · 16 min read · LW link

There’s probably a tradeoff between AI capability and safety, and we should act like it

David Johnston

9 Jun 2022 0:17 UTC

3 points

3 comments · 1 min read · LW link

Is evolutionary influence the mesa objective that we’re interested in?

David Johnston

3 May 2022 1:18 UTC

3 points

2 comments · 5 min read · LW link

[Cross-post] Half baked ideas: defining and measuring Artificial Intelligence system effectiveness

David Johnston

5 Apr 2022 0:29 UTC

2 points

0 comments · 7 min read · LW link

[Question] Are there any impossibility theorems for strong and safe AI?

David Johnston

11 Mar 2022 1:41 UTC

5 points

3 comments · 1 min read · LW link

Counterfactuals from ensembles of peers

David Johnston

4 Jan 2022 7:01 UTC

3 points

4 comments · 7 min read · LW link