Christopher King

Karma: 669

@theking@mathstodon.xyz

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King · 15 Mar 2023 0:29 UTC

116 points

22 comments · 2 min read · LW link

Formalizing the “AI x-risk is unlikely because it is ridiculous” argument

Christopher King · 3 May 2023 18:56 UTC

47 points

17 comments · 3 min read · LW link

AI community building: EliezerKart

Christopher King · 1 Apr 2023 15:25 UTC

45 points

0 comments · 2 min read · LW link

The way AGI wins could look very stupid

Christopher King · 12 May 2023 16:34 UTC

42 points

22 comments · 1 min read · LW link

Anthropically Blind: the anthropic shadow is reflectively inconsistent

Christopher King · 29 Jun 2023 2:36 UTC

40 points

38 comments · 10 min read · LW link

Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn’t require knowing Occam’s razor

Christopher King · 18 Jun 2023 1:52 UTC

38 points

28 comments · 4 min read · LW link

“Corrigibility at some small length” by dath ilan

Christopher King · 5 Apr 2023 1:47 UTC

32 points

3 comments · 9 min read · LW link

(www.glowfic.com)

[Question] Accuracy of arguments that are seen as ridiculous and intuitively false but don’t have good counter-arguments

Christopher King · 29 Apr 2023 23:58 UTC

30 points

39 comments · 1 min read · LW link

Current AI harms are also sci-fi

Christopher King · 8 Jun 2023 17:49 UTC

26 points

3 comments · 1 min read · LW link

Optimality is the tiger, and annoying the user is its teeth

Christopher King · 28 Jan 2023 20:20 UTC

25 points

5 comments · 2 min read · LW link

Proposal: we should start referring to the risk from unaligned AI as a type of accident risk

Christopher King · 16 May 2023 15:18 UTC

22 points

6 comments · 2 min read · LW link

GPT-4 is bad at strategic thinking

Christopher King · 27 Mar 2023 15:11 UTC

22 points

8 comments · 1 min read · LW link

A better analogy and example for teaching AI takeover: the ML Inferno

Christopher King · 14 Mar 2023 19:14 UTC

18 points

0 comments · 5 min read · LW link

How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond.

Christopher King · 11 Jul 2023 19:27 UTC

17 points

11 comments · 2 min read · LW link

Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?

Christopher King · 20 Feb 2023 15:11 UTC

16 points

15 comments · 1 min read · LW link

Does GPT-4 exhibit agency when summarizing articles?

Christopher King · 24 Mar 2023 15:49 UTC

16 points

2 comments · 5 min read · LW link

GPT-4 aligning with acausal decision theory when instructed to play games, but includes a CDT explanation that’s incorrect if they differ

Christopher King · 23 Mar 2023 16:16 UTC

7 points

4 comments · 8 min read · LW link

PCAST Working Group on Generative AI Invites Public Input

Christopher King · 13 May 2023 22:49 UTC

7 points

0 comments · 1 min read · LW link

(terrytao.wordpress.com)

Challenge proposal: smallest possible self-hardening backdoor for RLHF

Christopher King · 29 Jun 2023 16:56 UTC

7 points

0 comments · 2 min read · LW link

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

Christopher King · 2 Jun 2023 21:54 UTC

7 points

4 comments · 16 min read · LW link