Jacob_Hilton · Karma: 1,514
Jacob_Hilton’s Shortform · Jacob_Hilton · May 1, 2025, 12:58 AM · 6 points · 1 comment · LW link
A bird’s eye view of ARC’s research · Jacob_Hilton · Oct 23, 2024, 3:50 PM · 121 points · 12 comments · 7 min read · LW link (www.alignment.org)
Backdoors as an analogy for deceptive alignment · Jacob_Hilton and Mark Xu · Sep 6, 2024, 3:30 PM · 104 points · 2 comments · 8 min read · LW link (www.alignment.org)
Formal verification, heuristic explanations and surprise accounting · Jacob_Hilton · Jun 25, 2024, 3:40 PM · 156 points · 11 comments · 9 min read · LW link (www.alignment.org)
ARC is hiring theoretical researchers · paulfchristiano, Jacob_Hilton and Mark Xu · Jun 12, 2023, 6:50 PM · 126 points · 12 comments · 4 min read · LW link (www.alignment.org)
The effect of horizon length on scaling laws · Jacob_Hilton · Feb 1, 2023, 3:59 AM · 23 points · 2 comments · 1 min read · LW link (arxiv.org)
Scaling Laws for Reward Model Overoptimization · leogao, John Schulman and Jacob_Hilton · Oct 20, 2022, 12:20 AM · 103 points · 13 comments · 1 min read · LW link (arxiv.org)
Common misconceptions about OpenAI · Jacob_Hilton · Aug 25, 2022, 2:02 PM · 238 points · 154 comments · 5 min read · LW link · 1 review
How much alignment data will we need in the long run? · Jacob_Hilton · Aug 10, 2022, 9:39 PM · 37 points · 15 comments · 4 min read · LW link
Deep learning curriculum for large language model alignment · Jacob_Hilton · Jul 13, 2022, 9:58 PM · 57 points · 3 comments · 1 min read · LW link (github.com)
Procedurally evaluating factual accuracy: a request for research · Jacob_Hilton · Mar 30, 2022, 4:37 PM · 25 points · 2 comments · 6 min read · LW link
Truthful LMs as a warm-up for aligned AGI · Jacob_Hilton · Jan 17, 2022, 4:49 PM · 65 points · 14 comments · 13 min read · LW link
Stationary algorithmic probability · Jacob_Hilton · Apr 29, 2017, 5:23 PM · 3 points · 7 comments · 1 min read · LW link (www.jacobh.co.uk)