Jacob_Hilton · Karma: 1,514
Jacob_Hilton’s Shortform · Jacob_Hilton · May 1, 2025, 12:58 AM · 6 points · 1 comment · LW link
A bird’s eye view of ARC’s research · Jacob_Hilton · Oct 23, 2024, 3:50 PM · 121 points · 12 comments · 7 min read · LW link (www.alignment.org)
Backdoors as an analogy for deceptive alignment · Jacob_Hilton and Mark Xu · Sep 6, 2024, 3:30 PM · 104 points · 2 comments · 8 min read · LW link (www.alignment.org)
Formal verification, heuristic explanations and surprise accounting · Jacob_Hilton · Jun 25, 2024, 3:40 PM · 156 points · 11 comments · 9 min read · LW link (www.alignment.org)
ARC is hiring theoretical researchers · paulfchristiano, Jacob_Hilton and Mark Xu · Jun 12, 2023, 6:50 PM · 126 points · 12 comments · 4 min read · LW link (www.alignment.org)
The effect of horizon length on scaling laws · Jacob_Hilton · Feb 1, 2023, 3:59 AM · 23 points · 2 comments · 1 min read · LW link (arxiv.org)
Scaling Laws for Reward Model Overoptimization · leogao, John Schulman and Jacob_Hilton · Oct 20, 2022, 12:20 AM · 103 points · 13 comments · 1 min read · LW link (arxiv.org)
Common misconceptions about OpenAI · Jacob_Hilton · Aug 25, 2022, 2:02 PM · 238 points · 154 comments · 5 min read · LW link · 1 review
How much alignment data will we need in the long run? · Jacob_Hilton · Aug 10, 2022, 9:39 PM · 37 points · 15 comments · 4 min read · LW link
Deep learning curriculum for large language model alignment · Jacob_Hilton · Jul 13, 2022, 9:58 PM · 57 points · 3 comments · 1 min read · LW link (github.com)
Procedurally evaluating factual accuracy: a request for research · Jacob_Hilton · Mar 30, 2022, 4:37 PM · 25 points · 2 comments · 6 min read · LW link
Truthful LMs as a warm-up for aligned AGI · Jacob_Hilton · Jan 17, 2022, 4:49 PM · 65 points · 14 comments · 13 min read · LW link
Stationary algorithmic probability · Jacob_Hilton · Apr 29, 2017, 5:23 PM · 3 points · 7 comments · 1 min read · LW link (www.jacobh.co.uk)