Chain-of-thought Alignment
Tag
Last edit: 23 Oct 2022 19:13 UTC by elifland
(Feel free to rename, and write a description)
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic
Akash · 20 Dec 2022 21:39 UTC · 18 points · 2 comments · 11 min read
Externalized reasoning oversight: a research direction for language model alignment
tamera · 3 Aug 2022 12:03 UTC · 106 points · 22 comments · 6 min read
Steganography in Chain of Thought Reasoning
A Ray · 8 Aug 2022 3:47 UTC · 50 points · 13 comments · 6 min read
Distilled Representations Research Agenda
Hoagy and mishajw · 18 Oct 2022 20:59 UTC · 15 points · 2 comments · 8 min read
[Question] Impact of “‘Let’s think step by step’ is all you need”?
yrimon · 24 Jul 2022 20:59 UTC · 20 points · 2 comments · 1 min read
Paper: Large Language Models Can Self-improve [Linkpost]
Evan R. Murphy · 2 Oct 2022 1:29 UTC · 52 points · 14 comments · 1 min read · (openreview.net)
[ASoT] Simulators show us behavioural properties by default
Jozdien · 13 Jan 2023 18:42 UTC · 26 points · 1 comment · 3 min read
Imitation Learning from Language Feedback
Jérémy Scheurer, Tomek Korbak and Ethan Perez · 30 Mar 2023 14:11 UTC · 45 points · 1 comment · 10 min read