Demystifying “Alignment” through a Comic
milanrosko
9 Jun 2024 8:24 UTC
108 points, 19 comments, 1 min read
Art
AI
Conceptual Media
AI Alignment Fieldbuilding
Has Diagram
Inner Alignment
Disclaimer: This explanatory comic is not specifically aimed at the Less Wrong contributor.
What links here?
DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking by tailcalled (10 Jun 2024 21:20 UTC; 29 points)