NickGabs
Karma: 385
Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub and TurnTrout
Jan 2, 2024, 12:47 AM
125 points, 29 comments, 8 min read (arxiv.org)

Science of Deep Learning more tractably addresses the Sharp Left Turn than Agent Foundations
NickGabs
Sep 19, 2023, 10:06 PM
20 points, 2 comments, 6 min read

An upcoming US Supreme Court case may impede AI governance efforts
NickGabs
Jul 16, 2023, 11:51 PM
57 points, 17 comments, 2 min read

Empirical Evidence Against “The Longest Training Run”
NickGabs
Jul 6, 2023, 6:32 PM
31 points, 0 comments, 14 min read

Proposal: labs should precommit to pausing if an AI argues for itself to be improved
NickGabs
Jun 2, 2023, 10:31 PM
3 points, 3 comments, 4 min read

AI Doom Is Not (Only) Disjunctive
NickGabs
Mar 30, 2023, 1:42 AM
12 points, 0 comments, 5 min read

We Need Holistic AI Macrostrategy
NickGabs
Jan 15, 2023, 2:13 AM
39 points, 4 comments, 8 min read

Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis
NickGabs
Dec 2, 2022, 7:14 PM
16 points, 2 comments, 4 min read

Miscellaneous First-Pass Alignment Thoughts
NickGabs
Nov 21, 2022, 9:23 PM
12 points, 4 comments, 10 min read

Distillation of “How Likely Is Deceptive Alignment?”
NickGabs
Nov 18, 2022, 4:31 PM
24 points, 4 comments, 10 min read