Nandi
Karma: 229
Intricacies of Feature Geometry in Large Language Models
7vik, Lucius Bushnaq and Nandi | Dec 7, 2024, 6:10 PM | 70 points | 0 comments | 12 min read
The Geometry of Feelings and Nonsense in Large Language Models
7vik and Nandi | Sep 27, 2024, 5:49 PM | 61 points | 10 comments | 4 min read
[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs (arxiv.org)
Yohan Mathew, joanv, robert mccarthy, ollie, Nandi and Dylan Cope | Sep 25, 2024, 2:52 PM | 37 points | 2 comments | 4 min read
Robustness of Contrast-Consistent Search to Adversarial Prompting
Nandi, i, Jamie Wright, Seamus_F and hugofry | Nov 1, 2023, 12:46 PM | 18 points | 1 comment | 7 min read
Machine Unlearning Evaluations as Interpretability Benchmarks
NickyP and Nandi | Oct 23, 2023, 4:33 PM | 33 points | 2 comments | 11 min read
Splitting Debate up into Two Subsystems
Nandi | Jul 3, 2020, 8:11 PM | 13 points | 5 comments | 4 min read
Acknowledging Human Preference Types to Support Value Learning
Nandi | Nov 13, 2018, 6:57 PM | 34 points | 4 comments | 9 min read