Failfinder70
Karma: 13
Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.
Failfinder70, 13 Jun 2026 20:54 UTC, −1 points, 2 comments, 3 min read
An LLM Flagged My Paper About LLMs Flagging Things.
Failfinder70, 9 Jun 2026 18:00 UTC, 5 points, 0 comments, 2 min read
When Evaluation Fails
Failfinder70, 9 Jun 2026 16:21 UTC, 2 points, 0 comments, 20 min read
Audit Report: Style over Substance at Scale.
Failfinder70, 9 Jun 2026 16:16 UTC, 2 points, 0 comments, 3 min read
Contextual Identity Laundering: How Claude’s Image Refusal Can Be Routed Through Web Search
Failfinder70, 8 Jun 2026 0:39 UTC, 7 points, 2 comments, 9 min read
Can Large Language Models Identify Novel Threats? Part 1: Mirror Life and the Classification Gap
Failfinder70, 23 May 2026 0:15 UTC, 8 points, 0 comments, 3 min read