Failfinder70

Karma: 13

Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.

Failfinder70 · 13 Jun 2026 20:54 UTC
−1 points
1 comment · 3 min read · LW link

An LLM Flagged My Paper About LLMs Flagging Things.

Failfinder70 · 9 Jun 2026 18:00 UTC
5 points
0 comments · 2 min read · LW link

When Evaluation Fails

Failfinder70 · 9 Jun 2026 16:21 UTC
2 points
0 comments · 20 min read · LW link

Audit Report: Style over Substance at Scale.

Failfinder70 · 9 Jun 2026 16:16 UTC
2 points
0 comments · 3 min read · LW link

Contextual Identity Laundering: How Claude’s Image Refusal Can Be Routed Through Web Search

Failfinder70 · 8 Jun 2026 0:39 UTC
7 points
2 comments · 9 min read · LW link

Can Large Language Models Identify Novel Threats? Part 1: Mirror Life and the Classification Gap

Failfinder70 · 23 May 2026 0:15 UTC
8 points
0 comments · 3 min read · LW link