mrinank_sharma

Karma: 211

Disempowerment patterns in real-world AI usage

29 Jan 2026 16:36 UTC
47 points
3 comments · 2 min read · LW link
(www.anthropic.com)

Best-of-N Jailbreaking

14 Dec 2024 4:58 UTC
79 points
5 comments · 2 min read · LW link
(arxiv.org)

Towards Understanding Sycophancy in Language Models

24 Oct 2023 0:30 UTC
66 points
0 comments · 2 min read · LW link
(arxiv.org)

Paper: Understanding and Controlling a Maze-Solving Policy Network

13 Oct 2023 1:38 UTC
70 points
0 comments · 1 min read · LW link
(arxiv.org)