
Zygi Straznickas

Karma: 297

Previously: math for AI and AI for math

Now just kinda trying to figure out if AI can be made safe

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas
16 Mar 2025 18:54 UTC
198 points
34 comments
3 min read
LW link

Fluent dreaming for language models (AI interpretability method)

6 Feb 2024 6:02 UTC
46 points
5 comments
1 min read
LW link
(arxiv.org)