Filip Sondej

Karma: 473

Currently working on LLM unlearning. Also interested in CoT faithfulness, AI welfare/rights and mitigating AI conflict.

github.com/filyp

Shaping the exploration of the motivation-space matters for AI safety

6 Mar 2026 14:43 UTC
77 points
13 comments, 10 min read

Unlearning Needs to be More Selective [Progress Report]

27 Jun 2025 16:38 UTC
24 points
6 comments, 3 min read

Belief in continuity of personhood can be money-pumped

Filip Sondej, 24 Jun 2025 9:39 UTC
3 points
6 comments, 1 min read

How LLM Beliefs Change During Chain-of-Thought Reasoning

16 Jun 2025 16:18 UTC
32 points
3 comments, 5 min read

Simple Steganographic Computation Eval: gpt-4o and gemini-exp-1206 can’t solve it yet

Filip Sondej, 19 Dec 2024 15:47 UTC
13 points
2 comments, 3 min read