
Filip Sondej

Karma: 418

Currently working on LLM unlearning. Also interested in CoT faithfulness, AI welfare/rights and mitigating AI conflict.

github.com/filyp

Unlearning Needs to be More Selective [Progress Report]

27 Jun 2025 16:38 UTC
24 points
6 comments, 3 min read

Infinite money hack

Filip Sondej, 24 Jun 2025 9:39 UTC
3 points
6 comments, 1 min read

How LLM Beliefs Change During Chain-of-Thought Reasoning

16 Jun 2025 16:18 UTC
32 points
3 comments, 5 min read

Simple Steganographic Computation Eval—gpt-4o and gemini-exp-1206 can’t solve it yet

Filip Sondej, 19 Dec 2024 15:47 UTC
13 points
2 comments, 3 min read

Testing which LLM architectures can do hidden serial reasoning

Filip Sondej, 16 Dec 2024 13:48 UTC
84 points
9 comments, 4 min read

GPTs’ ability to keep a secret is weirdly prompt-dependent

22 Jul 2023 12:21 UTC
31 points
0 comments, 9 min read

Boomerang—protocol to dissolve some commitment races

Filip Sondej, 30 May 2023 16:21 UTC
37 points
10 comments, 8 min read

Spooky action at a distance in the loss landscape

28 Jan 2023 0:22 UTC
62 points
4 comments, 7 min read
(www.jessehoogland.com)

Mind is uncountable

Filip Sondej, 2 Nov 2022 11:51 UTC
18 points
22 comments, 3 min read

New tool for exploring EA Forum, LessWrong and Alignment Forum—Tree of Tags

Filip Sondej, 13 Sep 2022 17:33 UTC
31 points
2 comments, 1 min read

New cooperation mechanism—quadratic funding without a matching pool

Filip Sondej, 5 Jun 2022 13:55 UTC
11 points
0 comments, 5 min read