Filip Sondej

Karma: 476

Currently working on LLM unlearning. Also interested in CoT faithfulness, AI welfare/rights and mitigating AI conflict.

github.com/filyp

Shaping the exploration of the motivation-space matters for AI safety

Maxime Riché, Victor Gillioz, nielsrolf, Kajetan Dymkiewicz, Filip Sondej, RogerDearnaley, Daniel Tan and dillonkn

6 Mar 2026 14:43 UTC

78 points

15 comments · 10 min read · LW link

Unlearning Needs to be More Selective [Progress Report]

Filip Sondej, Yushi Yang and Marcel Windys

27 Jun 2025 16:38 UTC

24 points

6 comments · 3 min read · LW link

Belief in continuity of personhood can be money-pumped

Filip Sondej

24 Jun 2025 9:39 UTC

3 points

6 comments · 1 min read · LW link

How LLM Beliefs Change During Chain-of-Thought Reasoning

Filip Sondej, Petr Kašpárek, alex-kazda and Tomáš Gavenčiak

16 Jun 2025 16:18 UTC

32 points

3 comments · 5 min read · LW link

Simple Steganographic Computation Eval—gpt-4o and gemini-exp-1206 can’t solve it yet

Filip Sondej

19 Dec 2024 15:47 UTC

13 points

2 comments · 3 min read · LW link

Testing which LLM architectures can do hidden serial reasoning

Filip Sondej

16 Dec 2024 13:48 UTC

86 points

9 comments · 4 min read · LW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

Mateusz Bagiński, Filip Sondej and Marcel Windys

22 Jul 2023 12:21 UTC

31 points

0 comments · 9 min read · LW link

Boomerang—protocol to dissolve some commitment races

Filip Sondej

30 May 2023 16:21 UTC

37 points

10 comments · 8 min read · LW link

Spooky action at a distance in the loss landscape

Jesse Hoogland and Filip Sondej

28 Jan 2023 0:22 UTC

62 points

4 comments · 7 min read · LW link

(www.jessehoogland.com)

Mind is uncountable

Filip Sondej

2 Nov 2022 11:51 UTC

18 points

22 comments · 3 min read · LW link

New tool for exploring EA Forum, LessWrong and Alignment Forum—Tree of Tags

Filip Sondej

13 Sep 2022 17:33 UTC

31 points

2 comments · 1 min read · LW link

New cooperation mechanism—quadratic funding without a matching pool

Filip Sondej

5 Jun 2022 13:55 UTC

11 points

0 comments · 5 min read · LW link