
Mateusz Bagiński

Karma: 3,669

I endorse and operate by Crocker’s rules.

I have not signed any agreements whose existence I cannot mention.

Applications open for the Online wing of the AFFINE Superintelligence Alignment Seminar

15 Apr 2026 16:10 UTC
20 points
0 comments · 1 min read · LW link

Slack in Cells, Slack in Brains

Mateusz Bagiński · 31 Mar 2026 0:35 UTC
43 points
3 comments · 6 min read · LW link

Don’t Overdose Locally Beneficial Changes

Mateusz Bagiński · 28 Mar 2026 18:24 UTC
79 points
12 comments · 4 min read · LW link

Scaffolded Reproducers, Scaffolded Agents

Mateusz Bagiński · 26 Mar 2026 23:47 UTC
37 points
1 comment · 3 min read · LW link

Superintelligence Alignment Seminar (1 month focused upskilling)

Mateusz Bagiński · 17 Feb 2026 17:03 UTC
115 points
13 comments · 3 min read · LW link

Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence)

13 Oct 2025 19:00 UTC
83 points
4 comments · 13 min read · LW link

Safety researchers should take a public stance

19 Sep 2025 18:55 UTC
252 points
65 comments · 8 min read · LW link

Counter-considerations on AI arms races

15 May 2025 14:54 UTC
24 points
0 comments · 18 min read · LW link

[Question] Comprehensive up-to-date resources on the Chinese Communist Party’s AI strategy, etc?

Mateusz Bagiński · 18 Apr 2025 4:58 UTC
14 points
6 comments · 1 min read · LW link

Goodhart Typology via Structure, Function, and Randomness Distributions

25 Mar 2025 16:01 UTC
35 points
1 comment · 15 min read · LW link

Bounded AI might be viable

6 Mar 2025 12:55 UTC
24 points
4 comments · 20 min read · LW link

Less Anti-Dakka

Mateusz Bagiński · 31 May 2024 9:07 UTC
79 points
10 comments · 3 min read · LW link

Some Problems with Ordinal Optimization Frame

Mateusz Bagiński · 6 May 2024 5:28 UTC
9 points
0 comments · 7 min read · LW link

[Question] What are the weirdest things a human may want for their own sake?

Mateusz Bagiński · 20 Mar 2024 11:15 UTC
7 points
16 comments · 1 min read · LW link

Three Types of Constraints in the Space of Agents

15 Jan 2024 17:27 UTC
26 points
3 comments · 17 min read · LW link

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

15 Nov 2023 16:00 UTC
35 points
8 comments · 24 min read · LW link

Charbel-Raphaël and Lucius discuss interpretability

30 Oct 2023 5:50 UTC
112 points
7 comments · 21 min read · LW link

“Wanting” and “liking”

Mateusz Bagiński · 30 Aug 2023 14:52 UTC
23 points
3 comments · 29 min read · LW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

22 Jul 2023 12:21 UTC
31 points
0 comments · 9 min read · LW link

[Question] How do you manage your inputs?

Mateusz Bagiński · 28 Mar 2023 18:26 UTC
15 points
2 comments · 1 min read · LW link