peligrietzer

Karma: 1,012

Shallow review of technical AI safety, 2025

technicalities, Tomáš Gavenčiak, Stephen McAleese, peligrietzer, Stag, jordinne, ozziegooen, Violet Hour and lenz

17 Dec 2025 18:18 UTC

193 points

9 comments, 47 min read

The Problem With the Word ‘Alignment’

peligrietzer and particlemania

21 May 2024 3:48 UTC

64 points

8 comments, 6 min read

Paper: Understanding and Controlling a Maze-Solving Policy Network

TurnTrout, Ulisse Mini, peligrietzer, mrinank_sharma, Austin Meek, Monte M and lisathiergart

13 Oct 2023 1:38 UTC

70 points

0 comments, 1 min read

(arxiv.org)

Some Thoughts on Virtue Ethics for AIs

peligrietzer

2 May 2023 5:46 UTC

88 points

8 comments, 4 min read

Behavioural statistics for a maze-solving agent

peligrietzer and TurnTrout

20 Apr 2023 22:26 UTC

46 points

11 comments, 10 min read

Maze-solving agents: Add a top-right vector, make the agent go to the top-right

TurnTrout, peligrietzer and lisathiergart

31 Mar 2023 19:20 UTC

101 points

17 comments, 11 min read

Understanding and controlling a maze-solving policy network

TurnTrout, peligrietzer, Ulisse Mini, Monte M and David Udell

11 Mar 2023 18:59 UTC

335 points

28 comments, 23 min read

Predictions for shard theory mechanistic interpretability results

TurnTrout, Ulisse Mini and peligrietzer

1 Mar 2023 5:16 UTC

105 points

10 comments, 5 min read

[Simulators seminar sequence] #2 Semiotic physics—revamped

Jan, Charlie Steiner, Logan Riggs, janus, jacquesthibs, metasemi, Michael Oesterle, Lucas Teixeira, peligrietzer and remember

27 Feb 2023 0:25 UTC

22 points

23 comments, 13 min read

[Simulators seminar sequence] #1 Background & shared assumptions

Jan, Charlie Steiner, Logan Riggs, janus, jacquesthibs, metasemi, Michael Oesterle, Lucas Teixeira, peligrietzer and remember

2 Jan 2023 23:48 UTC

50 points

4 comments, 3 min read

peligrietzer’s Shortform

peligrietzer

1 Dec 2022 0:51 UTC

2 points

4 comments, 1 min read

A Short Dialogue on the Meaning of Reward Functions

Leon Lang, Quintin Pope and peligrietzer

19 Nov 2022 21:04 UTC

45 points

0 comments, 3 min read