Moksh Nirvaan

Karma: 42

Helping Friends, Harming Foes: Testing Tribalism in Language Models

11 Mar 2026 12:06 UTC
10 points
0 comments, 9 min read

A Behavioural and Representational Evaluation of Goal-directedness in Language Model Agents

5 Mar 2026 1:08 UTC
20 points
0 comments, 7 min read

Modelling, Measuring, and Intervening on Goal-directed Behaviour in AI Systems

31 Oct 2025 1:28 UTC
15 points
0 comments, 8 min read

Probing Power-Seeking in LLMs

13 Aug 2025 16:04 UTC
8 points
0 comments, 12 min read

Will AGI Emerge Through Self-Generated Reward Loops?

30 Jul 2025 13:17 UTC
5 points
0 comments, 1 min read