Olli Järviniemi

Karma: 1,863

Homepage: https://ollij.fi/

Opinions expressed are my own.

Subversion via Focal Points: Investigating Collusion in LLM Monitoring

Olli Järviniemi · 8 Jul 2025 10:15 UTC
14 points
2 comments · 1 min read · LW link

Making deals with early schemers

20 Jun 2025 18:21 UTC
127 points
41 comments · 15 min read · LW link

Schelling game evaluations for AI control

Olli Järviniemi · 8 Oct 2024 12:01 UTC
71 points
5 comments · 11 min read · LW link

Distinguish worst-case analysis from instrumental training-gaming

5 Sep 2024 19:13 UTC
43 points
0 comments · 5 min read · LW link

Trustworthy and untrustworthy models

Olli Järviniemi · 19 Aug 2024 16:27 UTC
47 points
3 comments · 8 min read · LW link

Near-mode thinking on AI

Olli Järviniemi · 4 Aug 2024 20:47 UTC
129 points
9 comments · 5 min read · LW link

An experiment on hidden cognition

Olli Järviniemi · 22 Jul 2024 3:26 UTC
25 points
2 comments · 7 min read · LW link

Brief notes on the Wikipedia game

Olli Järviniemi · 14 Jul 2024 2:28 UTC
68 points
9 comments · 4 min read · LW link

Dialogue introduction to Singular Learning Theory

Olli Järviniemi · 8 Jul 2024 16:58 UTC
108 points
15 comments · 8 min read · LW link

A civilization ran by amateurs

Olli Järviniemi · 30 May 2024 17:57 UTC
65 points
8 comments · 6 min read · LW link

Testing for parallel reasoning in LLMs

19 May 2024 15:28 UTC
9 points
7 comments · 9 min read · LW link

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

6 May 2024 7:07 UTC
95 points
13 comments · 1 min read · LW link
(arxiv.org)

On precise out-of-context steering

Olli Järviniemi · 3 May 2024 9:41 UTC
9 points
6 comments · 3 min read · LW link

Instrumental deception and manipulation in LLMs - a case study

Olli Järviniemi · 24 Feb 2024 2:07 UTC
39 points
13 comments · 12 min read · LW link

Urging an International AI Treaty: An Open Letter

Olli Järviniemi · 31 Oct 2023 11:26 UTC
48 points
2 comments · 1 min read · LW link
(aitreaty.org)

Olli Järviniemi’s Shortform

Olli Järviniemi · 23 Mar 2023 10:59 UTC
3 points
26 comments · 1 min read · LW link

Takeaways from calibration training

Olli Järviniemi · 29 Jan 2023 19:09 UTC
45 points
2 comments · 3 min read · LW link · 1 review