Olli Järviniemi

Karma: 645

Takeaways from calibration training

Olli Järviniemi

29 Jan 2023 19:09 UTC

38 points

1 comment · 3 min read · LW link

Language models are not inherently safe

Olli Järviniemi

7 Mar 2023 21:15 UTC

11 points

1 comment · 3 min read · LW link

Urging an International AI Treaty: An Open Letter

Olli Järviniemi

31 Oct 2023 11:26 UTC

48 points

2 comments · 1 min read · LW link

(aitreaty.org)

Instrumental deception and manipulation in LLMs—a case study

Olli Järviniemi

24 Feb 2024 2:07 UTC

39 points

13 comments · 12 min read · LW link

On precise out-of-context steering

Olli Järviniemi

3 May 2024 9:41 UTC

7 points

6 comments · 3 min read · LW link

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

Olli Järviniemi and evhub

6 May 2024 7:07 UTC

85 points

6 comments · 1 min read · LW link

(arxiv.org)