Olli Järviniemi

Karma: 643

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

6 May 2024 7:07 UTC
82 points
4 comments, 1 min read, LW link
(arxiv.org)

On precise out-of-context steering

Olli Järviniemi, 3 May 2024 9:41 UTC
7 points
6 comments, 3 min read, LW link