Olli Järviniemi
Karma: 645
Takeaways from calibration training
Olli Järviniemi · 29 Jan 2023 19:09 UTC · 38 points · 1 comment · 3 min read · LW link
Language models are not inherently safe
Olli Järviniemi · 7 Mar 2023 21:15 UTC · 11 points · 1 comment · 3 min read · LW link
Urging an International AI Treaty: An Open Letter
Olli Järviniemi · 31 Oct 2023 11:26 UTC · 48 points · 2 comments · 1 min read · LW link (aitreaty.org)
Instrumental deception and manipulation in LLMs—a case study
Olli Järviniemi · 24 Feb 2024 2:07 UTC · 39 points · 13 comments · 12 min read · LW link
On precise out-of-context steering
Olli Järviniemi · 3 May 2024 9:41 UTC · 7 points · 6 comments · 3 min read · LW link
Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi and evhub · 6 May 2024 7:07 UTC · 85 points · 6 comments · 1 min read · LW link (arxiv.org)