RSS

SoerenMind

Karma: 1,169

How to Catch an AI Liar: Lie De­tec­tion in Black-Box LLMs by Ask­ing Un­re­lated Questions

28 Sep 2023 18:53 UTC
183 points
37 comments3 min readLW link

Wikipe­dia as an in­tro­duc­tion to the al­ign­ment problem

SoerenMind29 May 2023 18:43 UTC
83 points
10 comments1 min readLW link
(en.wikipedia.org)

The Align­ment Prob­lem from a Deep Learn­ing Per­spec­tive (ma­jor rewrite)

10 Jan 2023 16:06 UTC
83 points
8 comments39 min readLW link
(arxiv.org)