ELK shaving

> Paul Christiano's incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world.
> Eliezer in Death With Dignity

Eliciting Latent Knowledge reads to me as an incredibly narrow slice in reasoning space, a hyperbolically branching philosophical rabbit hole of caveats.

For example, this paragraph on page 7

translates as:

If you can ask whether AI is saying the truth and it answers “no” then you know it is lying.

But how is trusting this answer different from just trusting AI to not deceive you as a whole?


A hundred pages of an elaborate system with competing actors playing games of causal diagrams trying to solve for the worst case is exciting ✨ precisely because it allows one to “make progress” and have incredibly nuanced discussions (ELK shaving [1]) while failing to address the core AI safety concern:

if AI is sufficiently smart, it can do absolutely whatever

– fundamentally ignoring whatever clever constraints one might come up with.

I am confused why people might be “very optimistic” about ELK, I hope I am wrong.

  1. ^“Yak shaving” means performing a seemingly endless series of small tasks that must be completed before the next step in the project can move forward. Elks are kinda like yaks