- “How surprised would I be if I learned I had just asked a Wrong Question and framed everything incorrectly?”
- “Is the thing I’m trying to do (e.g. understand the type signature of human values) actually impossible? What evidence have I seen which discriminates between worlds where it is impossible, and worlds where it isn’t?”
  - (This is more applicable to other kinds of questions; I think it would be quite ridiculous for it to be literally impossible to understand the type signature of human values.)
- Query my models of smart people (for this purpose, I have reasonably good models of e.g. John Wentworth, Eliezer Yudkowsky, and Quintin Pope).
- Pretend to be a smarmy asshole who’s explaining why TurnTrout can’t possibly understand the type signature of human values, just visualize the smirk on their face as they drip condescension onto me, and see if some part of me responds “Oh yeah, well what about [actually good insight X]?!”