Here is one output I want to see: a succinct, concrete, fully specified (ideally formalized) argument for uncontrollability that involves a simple setup and demonstrates Remmelt’s reasons for uncontrollability of superhuman AGI.
That should make the argument easier to communicate and evaluate.
Still wanted to say:
I appreciate the spirit of this comment.
There are trade-offs here.
If it’s simple and concrete, like a toy model, then it is not fully specified. If it is fully specified, then the inferential distance of going through the reasoning steps is large (people get overwhelmed and opt out).
If it’s formalised, then people need to understand the formal language. Consider Gödel’s incompleteness theorems, which involved creating a new language and describing a mathematical world people were not familiar with. Actually reading through the original paper would have been a slog for most mathematicians.
There are further bottlenecks, which I won’t get into.
For now, I suggest that people who care to understand (because everything we care about is at stake) read this summary post: https://www.lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable
Anders Sandberg also had an insightful conversation with my research mentor about fundamental controllability limits. I expect the transcript will be posted on this forum sometime next month.
Again, it’s not simple.
For the effort it takes you to read through and understand parts, please recognise that it took a couple of orders of magnitude more effort for me and my collaborators to convey the arguments in a more intuitive, digestible form.
I now agree with your sentiment here and don’t think my request when I made that comment was very sensible. Going from an informal, not-fully-specified argument to a fully specified one does seem extremely difficult, and unlikely to be worth the effort for convincing people who would already be convinced by extremely sensible but not fully formalized arguments.
It does seem to me, though, that even a toy model that is not fully specified would be a big step forward in communicating what is going on.
I might look into the linked post again in more detail, seriously consider it, and try to wrap my head around it. Thanks for this comment.
Thanks for coming back on this. I just saw your comment and agree with your thoughtful points.
Let me also DM you the edited transcript of the conversation with Anders Sandberg.