Mateusz Bagiński comments on Daniel Tan’s Shortform

Mateusz Bagiński 30 Dec 2024 21:01 UTC
2 points
0
Something like “We have mapped out the possible human-understandable or algorithmically neat descriptions of the network’s behavior sufficiently comprehensively, and sampled from this space sufficiently comprehensively, to know that the probability that there’s a description of its behavior meaningfully shorter than the shortest one we’ve found is at most $\epsilon$.”
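
As a rough sketch, the condition could be written as follows. Only $\epsilon$ comes from the comment above; the set of descriptions found so far $\mathcal{D}$, the description length $|d|$, and the threshold $\delta > 0$ for “meaningfully shorter” are assumptions introduced here for illustration.

$$
\Pr\Big[\, \exists\, d' \notin \mathcal{D} \;:\; d' \text{ describes the network's behavior and } |d'| < \min_{d \in \mathcal{D}} |d| - \delta \,\Big] \;\le\; \epsilon
$$

The two “sufficiently comprehensively” requirements are what would have to justify the probability on the left: the mapping of the description space and the sampling from it.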
- Daniel Tan 31 Dec 2024 4:42 UTC
  1 point
  0
  Yeah, seems hard
  
  I’m not convinced that you can satisfy either of those “sufficiently comprehensively” conditions well enough that you’d be comfortable arguing your model is not somehow interpretable.
  - Mateusz Bagiński 31 Dec 2024 6:26 UTC
    2 points
    1
    I’m not claiming it’s feasible (within decades). That’s just what a solution might look like.