davidad comments on Towards Hodge-podge Alignment

davidad 19 Dec 2022 23:56 UTC
10 points
I want to voice my strong support for attempts to define something like dependent type signatures for alignment-relevant components and use wiring diagrams and/or string diagrams (in some kind of double-categorical systems theory, such as David Jaz Myers’) to combine them into larger AI systems proposals. I also like the flowchart. I’m less excited about Swiss-cheese security, but I think assemblages are also on the critical path for stronger guarantees.
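As a rough illustration of what "type signatures plus wiring" could look like, here is a minimal Python sketch built on assumptions. Python has no dependent types, so each component only declares plain input and output types (the hypothetical `Interface` class), and components are deterministic Moore machines (the hypothetical `MooreSystem`), in the spirit of, but not taken from, Myers' book. The hypothetical `series` function implements a single wiring pattern, feeding one component's output into the next, and refuses the wiring when the interfaces do not match. `accumulator` and `monitor` are toy components, not real alignment-relevant ones.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Interface:
    """Hypothetical stand-in for a type signature: declared input and output types."""
    inp: type
    out: type


@dataclass(frozen=True)
class MooreSystem:
    """A deterministic Moore machine: the exposed output depends only on the state."""
    interface: Interface
    state: Any                           # initial state
    expose: Callable[[Any], Any]         # state -> output
    update: Callable[[Any, Any], Any]    # (state, input) -> next state


def series(a: MooreSystem, b: MooreSystem) -> MooreSystem:
    """Wire a's output into b's input; refuse the wiring if the interfaces differ."""
    if a.interface.out is not b.interface.inp:
        raise TypeError(
            f"cannot wire {a.interface.out.__name__} output into {b.interface.inp.__name__} input"
        )
    return MooreSystem(
        interface=Interface(a.interface.inp, b.interface.out),
        state=(a.state, b.state),
        # The composite exposes only b's output; a's output stays internal.
        expose=lambda s: b.expose(s[1]),
        # a reads the external input; b reads what a currently exposes.
        update=lambda s, x: (a.update(s[0], x), b.update(s[1], a.expose(s[0]))),
    )


def run(system: MooreSystem, inputs: List[Any]) -> List[Any]:
    """Feed inputs one at a time, recording the exposed output before each update."""
    state, outputs = system.state, []
    for x in inputs:
        if not isinstance(x, system.interface.inp):
            raise TypeError(f"expected {system.interface.inp.__name__}, got {type(x).__name__}")
        outputs.append(system.expose(state))
        state = system.update(state, x)
    return outputs


# Hypothetical components: a running total, and a monitor that flags totals above 5.
accumulator = MooreSystem(Interface(int, int), 0, lambda s: s, lambda s, x: s + x)
monitor = MooreSystem(Interface(int, bool), False, lambda s: s, lambda s, x: x > 5)

pipeline = series(accumulator, monitor)
print(run(pipeline, [3, 4, 1, 0]))  # [False, False, False, True]
```

Because the composite is itself a `MooreSystem`, it can be wired into further systems, while `series(monitor, accumulator)` is rejected at wiring time. A real double-categorical treatment would also cover parallel composition, feedback loops, and interfaces whose admissible inputs depend on the current output (the "dependent" part), none of which this sketch attempts.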
- Cleo Nardo 20 Dec 2022 0:19 UTC
  4 points
  Okay, I think I’ll write an alignment-relevant distillation of Myers’ book.
  
  It might be useful for thinking about embedded agency, and especially for the problem Scott Garrabrant has recently been grappling with in his work on Cartesian Frames: how do we formalise world-states that admit multiple distinct decompositions into environment and agents?
  - davidad 20 Dec 2022 0:29 UTC
    7 points
    I think it will also prove useful for world-modeling even with a naïve POMDP-style Cartesian boundary between the modeler and the environment, since the environment is itself generally well-modeled by a decomposition into locally stateful entities that interact in locally scoped ways (often restricted by naturally occurring boundaries).
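    A toy Python sketch of that kind of decomposition, under loudly stated assumptions: the "entities" are hypothetical rooms with a single numeric state, the local rule (the hypothetical `diffuse`) is simple heat-like averaging, and the `wiring` dictionary plays the role of the locally scoped interactions, with a missing wire standing in for a naturally occurring boundary. None of these names come from the comment or from Myers' book. A modeler behind a Cartesian boundary could treat the whole `tick` as one opaque environment step, while internally each entity only ever reads its wired neighbours.

    ```python
    from typing import Callable, Dict, List

    # (own state, states of wired neighbours) -> next own state
    Update = Callable[[float, List[float]], float]


    def diffuse(k: float) -> Update:
        """Hypothetical local rule: move toward the neighbours' values at rate k."""
        return lambda s, neighbours: s + k * sum(n - s for n in neighbours)


    def tick(states: Dict[str, float],
             wiring: Dict[str, List[str]],
             rules: Dict[str, Update]) -> Dict[str, float]:
        """One synchronous step: each entity reads only the entities it is wired to."""
        return {
            name: rules[name](s, [states[n] for n in wiring[name]])
            for name, s in states.items()
        }


    # Rooms "a" and "b" share a wall; "c" sits behind an insulating boundary (no wires).
    wiring = {"a": ["b"], "b": ["a"], "c": []}
    rules = {name: diffuse(0.25) for name in wiring}
    states = {"a": 10.0, "b": 0.0, "c": 3.0}

    states = tick(states, wiring, rules)
    print(states)  # {'a': 7.5, 'b': 2.5, 'c': 3.0}
    ```

    After one tick, heat has moved between "a" and "b" while the insulated "c" is unchanged, which is the kind of locality that makes such a decomposition a useful world model.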