When you chain parallel and sequential calls to large language models (for example with LangChain), you implicitly create a causal graph that can be analyzed visually if you have the right tracing tools (https://github.com/oughtinc/ice). This notebook describes different agents using an explicit formalism based on causal influence diagrams, which we treat as a notation for the data flow, components, and steps involved when a user makes a request. We use the example diagrams to explain and fix risk scenarios, showing that agent architectures become much easier to debug once you can reason visually about the data flow. We also ask questions about intent alignment for AGI in the context of such agents.
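To make the idea of an implicit causal graph concrete, here is a minimal sketch in plain Python. The step names and the edge list are hypothetical and are not taken from the notebook. The graph is an ordinary directed acyclic graph, not a full causal influence diagram with decision and utility nodes. It only shows how asking "what can this step influence?" becomes a simple graph query once the data flow is written down.

```python
from collections import deque

# Hypothetical agent: each key is a step, each value lists the steps it feeds into.
EDGES = {
    "user_request": ["planner_llm"],
    "planner_llm": ["search_llm", "summary_llm"],  # two parallel calls
    "search_llm": ["answer_llm"],
    "summary_llm": ["answer_llm"],
    "answer_llm": ["response"],
    "response": [],
}


def downstream(edges, source):
    """Return every step that `source` can influence (assumes an acyclic graph)."""
    seen = set()
    queue = deque(edges[source])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(edges[node])
    return seen


def parents(edges, target):
    """Return the steps whose output flows directly into `target`."""
    return sorted(src for src, dsts in edges.items() if target in dsts)


if __name__ == "__main__":
    print(downstream(EDGES, "search_llm"))
    print(parents(EDGES, "answer_llm"))
```

In this toy graph, a compromised search step can affect the final answer but not the parallel summary step, which is the kind of question a risk scenario asks of the diagram.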
Examples and Theory in Colab to Get Started:
Work done at the Alignment Jam #8 (Verification). The presentation starts at 31:43, but the whole event was great: https://youtu.be/XauqlTQm-o4
Paper: https://docs.google.com/document/d/160Yw_iuvztB6CTT9Osj5wC0sOrEKjfaGkkeeYuwQf4Y/edit#heading=h.kwtox8r6b7n6
Colab notebook: https://colab.research.google.com/drive/1roLQgXhEtI83Q5vX1q24Q9iDgu5LFFWA#scrollTo=5b8fbbeb-e90b-4990-ac3e-d484205b78aa
TODO:
Mechanistic Interpretability: Info-Weighted Attention mechanisms, Info-weighted Averaging (https://youtu.be/etFCaFvt2Ks)
[viz] Animate the temporal dependence if we have timestamps for when each sub-agent process starts; this should be added to the tracing code
[Theory] Study links to Garrabrant’s Temporal Inference with Finite Factored Sets: https://arxiv.org/abs/2109.11513
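As a rough illustration of the [viz] item, the sketch below turns start timestamps into a sequence of animation frames. The trace record format (step name, start time in seconds) is an assumption made for this example and is not the output format of any particular tracing tool. Rendering is left out, and each frame only lists the steps that have started so far.

```python
# Hypothetical trace records: (step name, start time in seconds since the request).
TRACE = [
    ("answer_llm", 2.4),
    ("user_request", 0.0),
    ("search_llm", 0.9),
    ("planner_llm", 0.1),
    ("summary_llm", 0.9),
]


def animation_frames(trace):
    """Build one frame per distinct start time, listing all steps started so far."""
    frames = []
    started = []
    for t in sorted({ts for _, ts in trace}):
        # Steps that start at the same time (parallel calls) enter the same frame.
        started.extend(sorted(name for name, ts in trace if ts == t))
        frames.append((t, list(started)))
    return frames


if __name__ == "__main__":
    for t, steps in animation_frames(TRACE):
        print(t, steps)
```

Steps that share a start time appear together in one frame, which is one simple way to make parallel calls visible in an animation.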