Contrastive methods could be used both to detect common latent structure across animals, recording sessions, and multiple species (https://twitter.com/LecoqJerome/status/1673870441591750656) and, for example, to look for which parts of an artificial neural network do what a specific brain area does during a task, assuming shared inputs (https://twitter.com/BogdanIonutCir2/status/1679563056454549504).
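As a concrete (hypothetical) illustration of the second use: one could compare activations from each ANN layer and recordings from a brain area on the same stimuli with a representational similarity measure such as linear CKA. Everything below (arrays, shapes, layer names) is a made-up placeholder, not taken from the linked threads; CKA is chosen only because it fits in a few lines of NumPy, and other measures (linear regression fits, RSA) would slot in the same way.

```python
# Illustrative sketch: rank ANN layers by how well their responses to shared
# stimuli match a brain area, using linear centered kernel alignment (CKA).
# All data here is synthetic and all names are hypothetical.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two (n_stimuli, n_features) response matrices."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic_xy = np.linalg.norm(Y.T @ X, "fro") ** 2
    hsic_xx = np.linalg.norm(X.T @ X, "fro")
    hsic_yy = np.linalg.norm(Y.T @ Y, "fro")
    return hsic_xy / (hsic_xx * hsic_yy)

rng = np.random.default_rng(0)
n_stimuli = 200
brain_area = rng.normal(size=(n_stimuli, 50))     # hypothetical neural recordings
ann_layers = {f"layer_{i}": rng.normal(size=(n_stimuli, 128)) for i in range(4)}

# Which ANN layer looks most like this brain area on the same stimuli?
scores = {name: linear_cka(acts, brain_area) for name, acts in ann_layers.items()}
print(max(scores, key=scores.get), scores)
```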
And there are theoretical results suggesting that some latent factors can be identified using multiple modalities while being provably unidentifiable from a single modality, e.g. Identifiability Results for Multimodal Contrastive Learning (and the results on nonlinear ICA in single-modal vs. multi-modal settings reviewed in its section 2.1); all of the following could be interpreted as different modalities: multiple brain recording modalities, animals, sessions, species, or brains vs. ANNs. This might be a way to bypass single-model interpretability difficulties, e.g. by ‘comparing’ to brains or to other models.
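A minimal sketch of the multimodal contrastive setup these results speak to, assuming paired observations from two modalities and small MLP encoders; the symmetric InfoNCE loss and all sizes below are illustrative choices, not the cited paper's exact objective or architecture.

```python
# Sketch: learn a shared embedding from paired samples of two "modalities"
# (e.g. two animals, two recording modalities, or brain vs. ANN activations)
# with a symmetric InfoNCE contrastive loss. Dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_a, dim_b, latent = 50, 128, 16
enc_a = nn.Sequential(nn.Linear(dim_a, 64), nn.ReLU(), nn.Linear(64, latent))
enc_b = nn.Sequential(nn.Linear(dim_b, 64), nn.ReLU(), nn.Linear(64, latent))

def symmetric_info_nce(x_a, x_b, temperature=0.1):
    """Pull paired samples together in the shared space, push non-pairs apart."""
    z_a = F.normalize(enc_a(x_a), dim=-1)
    z_b = F.normalize(enc_b(x_b), dim=-1)
    logits = z_a @ z_b.T / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(x_a.shape[0])      # i-th sample of A pairs with i-th of B
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# One optimisation step on a random batch of paired observations.
opt = torch.optim.Adam(list(enc_a.parameters()) + list(enc_b.parameters()), lr=1e-3)
x_a, x_b = torch.randn(256, dim_a), torch.randn(256, dim_b)
opt.zero_grad()
loss = symmetric_info_nce(x_a, x_b)
loss.backward()
opt.step()
```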
Example of a potential cross-species application: empathy mechanisms seem conserved across species (see Empathy as a driver of prosocial behaviour: highly conserved neurobehavioural mechanisms across species). Examples of brain-ANN applications: ‘matching’ to modular brain networks, e.g. the language network, which is ontology-relevant and non-agentic (e.g. The universal language network: A cross-linguistic investigation spanning 45 languages and 12 language families), or the Theory of Mind network, which could be very useful for detecting lying-relevant circuits (e.g. Single-neuronal predictions of others’ beliefs in humans).
Examples of related interpretability across models (Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models), across brain measurement modalities (Learnable latent embeddings for joint behavioural and neural analysis), and across animals and brain-ANN (Quantifying stimulus-relevant representational drift using cross-modality contrastive learning).
Examples of reasons to expect (approximate) convergence to the same causal world models in various setups: theorem 2 in Robust agents learn causal world models; from Deep de Finetti: Recovering Topic Distributions from Large Language Models: ‘In particular, given the central role of exchangeability in our analysis, this analysis would most naturally be extended to other latent variables that do not depend heavily on word order, such as the author of the document [Andreas, 2022] or the author’s sentiment’ (this assumption might be expected to hold approximately for quite a few alignment-relevant concepts); and results from Victor Veitch: Linear Structure of (Causal) Concepts in Generative AI.
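As a toy illustration of the linear-representation idea behind the last reference (not Veitch et al.'s construction; just a difference-of-means probe on synthetic activations, with all shapes and data invented for the example):

```python
# Sketch: estimate a "concept direction" in activation space as a difference of
# class means, then check that projections onto it separate the two classes.
# Activations and labels are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
d = 256
true_direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=1000)                      # concept present / absent
acts = rng.normal(size=(1000, d)) + np.outer(labels, true_direction)

# Difference-of-means estimate of the concept direction.
direction = acts[labels == 1].mean(0) - acts[labels == 0].mean(0)
direction /= np.linalg.norm(direction)

# Projections onto the estimated direction separate the two classes.
proj = acts @ direction
print("mean projection with concept:   ", proj[labels == 1].mean())
print("mean projection without concept:", proj[labels == 0].mean())
```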