There also seems to be some theoretical and empirical ML evidence for the perspective of in-context learning as Bayesian inference: http://ai.stanford.edu/blog/understanding-incontext/
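Very roughly, the framing there (if I'm reading it right) treats pretraining data as generated from latent concepts, and in-context learning as the model implicitly inferring the concept behind the prompt, i.e. something like:

$$p(\text{output} \mid \text{prompt}) = \int_{\text{concept}} p(\text{output} \mid \text{concept}, \text{prompt})\, p(\text{concept} \mid \text{prompt})\, d\,\text{concept}$$

with more in-context examples concentrating $p(\text{concept} \mid \text{prompt})$ on the concept the examples share, so the task gets 'located' by inference rather than by gradient updates.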
Here’s a recent paper on the inductive biases of pre-trained LMs and how they affect fine-tuning: https://openreview.net/forum?id=mNtmhaDkAr
This paper links inductive biases of pre-trained [language] models (including some related to simplicity measures like MDL), path dependency and sensitivity to label evidence/noise: https://openreview.net/forum?id=mNtmhaDkAr
This seems related and might be useful to you; when it comes to Natural Abstractions, the section ‘Linking Behavior and Neural Representations’ seems especially relevant: ‘A mathematical theory of semantic development in deep neural networks’
“The language model works with text. The language model remains the best interface I’ve ever used. It’s user-friendly, composable, and available everywhere. It’s easy to automate and easy to extend.”—Text Is the Universal Interface
Excited to see people thinking about this! Importantly, there’s an entire ML literature out there to get evidence from, and ways to keep studying this empirically. Some examples of the existing literature (also see Path dependence in ML inductive biases and How likely is deceptive alignment?): Linear Connectivity Reveals Generalization Strategies—on fine-tuning path-dependence, The Grammar-Learning Trajectories of Neural Language Models (and many references in that thread), Let’s Agree to Agree: Neural Networks Share Classification Order on Real Datasets—on pre-training path-dependence. I can probably find many more references through my bookmarks, if there’s interest in this.
It might be interesting to think about whether there could be connections to the framing of corrections in robotics, e.g. “No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy
More evidence of something like world models in language models: Language models as agent models, Implicit Representations of Meaning in Neural Language Models
It might be useful to have a look at Language models show human-like content effects on reasoning; they empirically test for human-like incoherences/biases in LMs performing some logical reasoning tasks (Twitter summary thread; video presentation)
Some relevant literature: Language is more abstract than you think, or, why aren’t languages more iconic?, Meaning without reference in large language models, Grounding the Vector Space of an Octopus: Word Meaning from Raw Text, Understanding models understanding language, Implications of the Convergence of Language and Vision Model Geometries, Shared computational principles for language processing in humans and deep language models.
Valence (and arousal) also seem relatively easy to learn even for current models e.g. The Perceptual Primacy of Feeling: Affectless machine vision models robustly predict human visual arousal, valence, and aesthetics; Quantifying Valence and Arousal in Text with Multilingual Pre-trained Transformers. And abstract concepts like ‘human flourishing’ could be relatively easy to learn even just from text e.g. Language is more abstract than you think, or, why aren’t languages more iconic?; Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training.
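As a rough illustration of how cheap this kind of probe can be (a minimal sketch, not the pipeline from those papers; the model choice, Ridge regressor and toy ratings are all mine), a linear probe from frozen pre-trained embeddings to human valence ratings is already the sort of thing one would try first:

```python
# Minimal sketch: linear probe from frozen transformer embeddings to valence ratings.
# Not the setup from the cited papers; model choice, regressor and toy data are illustrative.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(texts):
    """Mean-pooled last-layer embeddings for a list of strings."""
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state            # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)            # (batch, seq, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()   # (batch, dim)

# Toy stand-in for human valence ratings (e.g. on a 1-9 scale).
texts = [
    "a sunny morning walk with friends",
    "a car crash on the highway",
    "winning an award at work",
    "losing your wallet on vacation",
]
valence = np.array([7.8, 1.9, 8.2, 2.5])

probe = Ridge(alpha=1.0).fit(embed(texts), valence)
print(probe.predict(embed(["a quiet evening reading a good book"])))
```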
Yup, (something like) the human anchor seems surprisingly good as a predictive model when interacting with LLMs. Related, especially for prompting: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning; A fine-grained comparison of pragmatic language understanding in humans and language models; Task Ambiguity in Humans and Language Models.
Table 2, page 21 → (above) human-level performance on LeetCode.
Probably not, from the paper: ‘We used LeetCode in Figure 1.5 in the introduction, where GPT-4 passes all stages of mock interviews for major tech companies. Here, to test on fresh questions, we construct a benchmark of 100 LeetCode problems posted after October 8th, 2022, which is after GPT-4’s pretraining period.’
Linear decoding also works pretty well for others’ beliefs in humans: Single-neuronal predictions of others’ beliefs in humans
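To unpack ‘linear decoding’: the claim is roughly that a simple linear readout from population activity predicts the belief attributed to the other agent. A toy sketch on synthetic data (purely to illustrate the method; the cited paper decodes from single-neuron recordings during a false-belief task):

```python
# Toy sketch of a "linear decoder": logistic regression from a population activity
# vector to a binary belief label. Data here is synthetic, purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_units = 200, 50
labels = rng.integers(0, 2, n_trials)  # e.g. other agent holds a true (1) vs. false (0) belief
signal = np.outer(labels - 0.5, rng.normal(size=n_units))    # label-dependent activity component
activity = signal + rng.normal(scale=1.0, size=(n_trials, n_units))

decoder = LogisticRegression(max_iter=1000)
print(cross_val_score(decoder, activity, labels, cv=5).mean())  # well above chance (0.5)
```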
It seems to me that the results here, that ‘instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former’, could be interpreted as some positive evidence for the optimistic case (and perhaps more broadly, for ‘Do What I Mean’ being not-too-hard); summary Twitter thread, see especially tweets 4 and 5
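For context, the kind of probe behind that result is the flipped-label in-context learning setup: give the model demonstrations whose labels are deliberately inverted and check whether its prediction follows the in-context mapping or its semantic prior. A minimal sketch (the query_model call is a hypothetical placeholder, not a real API):

```python
# Sketch of a flipped-label in-context learning probe: demonstrations with inverted
# labels separate "follows the in-context mapping" from "falls back on semantic prior".
demos = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("An absolute delight to watch.", "positive"),
    ("Dull and far too long.", "negative"),
]

def build_prompt(demos, test_input, flip_labels=False):
    flip = {"positive": "negative", "negative": "positive"}
    lines = []
    for text, label in demos:
        shown = flip[label] if flip_labels else label
        lines.append(f"Input: {text}\nLabel: {shown}")
    lines.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(lines)

prompt = build_prompt(demos, "One of the best films of the year.", flip_labels=True)
# answer = query_model(prompt)  # placeholder; substitute whatever LM API you use
# "negative" => the model followed the flipped in-context mapping;
# "positive" => it fell back on its semantic prior.
```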
Another reason to expect approximate linearity in deep learning models: point 12 + arguments about approximate (linear) isomorphism between human and artificial representations (e.g. search for ‘isomorph’ in Understanding models understanding language and in Grounding the Vector Space of an Octopus: Word Meaning from Raw Text).
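A toy sketch of what ‘approximate linear isomorphism’ can mean operationally (synthetic data; in practice the paired items would be e.g. human neural/behavioral representations and model embeddings of the same stimuli): fit a linear map from one space to the other on some items and check how much held-out variance it explains.

```python
# Toy sketch: if system B's representations are (approximately) a linear image of
# system A's, a linear map fit on some paired items explains held-out items well.
# Data here is synthetic, purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_items, dim_a, dim_b = 500, 64, 48

A = rng.normal(size=(n_items, dim_a))                       # representations in system A
W_true = rng.normal(size=(dim_a, dim_b))
B = A @ W_true + 0.1 * rng.normal(size=(n_items, dim_b))    # ~linear image in system B

A_tr, A_te, B_tr, B_te = train_test_split(A, B, random_state=0)
linmap = LinearRegression().fit(A_tr, B_tr)
print(linmap.score(A_te, B_te))  # R^2 close to 1.0 => well-approximated by a linear map
```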
Related—I’d be excited to see connectome studies on how mice are mechanistically capable of empathy; this (+ computational models) seems like it should be in the window of feasibility given e.g. Towards a Foundation Model of the Mouse Visual Cortex: ‘We applied the foundation model to the MICrONS dataset: a study of the brain that integrates structure with function at unprecedented scale, containing nanometer-scale morphology, connectivity with >500,000,000 synapses, and function of >70,000 neurons within a ∼ 1mm3 volume spanning multiple areas of the mouse visual cortex. This accurate functional model of the MICrONS data opens the possibility for a systematic characterization of the relationship between circuit structure and function.’
The computational part could take inspiration from the large amounts of related work modelling other brain areas (using Deep Learning!), e.g. for a survey/research agenda: The neuroconnectionist research programme.
‘We conjecture that reinforcement strengthens the behavior-steering computations that guide a system into reinforcement events, and that those behavior-steering computations will only form around abstractions already represented inside of a system at the time of reinforcement. We bet that there are a bunch of quantitative relationships here just waiting to be discovered—that there’s a lot of systematic structure in what learned values form given which training variables. To ever get to these quantitative relationships, we’ll need to muck around with language model fine-tuning under different conditions a lot.’ → this could be (somewhat) relevant: https://openreview.net/forum?id=mNtmhaDkAr