Table 2, page 21 → (above) human-level performance on LeetCode.
Yup, (something like) the human anchor seems surprisingly good as a predictive model when interacting with LLMs. Related, especially for prompting: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning; A fine-grained comparison of pragmatic language understanding in humans and language models; Task Ambiguity in Humans and Language Models.
Valence (and arousal) also seem relatively easy to learn even for current models e.g. The Perceptual Primacy of Feeling: Affectless machine vision models robustly predict human visual arousal, valence, and aesthetics; Quantifying Valence and Arousal in Text with Multilingual Pre-trained Transformers. And abstract concepts like ‘human flourishing’ could be relatively easy to learn even just from text e.g. Language is more abstract than you think, or, why aren’t languages more iconic?; Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training.
Some relevant literature: Language is more abstract than you think, or, why aren’t languages more iconic?, Meaning without reference in large language models, Grounding the Vector Space of an Octopus: Word Meaning from Raw Text, Understanding models understanding language, Implications of the Convergence of Language and Vision Model Geometries, Shared computational principles for language processing in humans and deep language models.
Recent works from Anders Søgaard might be relevant, e.g. Grounding the Vector Space of an Octopus: Word Meaning from Raw Text, Understanding models understanding language, Implications of the Convergence of Language and Vision Model Geometries.
E.g. from Grounding the Vector Space of an Octopus: Word Meaning from Raw Text on the success of unsupervised machine translation (and more):
‘Consider, for example, the fact that unsupervised machine translation is possible (Lample et al., 2018a, b; Park et al., 2021). Unsupervised machine translation works by first aligning vector spaces induced by monolingual language models in the source and target languages (Søgaard et al., 2019). This is possible because such vector spaces are often near-isomorphic (Vulic et al., 2020). If weak supervision is available, we can use techniques such as Procrustes Analysis (Gower, 1975) or Iterative Closest Point (Besl & McKay, 1992), but alignments can be obtained in the absence of any supervision using adversarial learning (Li et al., 2019; Søgaard et al., 2019) or distributional evidence alone. If the vector spaces induced by language models exhibit high degrees of isomorphism to the physical world or human perceptions thereof, we have reason to think that similar techniques could provide us with sufficient grounding in the absence of supervision.
Unsupervised machine translation shows that language model representations of different vocabularies of different languages are often isomorphic. Some researchers have also explored cross-modality alignment: (Chung et al., 2018) showed that unsupervised alignment of speech and written language is possible using the same techniques, for example. This also suggests unsupervised grounding should be possible.
Is there any direct evidence that language model vector spaces are isomorphic to (representations of) the physical world? There is certainly evidence that language models learn isomorphic representations of parts of vocabularies. Abdou et al. (2021), for example, present evidence that language models encode color in a way that is near-isomorphic to conceptual models of how color is perceived, in spite of known reporting biases (Paik et al., 2021). Patel and Pavlick (2022) present similar results for color terms and directionals. Liétard et al. (2021) show that the larger the models are, the more isomorphic their representations of geographical place names are to maps of their physical location.’
‘Unsupervised machine translation and unsupervised bilingual dictionary induction are evaluated over the full vocabulary, often with more than 85% precision. This indicates language models learn to represent concepts in ways that are not very language-specific. There is also evidence for near-isomorphisms with brain activity, across less constrained subsets of the vocabulary: (Wu et al., 2021), for example, show how brain activity patterns of individual words are encoded in a way that facilitates analogical reasoning. Such a property would in the limit entail that brain encodings are isomorphic to language model representations (Peng et al., 2020). Other research articles that seem to suggest that language model representations are generally isomorphic to brain activity patterns include (Mitchell et al., 2008; Søgaard, 2016; Wehbe et al., 2014; Pereira et al., 2018; Gauthier & Levy, 2019; Caucheteux & King, 2022).’
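To make the alignment step in the first quoted passage concrete, here is a minimal Python sketch of the weakly supervised case (orthogonal Procrustes over a seed dictionary); the embeddings and dictionary below are random stand-ins rather than the actual setups from the cited papers:

```python
# Minimal sketch: aligning two monolingual embedding spaces with orthogonal
# Procrustes analysis (the weakly supervised case described in the quote).
# All matrices are random stand-ins for real word embeddings.
import numpy as np

def procrustes_align(X_src, Y_tgt):
    """Return the orthogonal map W minimizing ||X_src @ W - Y_tgt||_F."""
    # Closed-form solution: W = U @ Vt, where U, _, Vt = SVD(X_src^T Y_tgt).
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

rng = np.random.default_rng(0)
dim, n_pairs = 300, 5000
src = rng.normal(size=(n_pairs, dim))                    # "source language" vectors
true_rot, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # hidden rotation between the two spaces
tgt = src @ true_rot + 0.01 * rng.normal(size=(n_pairs, dim))  # "target language" vectors

W = procrustes_align(src, tgt)
print(np.linalg.norm(src @ W - tgt) / np.linalg.norm(tgt))  # small residual => spaces near-isomorphic
```

In the fully unsupervised case, as I understand it, the seed dictionary is replaced by adversarial or purely distributional matching, with a Procrustes-style refinement step much like the above.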
I’ll probably write more about this soon.
It might be useful to have a look at Language models show human-like content effects on reasoning; they empirically test for human-like incoherencies / biases in LMs performing some logical reasoning tasks (twitter summary thread; video presentation).
More evidence of something like world models in language models: Language models as agent models, Implicit Representations of Meaning in Neural Language Models
It might be interesting to think about whether there could be connections to the framing of corrections in robotics, e.g. “No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy.
Excited to see people thinking about this! Importantly, there’s an entire ML literature out there to get evidence from and ways to keep studying this empirically. Some examples of the existing literature (also see Path dependence in ML inductive biases and How likely is deceptive alignment?): Linear Connectivity Reveals Generalization Strategies—on fine-tuning path-dependence, The Grammar-Learning Trajectories of Neural Language Models (and many references in that thread), Let’s Agree to Agree: Neural Networks Share Classification Order on Real Datasets—on pre-training path-dependence. I can probably find many more references through my bookmarks, if there’s interest in this.
“The language model works with text. The language model remains the best interface I’ve ever used. It’s user-friendly, composable, and available everywhere. It’s easy to automate and easy to extend.”—Text Is the Universal Interface
This seems related and might be useful to you, especially (when it comes to Natural Abstractions) the section ‘Linking Behavior and Neural Representations’ of ‘A mathematical theory of semantic development in deep neural networks’.
This paper links inductive biases of pre-trained [language] models (including some related to simplicity measures like MDL), path dependency and sensitivity to label evidence/noise: https://openreview.net/forum?id=mNtmhaDkAr
Here’s a recent article on the inductive biases of pre-trained LMs and how that affects fine-tuning: https://openreview.net/forum?id=mNtmhaDkAr
There also seems to be some theoretical and empirical ML evidence for the perspective of in-context learning as Bayesian inference: http://ai.stanford.edu/blog/understanding-incontext/
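If I’m transcribing the framing from that post correctly, the core object is a marginalization over a latent concept that the prompt is taken to pin down:

p(output | prompt) = ∫ p(output | concept, prompt) · p(concept | prompt) d(concept)

i.e. in-context learning works to the extent that the demonstrations concentrate p(concept | prompt) on the concept the examples share.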
‘We conjecture that reinforcement strengthens the behavior-steering computations that guide a system into reinforcement events, and that those behavior-steering computations will only form around abstractions already represented inside of a system at the time of reinforcement. We bet that there are a bunch of quantitative relationships here just waiting to be discovered—that there’s a lot of systematic structure in what learned values form given which training variables. To ever get to these quantitative relationships, we’ll need to muck around with language model fine-tuning under different conditions a lot.’ → this could be (somewhat) relevant: https://openreview.net/forum?id=mNtmhaDkAr
Should work now; I had inadvertently added ‘:’ as part of the link.
From https://mailchi.mp/b3dc916ac7e2/an-80-why-ai-risk-might-be-solved-without-additional-intervention-from-longtermists : ‘On the point about lumpiness, my model is that there are only a few underlying factors (such as the ability to process culture) that allow humans to so quickly learn to do so many tasks, and almost all tasks require near-human levels of these factors to be done well. So, once AI capabilities on these factors reach approximately human level, we will “suddenly” start to see AIs beating humans on many tasks, resulting in a “lumpy” increase on the metric of “number of tasks on which AI is superhuman” (which seems to be the metric that people often use, though I don’t like it, precisely because it seems like it wouldn’t measure progress well until AI becomes near-human-level).’
Probably not; from the paper: ‘We used LeetCode in Figure 1.5 in the introduction, where GPT-4 passes all stages of mock interviews for major tech companies. Here, to test on fresh questions, we construct a benchmark of 100 LeetCode problems posted after October 8th, 2022, which is after GPT-4’s pretraining period.’