There also seems to be some theoretical and empirical ML evidence for the perspective of in-context learning as Bayesian inference: http://ai.stanford.edu/blog/understanding-incontext/
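Very roughly, the framing there (if I'm reading it right) treats pretraining data as generated from latent concepts, and in-context learning as the model implicitly inferring the concept behind the prompt, i.e. something like:

$$p(\text{output} \mid \text{prompt}) = \int_{\text{concept}} p(\text{output} \mid \text{concept}, \text{prompt})\, p(\text{concept} \mid \text{prompt})\, d\,\text{concept}$$

with more in-context examples concentrating $p(\text{concept} \mid \text{prompt})$ on the concept the examples share, so the task gets 'located' by inference rather than by gradient updates.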
Here’s a recent paper on the inductive biases of pre-trained LMs and how they affect fine-tuning: https://openreview.net/forum?id=mNtmhaDkAr
This paper links inductive biases of pre-trained [language] models (including some related to simplicity measures like MDL), path dependency and sensitivity to label evidence/noise: https://openreview.net/forum?id=mNtmhaDkAr
This seems related and might be useful to you; when it comes to Natural Abstractions, the section ‘Linking Behavior and Neural Representations’ seems especially relevant: ‘A mathematical theory of semantic development in deep neural networks’
“The language model works with text. The language model remains the best interface I’ve ever used. It’s user-friendly, composable, and available everywhere. It’s easy to automate and easy to extend.”—Text Is the Universal Interface
Excited to see people thinking about this! Importantly, there’s an entire ML literature out there to get evidence from, and ways to keep studying this empirically. Some examples of the existing literature (also see Path dependence in ML inductive biases and How likely is deceptive alignment?): Linear Connectivity Reveals Generalization Strategies—on fine-tuning path-dependence, The Grammar-Learning Trajectories of Neural Language Models (and many references in that thread), Let’s Agree to Agree: Neural Networks Share Classification Order on Real Datasets—on pre-training path-dependence. I can probably find many more references through my bookmarks, if there’s interest in this.
It might be interesting to think about whether there could be connections to the framing of corrections in robotics, e.g. “No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy
More evidence of something like world models in language models: Language models as agent models, Implicit Representations of Meaning in Neural Language Models
It might be useful to have a look at Language models show human-like content effects on reasoning; they empirically test for human-like incoherences/biases in LMs performing some logical reasoning tasks (Twitter summary thread; video presentation)
Some relevant literature: Language is more abstract than you think, or, why aren’t languages more iconic?, Meaning without reference in large language models, Grounding the Vector Space of an Octopus: Word Meaning from Raw Text, Understanding models understanding language, Implications of the Convergence of Language and Vision Model Geometries, Shared computational principles for language processing in humans and deep language models.
Valence (and arousal) also seem relatively easy to learn even for current models e.g. The Perceptual Primacy of Feeling: Affectless machine vision models robustly predict human visual arousal, valence, and aesthetics; Quantifying Valence and Arousal in Text with Multilingual Pre-trained Transformers. And abstract concepts like ‘human flourishing’ could be relatively easy to learn even just from text e.g. Language is more abstract than you think, or, why aren’t languages more iconic?; Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training.
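As a rough illustration of how cheap this kind of probe can be (a minimal sketch, not the pipeline from those papers; the model choice, Ridge regressor and toy ratings are all mine), a linear probe from frozen pre-trained embeddings to human valence ratings is already the sort of thing one would try first:

```python
# Minimal sketch: linear probe from frozen transformer embeddings to valence ratings.
# Not the setup from the cited papers; model choice, regressor and toy data are illustrative.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(texts):
    """Mean-pooled last-layer embeddings for a list of strings."""
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state            # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)            # (batch, seq, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()   # (batch, dim)

# Toy stand-in for human valence ratings (e.g. on a 1-9 scale).
texts = [
    "a sunny morning walk with friends",
    "a car crash on the highway",
    "winning an award at work",
    "losing your wallet on vacation",
]
valence = np.array([7.8, 1.9, 8.2, 2.5])

probe = Ridge(alpha=1.0).fit(embed(texts), valence)
print(probe.predict(embed(["a quiet evening reading a good book"])))
```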
Yup, (something like) the human anchor seems surprisingly good as a predictive model when interacting with LLMs. Related, especially for prompting: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning; A fine-grained comparison of pragmatic language understanding in humans and language models; Task Ambiguity in Humans and Language Models.
Table 2, page 21 → (above) human-level performance on LeetCode.
Probably not, from the paper: ‘We used LeetCode in Figure 1.5 in the introduction, where GPT-4 passes all stages of mock interviews for major tech companies. Here, to test on fresh questions, we construct a benchmark of 100 LeetCode problems posted after October 8th, 2022, which is after GPT-4’s pretraining period.’
Linear decoding also works pretty well for others’ beliefs in humans: Single-neuronal predictions of others’ beliefs in humans
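To unpack ‘linear decoding’: the claim is roughly that a simple linear readout from population activity predicts the belief attributed to the other agent. A toy sketch on synthetic data (purely to illustrate the method; the cited paper decodes from single-neuron recordings during a false-belief task):

```python
# Toy sketch of a "linear decoder": logistic regression from a population activity
# vector to a binary belief label. Data here is synthetic, purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_units = 200, 50
labels = rng.integers(0, 2, n_trials)  # e.g. other agent holds a true (1) vs. false (0) belief
signal = np.outer(labels - 0.5, rng.normal(size=n_units))    # label-dependent activity component
activity = signal + rng.normal(scale=1.0, size=(n_trials, n_units))

decoder = LogisticRegression(max_iter=1000)
print(cross_val_score(decoder, activity, labels, cv=5).mean())  # well above chance (0.5)
```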
It seems to me that the results here, that ‘instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former’, could be interpreted as some positive evidence for the optimistic case (and perhaps more broadly, for ‘Do What I Mean’ being not-too-hard); summary Twitter thread, see especially tweets 4 and 5
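For context, the kind of probe behind that result is the flipped-label in-context learning setup: give the model demonstrations whose labels are deliberately inverted and check whether its prediction follows the in-context mapping or its semantic prior. A minimal sketch (the query_model call is a hypothetical placeholder, not a real API):

```python
# Sketch of a flipped-label in-context learning probe: demonstrations with inverted
# labels separate "follows the in-context mapping" from "falls back on semantic prior".
demos = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("An absolute delight to watch.", "positive"),
    ("Dull and far too long.", "negative"),
]

def build_prompt(demos, test_input, flip_labels=False):
    flip = {"positive": "negative", "negative": "positive"}
    lines = []
    for text, label in demos:
        shown = flip[label] if flip_labels else label
        lines.append(f"Input: {text}\nLabel: {shown}")
    lines.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(lines)

prompt = build_prompt(demos, "One of the best films of the year.", flip_labels=True)
# answer = query_model(prompt)  # placeholder; substitute whatever LM API you use
# "negative" => the model followed the flipped in-context mapping;
# "positive" => it fell back on its semantic prior.
```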
Another reason to expect approximate linearity in deep learning models: point 12 + arguments about approximate (linear) isomorphism between human and artificial representations (e.g. search for ‘isomorph’ in Understanding models understanding language and in Grounding the Vector Space of an Octopus: Word Meaning from Raw Text).
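A toy sketch of what ‘approximate linear isomorphism’ can mean operationally (synthetic data; in practice the paired items would be e.g. human neural/behavioral representations and model embeddings of the same stimuli): fit a linear map from one space to the other on some items and check how much held-out variance it explains.

```python
# Toy sketch: if system B's representations are (approximately) a linear image of
# system A's, a linear map fit on some paired items explains held-out items well.
# Data here is synthetic, purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_items, dim_a, dim_b = 500, 64, 48

A = rng.normal(size=(n_items, dim_a))                       # representations in system A
W_true = rng.normal(size=(dim_a, dim_b))
B = A @ W_true + 0.1 * rng.normal(size=(n_items, dim_b))    # ~linear image in system B

A_tr, A_te, B_tr, B_te = train_test_split(A, B, random_state=0)
linmap = LinearRegression().fit(A_tr, B_tr)
print(linmap.score(A_te, B_te))  # R^2 close to 1.0 => well-approximated by a linear map
```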
Related—I’d be excited to see connectome studies on how mice are mechanistically capable of empathy; this (+ computational models) seems like it should be in the window of feasibility given e.g. Towards a Foundation Model of the Mouse Visual Cortex: ‘We applied the foundation model to the MICrONS dataset: a study of the brain that integrates structure with function at unprecedented scale, containing nanometer-scale morphology, connectivity with >500,000,000 synapses, and function of >70,000 neurons within a ∼ 1mm3 volume spanning multiple areas of the mouse visual cortex. This accurate functional model of the MICrONS data opens the possibility for a systematic characterization of the relationship between circuit structure and function.’
The computational part could take inspiration from the large amounts of related work modelling other brain areas (using Deep Learning!), e.g. for a survey/research agenda: The neuroconnectionist research programme.
‘We conjecture that reinforcement strengthens the behavior-steering computations that guide a system into reinforcement events, and that those behavior-steering computations will only form around abstractions already represented inside of a system at the time of reinforcement. We bet that there are a bunch of quantitative relationships here just waiting to be discovered—that there’s a lot of systematic structure in what learned values form given which training variables. To ever get to these quantitative relationships, we’ll need to muck around with language model fine-tuning under different conditions a lot.’ → this could be (somewhat) relevant: https://openreview.net/forum?id=mNtmhaDkAr