What a great read. Best of luck with this project. It sounds compelling.
Ben Smith
Seems to me that in this case, the two are connected. If I falsely believed my group was in the minority, I might refrain from clicking the button out of a sense of fairness or deference to the majority group.
Consequently, the lie influenced not only the people who clicked the button but perhaps also the people who did not. Because the second survey rested on a false premise, it should be disregarded altogether. Not disregarding it would mean keeping a result obtained by fraud or trickery, a result that disadvantages all the majority-group members who chose not to click because they falsely believed their view was in the minority.
I think, morally speaking, avoiding disadvantaging participants through fraud is more important than honoring your word to their competitors.
The key difference between this and the example is that there’s a connection between the lie and the promise.
Differentiating intelligence and agency seems hugely clarifying for many discussions in alignment.
You might have noticed I didn’t actually fully differentiate intelligence and agency. It seems to me that to exert agency, a mind needs a certain amount of intelligence, so I think all agents are intelligent, though not all intelligences are agentic. Agents that are minimally intelligent (like simple RL agents in simple computer models) are also pretty minimally agentic. I’d be curious to hear about a counter-example.
Incidentally, I also like Anil Seth’s work, and I liked his recent book on consciousness, apart from the bit about AGI. I read it right alongside Damasio’s latest book on consciousness and they paired pretty well. Seth is a bit more concrete and detail-oriented, and I appreciated that.
It would be much easier to understand ideas in this area if writers brought more conceptual clarity to them, particularly empirical consciousness researchers (philosophers can be a bit better, I think, and I say that as an empirical researcher myself). When I read that quote from Seth, it seemed clear he was arguing AGI is unlikely to be an existential threat because it’s unlikely to be conscious. Does he naively conflate consciousness with agency because he’s not an artificial-agency researcher and hasn’t thought much about it? Or does he have a sophisticated point of view about how agency and consciousness really are linked, based on his couple of decades of consciousness research? The latter seems very unlikely, given how much we know about artificial agents, but the only way to know for sure is to ask him.
Similarly, MANY people, including empirical researchers and maybe philosophers, treat consciousness and self-awareness as somewhat synonymous, or at least interdependent. Is that because they’re being naive about the link, or because, as outlined in Clark, Friston, & Wilkinson’s “Bayesing Qualia”, they have sophisticated theories, based on evidence, that there really are tight links between the two? When writing this post I was pretty sure consciousness and self-awareness were “orthogonal”/independent; now, following other discussion in the comments here and on Facebook, I’m less sure about that. But I’d like more people to do what Friston did and explain exactly why they think consciousness arises from self-awareness/meta-cognition.
I found the Clark et al. (2019) “Bayesing Qualia” article very useful, and it did give me an intuition for the account on which sentience perhaps arises out of self-awareness. But they themselves acknowledged in their conclusion that the paper didn’t quite demonstrate that principle, and I didn’t find myself convinced of it.
Perhaps what I’d like readers to take away is that sentience and self-awareness can at the very least be conceptually distinguished. Even if it isn’t clear empirically whether or not they are intrinsically linked, we ought to maintain a conceptual distinction in order to form testable hypotheses about whether they are in fact linked, and in order to reason about the nature of any link. Perhaps I should call that “theoretical orthogonality”. This matters if we want to reason about whether, for instance, giving our AIs self-awareness or situational awareness will cause them to be sentient. I don’t think it will, although I do think that if you gave them the sort of detailed self-monitoring feelings that humans have, that might itself yield sentience. But it’s not clear!
I listened to the whole episode with Bach as a result of your recommendation! Bach hardly even got a chance to express his ideas, and I’m not much closer to understanding his account of
meta-awareness (i.e., awareness of awareness) within the model of oneself which acts as a ‘first-person character’ in the movie/dream/”controlled hallucination” that the human brain constantly generates for oneself is the key thing that also compels the brain to attach qualia (experiences) to the model. In other words, the “character within the movie” thinks that it feels something because it has meta-awareness (i.e., the character is aware that it is aware), which reflects the actual meta-cognition in the brain, rather than in the character, insofar as the character is a faithful model of reality.
which seems like a crux here.
He sort of briefly described “consciousness as a dream state” at the very end, and although I did get the sense that maybe he thinks meta-awareness and sentience are connected, I didn’t really hear a great argument for that point of view.
He spent several minutes arguing that agency, or pursuing a utility function, is something humans have, but that these things aren’t sufficient for consciousness (I don’t remember whether he said they were necessary, so I suppose we don’t know if he thinks they’re orthogonal).
I myself wanted to write about a popular confusion between decision-making, consciousness, and intelligence, which among other things leads to bad AI alignment takes and mediocre philosophy.
This post hasn’t gotten a lot of attention, so if you write your own post, perhaps the topic will have another shot at reaching popular consciousness (heh). And if you succeed, I might try to learn something about how you did what this post did not!
I wasn’t thinking that it’s possible to separate qualia perception and self awareness
Separating qualia and self-awareness is a controversial assertion and it seems to me people have some strong contradictory intuitions about it!
I don’t think that, in the experience of perceiving red, there is necessarily any conscious awareness of oneself; in that moment there is just the qualia of redness. I can imagine two possible objections: (a) perhaps there is some kind of implicit awareness of self in that moment that enables the conscious awareness of red, or (b) perhaps it’s only possible to have that experience of red within a perceptual framework in which one has perceived oneself. But personally I don’t find either of those accounts persuasive.
I think flow states are also moments where one’s awareness can be so focused on the activity one is engaged in that one momentarily loses any awareness of one’s own self.
there is no intersection between sentience and intelligence that is not self-awareness.
I should have defined intelligence in the post; perhaps I’ll edit it. The only concrete and clear definition of intelligence I’m aware of is psychology’s g factor, which is something like the ability to recognize patterns and draw inferences from them. That is what I mean, and no more than that.
A mind that is sentient and intelligent but not self-aware might look like this: when a computer programmer is deep in the flow state of turning a function in their head into code on the screen, they may experience moments in which they have sentient awareness of their work, and are certainly using intelligence to transform their ideas into code, but do not in those particular moments have any awareness of self.
Thank you for the link to the Friston paper. I’m reading that and will watch Lex Fridman’s interview with Joscha Bach, too. I sort of think “illusionism” is a bit too strong, but perhaps it’s a misnomer rather than wrong (or I could be wrong altogether). Clark, Friston, and Wilkinson say
But in what follows we aim not to Quine (explain away) qualia but to ‘Bayes’ them – to reveal them as products of a broadly speaking rational process of inference, of the kind imagined by the Reverend Bayes in his (1763) treatise on how to form and update beliefs on the basis of new evidence. Our story thus aims to occupy the somewhat elusive ‘revisionary’ space, in between full strength ‘illusionism’ (see below) and out-and-out realism
and somewhere in the middle sounds more plausible to me.
Anyhow, I’ll read the paper first before I try to respond more substantively to your remarks, but I intend to!
The intelligence-sentience orthogonality thesis
Great post! Two points of disagreement seem worth mentioning:
Exploring the full ability of dogs and cats to communicate isn’t so much impractical to do in academia; it just isn’t very theoretically interesting. We know animals can do operant conditioning (we’ve probably known for over 100 years), but we also know they struggle with complex syntax. I guess there’s a lot of uncertainty in the middle, so I’m low-confidence about this. But generally, to publish a high-impact paper about dog or cat communication you’d have to show they can do more than “conditioning”, that they understand syntax in some way. That’s probably pretty hard; maybe you can do it, but do you want to stake your career on it?
That brings me to my second point: is it more than operant conditioning? Some of the videos show the animals pressing multiple buttons, but Billy the Cat’s videos show his trainer teaching him those button sequences. I’m not a language expert, but to demonstrate syntax understanding you have to do more than show he can learn sequences of button presses he was taught verbatim. At a minimum there’d need to be evidence he can form novel sentences by combining buttons in apparently intentional ways that could only be put together by generalizing from syntax rules. Maaaybe @Adele Lopez’s observation that Bunny seems to reverse her owner’s word order might be appropriate evidence. But if she’s been reinforced for her own arbitrarily chosen word order in the past, she might have developed it without really appreciating rules of syntax per se; after all, a hallmark of learning a language is picking up its syntax correctly.
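To make that concrete, here’s a toy sketch in Python of the kind of first-pass filter one might run over logged button presses, keeping only sequences that were never taught verbatim but are composed entirely of taught buttons. The data and function names are entirely hypothetical, not from any actual button-board study.

```python
def novel_recombinations(taught_sequences, observed_sequences):
    """Return observed button sequences that were never taught verbatim
    but are built entirely from buttons that appear in taught sequences.

    This is only a crude first filter: it cannot distinguish rule-governed
    syntax from lucky recombination, but verbatim repeats of trained
    sequences clearly should not count as evidence of syntax."""
    taught = {tuple(seq) for seq in taught_sequences}
    known_buttons = {button for seq in taught_sequences for button in seq}
    return [
        seq for seq in observed_sequences
        if tuple(seq) not in taught and set(seq) <= known_buttons
    ]

# Hypothetical data: sequences the trainer modeled vs. what the cat pressed.
taught = [["mom", "play"], ["outside", "now"]]
observed = [["mom", "play"], ["play", "outside", "now"]]
print(novel_recombinations(taught, observed))  # [['play', 'outside', 'now']]
```

Passing a filter like this wouldn’t demonstrate syntax, of course; it only rules out the clearest case of verbatim imitation.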
There’s not just acceptance at stake here. Medical insurance companies are not typically going to buy into a responsibility to support clients’ morphological freedom, as if medically transitioning were in the same class of thing as a cis person getting a facelift or a cis woman getting a boob job, because it is near-universally understood that those are “elective” medical procedures. But if their clients have a “condition” that requires “treatment”, well, now insurers are on the hook to pay. A lot of mental health treatment works the same way, imho: people have various psychological states, many of which get inappropriately shoehorned into a pathology or illness narrative in order to get the insurance companies to pay.
All this adds a political dimension to the not inconsiderable politics of social acceptance.
I guess this falls into the category of “Well, we’ll deal with that problem when it comes up”, but I’d imagine that when a human’s preference in a particular dilemma is undefined, or even just highly uncertain, one can often defer to other rules: rather than maximize an uncertain preference, default to maximizing the human’s agency, even if this predictably leads to less-than-optimal preference satisfaction.
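As a minimal sketch of what such a fallback rule could look like (the names, fields, and threshold below are my own illustrative assumptions, not anyone’s actual proposal):

```python
from dataclasses import dataclass

@dataclass
class ActionEstimate:
    """Hypothetical per-action estimates an assistant agent might maintain."""
    name: str
    expected_preference: float     # estimated satisfaction of the human's preference
    preference_uncertainty: float  # e.g., posterior spread of that estimate
    agency_preserved: float        # how much future choice is left to the human

def choose_action(actions, uncertainty_threshold=0.5):
    """If preference estimates are too uncertain, fall back to maximizing
    the human's agency instead of chasing an uncertain preference."""
    most_uncertain = max(a.preference_uncertainty for a in actions)
    if most_uncertain > uncertainty_threshold:
        # Preference is unclear: default to preserving the human's options.
        return max(actions, key=lambda a: a.agency_preserved)
    # Preference is clear enough: maximize it directly.
    return max(actions, key=lambda a: a.expected_preference)

candidates = [
    ActionEstimate("book_flight", 0.9, 0.7, 0.2),
    ActionEstimate("present_options", 0.6, 0.1, 0.9),
]
print(choose_action(candidates).name)  # high uncertainty -> "present_options"
```

Even this toy version predictably sacrifices some expected preference satisfaction whenever the uncertainty threshold is tripped, which is exactly the trade-off described above.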
I think your point is interesting and I agree with it, but I don’t think Nature are only addressing the general public. To me, it seems like they’re addressing researchers and policymakers and telling them what they ought to focus on as well.
Nature: “Stop talking about tomorrow’s AI doomsday when AI poses risks today”
Well written; I really enjoyed this. This is not really on topic, but I’d be curious to read an “idiot’s guide”, or maybe an “autist’s guide”, on how to avoid sounding condescending.
interpretability on pretrained model representations suggest they’re already internally “ensembling” many different abstractions of varying sophistication, with the abstractions used for a particular task being determined by an interaction between the task data available and the accessibility of the different pretrained abstractions.
That seems encouraging to me. There’s a model of AGI value alignment in which the system has a particular goal it wants to achieve and brings all its capabilities to bear on achieving that goal. It does this by having a “world model” that is coherent, and perhaps a set of consistent Bayesian priors about how the world works. I can understand why such a system would tend to behave in a hyperfocused way as it goes out to achieve its goals.
In contrast, a system with an ensemble of abstractions about the world, many of which may even be inconsistent with one another, seems much more human-like. It seems more human-like specifically in that the system won’t be focused on a particular goal, or even a particular perspective on how to achieve it, but could arrive at a particular solution somewhat randomly, based on quirks of its training data.
I wonder if there’s something analogous to human personality here, where being open to experience, or even open to some degree of contradiction (in a context where humans are generally motivated to minimize cognitive dissonance), is useful for seeing the world in different ways, trying out strategies, and changing tack until success can be found. If this process applies to selecting goals, or at least sub-goals, which it certainly does in humans, you get a system that is perhaps capable of reflecting on a wide set of consequences and choosing a course of action that is more balanced, and hopefully balanced amongst the goals we give the system.
Who Aligns the Alignment Researchers?
I’ve been writing about multi-objective RL, trying to figure out how an RL agent could optimize a non-linear aggregation of objectives in a way that avoids strongly negative outcomes on any particular objective.
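For instance, one simple family of non-linear aggregations with that property applies a concave, loss-averse transform to each objective’s return before summing, so a large loss on any one objective dominates moderate gains on the others. A rough sketch, purely illustrative rather than the specific scheme from my posts:

```python
import numpy as np

def scalarize(objective_values, alpha=2.0):
    """Loss-averse aggregation of per-objective returns.

    Values are left roughly linear for gains but penalized increasingly
    steeply for losses, so a strongly negative outcome on any single
    objective dominates the combined score. The exact transform and the
    alpha parameter here are illustrative choices, nothing more."""
    v = np.asarray(objective_values, dtype=float)
    transformed = np.where(v >= 0, v, -alpha * (np.exp(-v) - 1.0))
    return float(transformed.sum())

# A large loss on one objective outweighs moderate gains on the others.
print(scalarize([1.0, 1.0, 1.0]))   # all mildly positive -> 3.0
print(scalarize([2.0, 2.0, -3.0]))  # one strongly negative -> about -34
```

An agent maximizing a scalarized return like this is pushed toward policies that keep every objective out of deeply negative territory, which is the behavior I’m after.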
This sounds like a very interesting question.
I get stuck on the differences between AGI and humans when trying to answer your question directly.
But taking your question at face value:
ferreting out the fundamental intentions
What sort of context are you imagining? Humans aren’t even great at identifying the fundamental reason for their own actions. They’ll confabulate if forced to.
Thank you for writing this. I really personally appreciate it!
If Ray eventually found that the money was “still there”, doesn’t this make Sam right that “the money was really all there, or close to it” and “if he hadn’t declared bankruptcy it would all have worked out”?