LW1.0 username Manfred. PhD in condensed matter physics. I think and write independently about value learning.
Charlie Steiner
Cute idea, but I think you won’t get many upvotes because the post felt longer (and probably more technical) than the idea could sustain.
One unavoidable issue with defining consciousness, which has to be handled with some delicacy, is that people don’t have separate mental buckets for “consciousness” and “the mental properties of humans that I care about.” Sometimes we like to say that we intrinsically care about consciousness (as if the two were independent), but really consciousness and the things we care about are all muddled together.
In one direction, this means that as soon as you offer a definition of consciousness, it seems obvious that there’s a “consciousness monster” that maximizes the definition, and this seems interesting because you’ve labeled the thing you’re defining “consciousness,” so it feels like you intrinsically care about it.
In the other direction, it means that as soon as you offer a simple definition of consciousness, everyone who applies common sense to it will go “Wait, but this definition doesn’t include properties of humans that I care about, like emotions / pain / dreams / insert your favorite thing here.”
What should you say, as the groom? At weddings I’ve been at, usually the groom just says a bunch of “I do”s and some vows the couple maybe sat down and tweaked from examples they both liked. Then usually at the reception they’ll tell a story about how they met, what they love about the other person, and/or what they were thinking or feeling at important points in their relationship. Whoever gives their speech second thanks everyone for coming.
Typically it’s the officiant’s job to give the speech about what Love is and also about how marriage is hard, and the best man and woman’s job to tell the audience how well the couple fit each other while also making fun of them.
I think this is a pretty sensible way to do things. So I’ll give you three options:
Option one, the lazy but totally fine way, is to ask your officiant to put the Ursula LeGuin quote (picked as the example because I really like it) into their short speech about what Love is and how marriage is hard.
Option two is to put it at the absolute start of your reception speech—“Ursula LeGuin said “[quote].” With [partner name,] I really feel [feelings that relate to quote]. When I first met her, my first thought was [humorous anecdote]. [rest of speech]”
Option three is to put it at the very end of your reception speech. “[speech, e.g. about a phone call with your mom during which you realized it was getting serious]. The next day I started shopping for a ring. Ursula LeGuin said “[quote].” [Partner name], I [way of pledging your love that uses pieces of the quote]. [Telling them you love them 0-2 more times and ways.]”
I don’t remember it having any names, but this SEP article might help: https://plato.stanford.edu/entries/identity-time/
See e.g. “Four-dimensionalism,” “Personal Identity.”
But yeah I think most people around here have gotten used to the notion of identity as coming from a memory and perceptual relationship that might skip across time or space (every teleporter is a time machine), or even branch. Not really from cryonics so much as from computationalism about minds.
Thanks for this mammoth comment!
I’ve seen the subreddit a few times but don’t have any sort of mental image of what the users are like. And it’s likely going to stay that way unless they write reasonably interesting things and put them on LW.
It’s a C-tier journal. (Impact factor 3.3, where 1 is trash and 5.5 means everyone in your subfield reads it.)
It’s not an A-tier journal that everyone in the field cares about. It’s not even a B-tier journal that everyone in the sub-field cares about. It’s just a place random Joes with PhDs can go get their thoughts published. But it still has standards and still gets citations. In ethics as a field, AI is a niche that doesn’t really get a B-tier journal (not like medicine or law).
Plus, someone recommended I check out the work of one of the editors of the special issue on AI, and I saw the journal had more AI papers than the best-cited ethics journals, so I decided it would be interesting to take a broad sample of this one journal.
I think there’s just one journal that would have been more appropriate for me to delve into: Ethics and Information Technology, which is even more on-brand than Science and Engineering Ethics and also slightly better cited. But it’s not like they talk about superhuman AI much either; topic-wise they spend less time on woolly philosophical rambling and more time on algorithmic bias.
I’ll talk more about them in my next post that’s more of an attempt to go out and find interesting papers I didn’t know about before. One of the problems with a literature search is that a lot of interesting articles seem to have ended up in one-off collections (e.g. Douglas Summers-Stay’s article from a collection called Autonomy and Artificial Intelligence) or low-impact publications.
Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics
There is no way to be polite, so I will be brief. I think you should consider the positives of epistemic learned helplessness with respect to arguments for any particular UFO being non-mundane.
If I wanted to play fast and loose, I would claim that our sense of ourselves as having a first-person perspective at all is part of an evolutionary solution to the problem of learning from other people’s experiences (wait, wasn’t there a post like that recently? Or was that about empathy...). It merely seems like a black box to us because we’re too good at it, precisely because it’s so important.
Somehow we develop a high-level model of the world with ourselves and other people in it, and then this level of abstraction actually gets hooked up to our motivations—making this a subset of social instincts.
When imagining hooking up abstract learned world models to motivation for AI like this, I sometimes imagine something much less “fire and forget” than the human brain, something more like people testing, responding to, and modifying an AI that’s training or pre-training on real-world data. Evolution doesn’t get to pause me at age 4 and rummage around in my skull.
Suppose I have two robots, one of which wants to turn the world into paperclips and the other of which wants to turn the world into staples.
Your argument here seems to extend to saying that we can’t call these robots or their preferences “misaligned with each other,” because robot A’s preferences are used to search over actions of robot A, and vice versa for robot B.
I don’t think that argument makes sense. The action spaces are different, but both robots are still trying to affect the same world and steer it in different directions. We could formalize this by defining for each robot a utility function over states of the world.
There is one important type-signature point here, which is that robots are made of atoms and utility functions are not. The robots’ utility functions don’t live in the physical robots (they’re not the right type of stuff); they live in our abstract model of the robots. This doesn’t mean it’s futile to compare things, though: it’s fine to use abstractions within their domains of validity.
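As a toy illustration of that formalization, here’s a minimal sketch, assuming a made-up world state that’s just a list of cells holding metal, paperclips, or staples. Both utility functions take the same shared world state as input (and live in our model, not in the robots), which is all the comparison needs.

```python
from collections import Counter

# Toy world state: a list of cells, each "metal", "paperclip", or "staple".
# (Entirely made up for illustration; not anyone's actual formalism.)
world = ["paperclip", "staple", "metal", "paperclip"]

def paperclipper_utility(state):
    """Robot A's utility: count of paperclips in the shared world state."""
    return Counter(state)["paperclip"]

def stapler_utility(state):
    """Robot B's utility: count of staples in the same world state."""
    return Counter(state)["staple"]

print(paperclipper_utility(world), stapler_utility(world))  # 2 1

# "Misaligned with each other" then just means: changes to the shared world
# that raise one of these functions tend to lower the other, regardless of
# the two robots having different action spaces.
```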
Thoughts on AI Safety Camp
Yes, some people are interested in it and other people think it’s not worth it. See e.g. the Eliezer Yudkowsky + Richard Ngo chat log posts.
There’s definitely a tension here between avoiding bad disruptive actions and doing good disruptive actions.
It seems to me like you’re thinking about SEM more like a prior that starts out dominant but can get learned away over time. Is that somewhat close to how you’re thinking about this tension?
Thanks for the interview.
I am confused about what you even mean at several points.
Maybe try re-explaining with a more typical example of bias, as clearly as you can?
To some extent this sounds like it’s already captured by the notion of intelligence as being able to achieve goals in a wide range of environments—mesa-optimizers will have some edge if they’re intelligent (or else why would they arise?). And this edge grows larger the more complicated stuff they’re expected to do.
Contrary to the middle of your post, I would expect the training environment to screen off the deployment environment: the influentialness of a future AI is going to come from the training environment rewarding intelligence, not from influentialness in the deployment environment somehow reaching back to bypass the training environment and affect the AI.
Cool to hear you tried it!
Depends on whether it generates stuff like this when you ask it for tic-tac-toe :P
What do you think the results would be like if you try to use a language model to automatically filter for direct-opinion tweets and do automatic negation?
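(For concreteness, here’s roughly the kind of pipeline I’m imagining: a minimal sketch, assuming the Hugging Face transformers pipelines with off-the-shelf models; the zero-shot labels, the negation prompt, and the model choices are all placeholders rather than a tested setup.)

```python
from transformers import pipeline

# Stage 1 (filter): zero-shot classify whether a tweet states a direct opinion.
# Stage 2 (negate): ask an instruction-tuned text-to-text model to assert the opposite.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
negator = pipeline("text2text-generation", model="google/flan-t5-base")

def filter_and_negate(tweets):
    pairs = []
    for tweet in tweets:
        result = classifier(tweet, candidate_labels=["direct opinion", "not an opinion"])
        if result["labels"][0] == "direct opinion":  # top label after sorting by score
            flipped = negator(
                f"Rewrite this statement so it asserts the opposite: {tweet}",
                max_new_tokens=60,
            )[0]["generated_text"]
            pairs.append((tweet, flipped))
    return pairs

print(filter_and_negate(["Pineapple on pizza is great.", "It rained in Boston today."]))
```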
Super cool, thanks!
Image interpretability seems so easy mostly because humans are already really good at interpreting 2D images with local structure. But thinking about this does suggest an idea for language model interpretability: how practical is it to find text that a) has high probability according to the prior distribution, b) strongly activates one attention head or feed-forward neuron or something, and c) only weakly activates other parts of the transformer (within some reference class)? Probably this has already been tried somewhere and gotten middling results.
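To make the criterion concrete, here’s a rough scoring sketch, assuming GPT-2 via transformers and a brute-force search over candidate texts; TARGET_LAYER, TARGET_NEURON, the weights, and the candidate strings are arbitrary placeholders, and “other parts of the transformer” is crudely approximated by the rest of the same MLP block’s output.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
TARGET_LAYER, TARGET_NEURON = 6, 123  # placeholder unit, not a known-interesting one

def score(text, lam=1.0, mu=1.0):
    ids = tok(text, return_tensors="pt").input_ids
    acts = {}
    # Grab the MLP block's output at the target layer during the forward pass.
    handle = model.transformer.h[TARGET_LAYER].mlp.register_forward_hook(
        lambda module, inputs, output: acts.update(mlp=output.detach())
    )
    with torch.no_grad():
        out = model(ids, labels=ids)
    handle.remove()
    log_prior = -out.loss.item()                            # (a) mean per-token log-probability
    target = acts["mlp"][0, :, TARGET_NEURON].max().item()  # (b) target unit's peak activation
    rest = acts["mlp"][0].abs().mean().item()               # (c) everything else in that block
    return log_prior + lam * target - mu * rest

candidates = ["The cat sat on the mat.", "Stocks fell sharply on Monday."]
print(max(candidates, key=score))
```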