Well, you mentioned that a lot of people were getting off the train at point 1. My comment can be thought of as giving a much more thoroughly inside-view look at point 1, and deriving other stuff as incidental consequences.
I’m mentally working with an analogy to teaching people a new contra dance (if you don’t know what contra dancing is, I’m just talking about some sequence of dance moves). The teacher often has an abstract view of expression and flow that the students lack, and there’s a temptation for the teacher to try to share that view with the students. But the students don’t want abstractions; what they want is concrete steps to follow, and good dancers will dance the dance just fine without ever hearing about the teacher’s abstract view. Before dancing, they regard the abstractions as difficult to understand and distracting from the concrete instructions; they’ll be much more equipped to understand and appreciate them *after* dancing the dance.
Huh, I wonder what you think of a different way of splitting it up. Something like:
1. It’s a scientific possibility to have AI that’s on average better than humanity at the class of tasks “choose actions that achieve a goal in the real world.” Let’s label this with some superlative jargon like “superintelligent AI.” Such a technology would be hugely impactful.
2. It would be really bad if a superintelligent AI were choosing actions to achieve some goal, but this goal wasn’t beneficial to humans. This means there are several open problems we need to solve before safely turning on any such AI.
3. We know enough that we can do useful work on (most of) these open problems right now. Arguing for this also implies that superintelligent AI is close enough (if not in years, then in “number of paradigm shifts”) that this work needs to start getting done.
4. We would expect a priori that work on these open problems of beneficial goal design would be under-prioritized (public goods problem, low immediate profit, not obvious you need it before you really need it). And indeed that seems to be the case (insert NIPS survey here), though there’s work going on at nonprofits that have different incentives. So consider thinking about this area if you’re looking for things to research.
Welp, we’re doomed (/s), as soon as someone figures out how to get 100 million tries at taking over the world so we can crush the world-taking-over problem with stochastic gradient descent.
Here’s some: “antipacek”, “progressive killer fat”, “hut refusal guideline”, “south-stream resignation”, “pamplem”, “conscience bw”, “fog log dog bog”, “layer iron trolley”, “prevent publication frequency”, “sconspiracyn”.
Pretty sure you understood it :) But yeah, not only would I like to be able to compare two things, I’d like to be able to find the optimum values of some continuous variables. Though I suppose it doesn’t matter as much if you’re trying to check / evaluate ideas that you arrived at by more abstract reasoning.
I’m also looking forward to upcoming posts, but all these examples so far sound to me like a modernist’s substitute for sympathetic magic :P
Sounds like a sales pitch for whiteboard wallpaper :)
The impractical part about training for good behavior is that it’s a nested loop—every training example on how to find good maxima requires training a model that in turn needs its own training examples. So it’s destined to be behind the state of the art, probably using state-of-the-art models to generate the copious required training data.
The question, I suppose, is whether this is still good enough to learn useful general lessons. And after thinking about it, I think the answer is that yes, it should be, especially for feed-forward architectures that look like modern machine learning, where you don’t expect qualitative changes in capability as you scale computational resources.
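To make the nested-loop cost concrete, here’s a minimal sketch (a toy problem of my own, not anyone’s actual setup) where producing each outer training example requires a complete inner training run:

```python
import numpy as np

def train_inner_model(seed):
    """A stand-in for a full training run: fit y = w*x to noisy data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=100)
    y = 3.0 * x + rng.normal(scale=0.1, size=100)
    w = 0.0
    for _ in range(200):                      # the inner training loop
        grad = -2 * np.mean((y - w * x) * x)  # dMSE/dw
        w -= 0.05 * grad
    return w

# The outer loop: every single training example for the outer model
# costs one full inner run, which is why this lags the state of the art.
outer_dataset = [train_inner_model(seed) for seed in range(1000)]
```

A thousand outer examples is a thousand full training runs; scale the inner runs up to state-of-the-art size and the cost of building the outer dataset explodes.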
Yes, I hope that my framing of the problem supports this sort of conclusion :P
An alternate framing where it still seems important would be “moral uncertainty”: when we don’t know what to do, it’s because we are lacking some facts, maybe even key facts. So I’m sort of sneakily arguing against that frame.
Any sequence that involves recommending people work through Drawing on the Right Side of the Brain is a sequence I should read :P
You mean, why I expect a person-affecting utility function to be different if evaluated today vs. tomorrow?
Well, suppose that today I consider the action of creating a person, and am indifferent to creating them. Since this is true for all sorts of people, I am indifferent to creating them one way vs. another (e.g. happy vs. sad). If they are to be created inside my guest bedroom, this means I am indifferent between certain ways the atoms in my guest bedroom could be arranged. Then if this person gets created tonight and is around tomorrow, I’m no longer indifferent between the arrangement that is them sad and the arrangement that is them happy.
Yes, you could always reverse-engineer a utility function over world-histories that encompasses both of these. But this doesn’t necessarily solve the problems that come to mind when I say “change in utility functions”—for example, I might take bets about the future that appear lose/lose when I have to pay them off, or take actions that modify my own capabilities in ways I later regret.
I dunno—were you thinking of some specific application of indifference that could sidestep some of these problems?
Hilary Greaves sounds like a really interesting person :)
So, you could use these methods to construct a utility function corresponding to the person-affecting viewpoint from your current world, but this wouldn’t protect this utility function from critique. She brings up the Pareto principle, where this person-affecting utility function would be indifferent to some things that were strict improvements, which seems undesirable.
I think the more fundamental problem there is intransitivity. You might be able to define a utility function that captures the person-affecting view to you, but a copy of you one day later (or one world over) would say “hang on, I didn’t agree to that.” They’d make their own utility function with priorities on different people. And so you end up fighting with yourself, until one of you can self-modify to actually give up the person-affecting view, and just keep this utility function created by their past self.
A more reflective self might try to do something clever like bargaining between all selves they expect to plausibly be (and who will follow the same reasoning), and taking actions that benefit those selves, confident that their other selves will keep their end of the bargain.
My general feeling about population ethics, though, is that it’s aesthetics. This was a really important realization for me, and I think most people who think about population ethics don’t think about the problem the right way. People don’t inherently have utility, utility isn’t a fluid stored in the gall bladder, it’s something evaluated by a decision-maker when they think about possible ways for the world to be. This means it’s okay to have a preferred standard of living for future people, to have nonlinear terms on population and “selfish” utility, etc.
If the growth is exponential, I still don’t think there’s a paradox—sure, you’re incentivized to wait forever, but I’m already incentivized to wait forever with my real-life investments. The only thing that stops me from investing my money forever in real life is that sometimes I have things (not included in the toy problem) that I really want to buy with that money.
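To spell out the non-paradox with toy numbers (mine, not from the problem): if my wealth grows like $e^{gt}$, my utility is linear in wealth, and I discount the future at rate $r < g$, then the present value of cashing out at time $t$ is

$$V(t) = e^{-rt} \cdot e^{gt} = e^{(g-r)t},$$

which increases without bound in $t$. “Wait longer” always wins, exactly as it does for an ordinary index fund whose expected return exceeds my discount rate.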
So, the dictionary definition (SEP) would be something like “objectively good/parsimonious/effective ways of carving up reality.”
There’s also the implication that when we use kinds in reasoning, things of the same kind should share most or all important properties for the task at hand. And there’s sort of a further implication that humans naively think of the world as made out of natural kinds on an ontologically basic level.
I’m saying that even if people don’t believe in disembodied souls, when they ask “what do I want?” they think they’re getting an answer back that is objectively a good/parsimonious/effective way of talking. That there is some thing, not necessarily a soul but at least a pattern, that is being accessed by different ways of asking “what do I want?”, which can’t give us inconsistent answers because it’s all one thing.
Thanks for the reply :)
Sure, you can get the AI to draw polka-dots by targeting a feature that likes polka dots, or a Mondrian by targeting some features that like certain geometries and colors, but now you’re not using style transfer at all—the image is the style. Moreover, it would be pretty hard to use this to get a Kandinsky, because the AI that makes style-paintings has no standard by which it would choose things to draw that could be objects but aren’t. You’d need a third and separate scheme to make Kandinskys, and then I’d just bring up another artist not covered yet.
If you’re not trying to capture all human visual art in one model, then this is no biggie. So now you’re probably going “this is fine, why is he going on about this.” So I’ll stop.
Do you have examples in mind when you mention “human experience,” “embodiment,” and “limited agents”?
For “human experience,” yeah, I just mean something like communicative/evocative content that relies on a theory of mind. Maybe you could train an AI on patriotic paintings and then it could produce patriotic paintings, but I think only by working on theory of mind would an AI think to produce a patriotic painting without having seen one before. I’m also reminded of Karpathy’s example of Obama with his foot on the scale.
For “embodiment,” I mean art that blurs the line between visual and physical. I was thinking of how some things aren’t art if they’re normal-sized, but if you make them really big, then they’re art. Since all human art is physical art, this line can be avoided mostly but not completely.
For “limited,” I imagined something like Dennett’s example of the people on the bridge. The artist only has to paint little blobs, because they know how humans will interpret them. Compared to the example above of using understanding of humans to choose content, this example uses an understanding of humans to choose style.
Yet even with zero prior training on visual art they can make pretty impressive images by human lights. I think this was surprising to most people both inside and outside deep learning. I’m curious whether this was surprising to you.
It was impressive, but I remember the old 2015 post that Chris Olah co-authored. First off, if you look at the pictures, they’re less pretty than the pictures that came later. And I remember one key sentence: “By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.” My impression is that DeepDream et al. have been trained to make visual art—by hyperparameter tuning (grad student descent).
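For concreteness, here’s roughly what that prior amounts to in code: a minimal sketch of activation maximization, with an untrained conv layer standing in for a real trained network (illustrative only, not the post’s actual method), and a total-variation penalty playing the role of the correlated-neighbors constraint.

```python
import torch

torch.manual_seed(0)
feature = torch.nn.Conv2d(3, 8, kernel_size=5)  # stand-in for a trained layer

img = torch.randn(1, 3, 64, 64, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    opt.zero_grad()
    activation = feature(img)[0, 3].mean()  # push one channel's activation up
    # Total-variation penalty: the prior that neighboring pixels correlate.
    tv = (img[..., 1:, :] - img[..., :-1, :]).abs().mean() \
       + (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    loss = -activation + 0.1 * tv
    loss.backward()
    opt.step()
```

Drop the `tv` term and you get the noisy, unconstrained images the quote is talking about; the weight on it is exactly the kind of knob that gets tuned until the outputs look good.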
I like this exposition, but I’m still skeptical about the idea.
Since “art” is a human concept, it’s naturally a grab bag of lots of different meanings. It’s plausible that for some meanings of “art,” humans do something similar to searching through a space of parameters for something that strongly activates some target concept within the constraints of a style. But there’s also a lot about art that’s not like that.
Like art that’s non-representational, or otherwise denies the separation between form and content. Or art that’s heavily linguistic, or social, or relies on some sort of thinking on the part of the audience. Art that’s very different for the performer and the audience, so that it doesn’t make sense to talk about a search process optimizing for the audience’s experience, or otherwise doesn’t have a search process as a particularly simple explanation. Art that’s so rooted in emotion or human experience that we wouldn’t consider an account of it complete without talking about the human experience. Art that only makes sense when considering humans as embodied, limited agents.
So if I consider the statement “the DeepDream algorithm is doing art,” there is a sense in which this is reasonable. But I don’t think that extends to calling what DeepDream does a model for what humans do when we think about or create art. We do something not merely more complicated in the details, but more complicated in its macro-structure, and hooked into many of the complications of human psychology.
Dropout is like the converse of this—you use dropout to assess the elements that weren’t dropped out. This promotes resiliency to perturbations in the model—whereas if you evaluate things by how bad it is to break them, you could promote fragile, interreliant collections of elements over resilient elements.
I think the root of the issue is that this Shapley value doesn’t distinguish between something being bad to break, and something being good to have more of. If you removed all my blood I would die, but that doesn’t mean that I would currently benefit from additional blood.
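A toy version of the fragility point (numbers mine): two units that exactly cancel contribute nothing together, yet “how bad is it to break this?” rates them as the most important things in the model.

```python
def loss(units):
    target = 1.0
    return (sum(units) - target) ** 2

units = {"a": 10.0, "b": -10.0, "c": 1.0}  # a and b cancel exactly

base = loss(units.values())  # 0.0: the model predicts perfectly
for name in units:
    ablated = [0.0 if k == name else v for k, v in units.items()]
    print(name, "break-importance:", loss(ablated) - base)

# a break-importance: 100.0  <- huge, but only because b depends on it
# b break-importance: 100.0
# c break-importance: 1.0    <- the unit doing all the real work
```

Dropout during training would punish the a/b pair (each gets dropped while the other remains, producing big errors), which is exactly the resiliency pressure I mean.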
Anyhow, the joke was that as soon as you add a continuous parameter, you get gradient descent back again.
0.3 mg melatonin an hour before I want to be asleep works, my only trouble is actually planning in advance.
> You look at the world, and you say: “how can I maximize utility?” You look at your beliefs, and you say: “how can I maximize accuracy?” That’s not a consequentialist agent; that’s two different consequentialist agents!
Not… really? “How can I maximize accuracy?” is a very liberal agentification of a process that might be more drily thought of as asking “what is accurate?” Your standard sequence predictor isn’t searching through epistemic pseudo-actions to find which ones best maximize its expected accuracy; it’s just following a pre-made plan of epistemic action that happens to increase accuracy.
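By “pre-made plan” I mean something like this (a minimal sketch, just ordinary Bayes updating): the rule is applied mechanically, and accuracy improves as a side effect rather than being searched for.

```python
# A fixed epistemic rule: reweight hypotheses by Bayes' theorem.
# There is no search over candidate updates for the accuracy-maximizing one.
def bayes_update(prior, likelihood):
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two hypotheses about a coin; we observe heads.
prior = {"fair": 0.5, "two-headed": 0.5}
likelihood_heads = {"fair": 0.5, "two-headed": 1.0}
print(bayes_update(prior, likelihood_heads))
# {'fair': 0.333..., 'two-headed': 0.666...}
```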
Though this does lead to the thought: if you want to put things on equal footing, does this mean you want to describe a reasoner that searches through epistemic steps/rules like an agent searching through actions/plans?
This is more or less how humans already conceive of difficult abstract reasoning. We don’t solve integrals by gradient descent; we imagine doing some sort of tree search where the edges are different abstract manipulations of the integral. But for everyday reasoning, like navigating 3D space, we just use our specialized feed-forward hardware.
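A minimal sketch of that kind of search (toy rewrite moves on numbers, standing in for manipulations of an integral):

```python
from collections import deque

def search(start, goal, moves, max_depth=10):
    """Breadth-first search where the edges are rewrite moves."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for name, fn in moves:
            nxt = fn(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

moves = [("double", lambda x: 2 * x),
         ("add_3", lambda x: x + 3),
         ("negate", lambda x: -x)]

print(search(1, 11, moves))  # ['add_3', 'double', 'add_3']
```

Swap the integers for symbolic expressions and the moves for substitutions and integrations by parts, and you have the tree-search picture of solving an integral.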