If you can come up with a prior that can learn human preferences, why put that prior into a superintelligent agent instead of first updating it to match human preferences? It seems like the latter could be safer as one could then investigate the learned preferences directly, and as one then doesn’t have to deal with it making mistakes before it has learned much.
The key areas where I can think of this being a problem are: 1. when there are unobservable latent variables, particularly ones which act on a very slow time scale, or 2. when the training data only varies on a submanifold of the full state space.
I wonder if certain kinds of inductive biases can help address 2. E.g. if you have a model architecture that requires everything to reduce to small-scale voxel dynamics, like forcing the world model to be a giant 3D CNN, then you don’t need the training data to vary across the full state space. Instead you might be able to get away with the training data having voxels that span the full state space.
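As a rough sketch of the kind of architecture I mean (assuming PyTorch; the channel counts, grid size, and residual update are arbitrary choices of mine), a world model built from small 3D convolutions can only update each voxel from its immediate neighborhood:

```python
# Minimal sketch of a world model whose dynamics are forced to be local:
# each voxel's next state depends only on a small neighborhood of the current state.
import torch
import torch.nn as nn

class LocalVoxelDynamics(nn.Module):
    def __init__(self, channels: int = 8, hidden: int = 32):
        super().__init__()
        # 3x3x3 kernels mean each voxel is updated from its immediate neighbors only.
        self.net = nn.Sequential(
            nn.Conv3d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, channels, depth, height, width) -> predicted next state
        return state + self.net(state)

# Usage: one rollout step on a random 16^3 voxel grid.
model = LocalVoxelDynamics()
state = torch.randn(1, 8, 16, 16, 16)
next_state = model(state)
```

Because the same local update rule is shared across all voxels, global configurations the training data never contained can still be handled, as long as their local patterns have been seen.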
I think problem 1 is just genuinely super hard to solve but I don’t know for sure. There’s a lot of information about problem 1 that exists in e.g. text on the internet, so maybe it contains the solution.
Insofar as that is true, it seems like a scale issue. (It doesn’t seem entirely true; global warming is a major problem, but not exactly an x-risk. Many of the biggest contributors to global warming are not the ones who will be hit the hardest. And there’s a tragedy of the commons issue to it.)
Regularization doesn’t eliminate spurious correlations.
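A toy illustration of what I mean (entirely my own example; the numbers are arbitrary): ridge regularization still assigns substantial weight to a feature that is only spuriously correlated with the outcome in the training data.

```python
# Outcome depends only on x1; x2 is spuriously correlated with x1 in the training data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)              # causal feature
x2 = x1 + 0.1 * rng.normal(size=n)   # spurious feature, correlated with x1
y = x1 + 0.1 * rng.normal(size=n)    # outcome depends only on x1

X = np.column_stack([x1, x2])
lam = 100.0
# Closed-form ridge solution: (X^T X + lam I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(w)  # substantial weight on x2 despite it having no causal effect
```

Stronger penalties spread the weight more evenly across the correlated pair rather than zeroing out the spurious one.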
I’m not saying that it is the sole explanation of all human behavior, I’m saying that it is a major class of behaviors that is difficult to stop people from engaging in and which is necessary for the effectiveness of humans in influencing the world.
Not sure what the relevance of global warming is to this discussion.
They are created by other intelligent minds, though. What I mean is, would it be adaptive for intelligence to evolve without velocity?
I would analogize it to plants vs animals. Animals tend to be much more intelligent than plants, presumably because their ability to move around means that they have to deal with much more varied conditions, or because they can have much more complex influences on the world. These seem difficult to achieve without varying one’s velocity. There’s also stuff like social relations; sessile organisms might help or hurt each other, but they probably have to do so in much simpler ways, since their positions relative to each other are fixed, while animals can more easily interact with others in more complex ways and have more varying relations.
It’s not illegal. I said “killings”, not “homicides” or “murders”.
I’m not sure I agree with this. For instance, changing one’s “velocity” in a controlled manner seems nearly impossible in practically all cellular automata for various reasons, partly because they lack Poincaré invariance. Could one have intelligent life without this?
I have a hobby of trying to “invent universes” to use as toy models for various things. Sort of like Conway’s Game of Life, except Game of Life lacks a bunch of properties that seem fairly core to characterizing the dynamics of our universe (Poincaré invariance and therefore also continuity, Liouville’s theorem, conservation of energy and momentum, etc.). (It does have some important characteristics that match our universe, e.g. a fixed speed of causality (light).) Such universes generally have to be deterministic or stochastic, because making them quantum would be computationally infeasible. However, the properties seem very difficult to satisfy in interesting ways for deterministic or stochastic universes.
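For reference, since Game of Life is the baseline I keep comparing against: a step of it is just a local update rule on a grid (a standard implementation, nothing specific to my toy universes), which is also what makes the fixed speed of causality concrete: information can move at most one cell per step.

```python
# One step of Conway's Game of Life (standard rules) with periodic boundaries.
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    # Count the 8 neighbors of every cell on a toroidal grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

# Usage: a glider, which travels at a fixed fraction of the "speed of light".
grid = np.zeros((8, 8), dtype=int)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[y, x] = 1
grid = life_step(grid)
```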
Factory farming
This could really do with some concrete examples of homeostasis,
There’s a whole bunch of pure bio stuff that I’m not properly familiar with the details of, e.g. the immune system. But the more interesting stuff is probably the behavioral stuff.
When you are low on nutrition or calories, you get hungry and try to eat. You generally feel motivated to ensure that you have food to eat for when you do get hungry, which in modern society involves doing stuff like working for pay, but in other societies has involved farming or foraging. If there are specific nutrients like salt that you are missing, then you feel strong cravings for food containing those nutrients.
Shelter: if the weather is sufficiently cold that it would hurt you (or waste energy), then you find it aversive and seek protection. Longer-term, you ensure you have a house, etc.
Safety. If you touch a hot stove, you immediately move your hand away from it. In order to avoid getting hurt, people create safety regulations. Etc.
There’s just tons of stuff, and nobody is likely to talk you out of ensuring that you get your nutritional needs covered, or to talk you out of being protected from the elements. They are very strong drives.
and some discussion of how homeostasis is compatible with major life changes.
And second, because maintaining homeostasis is again just a proxy for other goals that evolution has, namely because it grants power to engage in reproduction and kin altruism
Staying alive grants power to engage in reproduction and kin altruism … But homeostasis means “staying the same”, according to the dictionary. The two come apart. A single homeostat would want to stay single, because that is their current state. But singletons often don’t want to stay single, and don’t serve evolutionary purposes by doing so.
I mean homeostasis in the sense of “keeping conditions within the range where one is healthily alive”, not in the sense of “keeping everything the same”.
Your post seems to start out with the wrong assumption that human minds don’t (to good approximation) have the wrapper structure. See here for counter: https://www.lesswrong.com/posts/dKTh9Td3KaJ8QW6gw/why-assume-agis-will-optimize-for-fixed-goals?commentId=8E5DHLzXkdBBERx5B#8E5DHLzXkdBBERx5B
You start with the example of aphantasia, and then generalize to other mental building blocks. Then you propose that the mental building blocks lead to differences in various abilities. But is that true? I was under the impression that aphantasia research tends to find that it doesn’t make a difference for practical abilities.
Generally, though, it seems like it would be interesting to study more. My immediate thought would be that I’d want to investigate whether the mental building blocks are independent of each other or not.
I think this is a fun exercise. It of course can’t replace the Bayesian model of probability, but it’s conceptually interesting enough as a way to think about chaos.
Why the disagree vote?
I expect unaligned human-level AIs to try the same thing and have much more success because optimizing code and silicon hardware is easier than optimizing flesh brains.
Seems to me that optimizing flesh brains is easier than optimizing code and silicon hardware. It’s so easy, evolution can do it despite being very dumb.
Roughly speaking, the part that makes it easy is that the effects of flesh brains are additive with respect to the variables one might modify (standing genetic variation), whereas the effects of hardware and software are very nonlinear with respect to the variables one might modify (circuit connectivity(?) and code characters).
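A toy contrast to illustrate (entirely my own example, with made-up objectives): a dumb single-change hill climber makes steady progress on an additive landscape, but gets no signal at all on a brittle “everything must be exactly right” landscape of the kind code tends to live on.

```python
# Compare greedy single-flip search on an additive objective vs a brittle one.
import numpy as np

rng = np.random.default_rng(0)
n = 100
effects = rng.normal(size=n)           # per-variant effect sizes (additive landscape)
target = rng.integers(0, 2, size=n)    # the single "working program" (brittle landscape)

def additive_score(g):
    # Each variant contributes independently, like standing genetic variation.
    return float(g @ effects)

def brittle_score(g):
    # Any single wrong "character" and nothing works at all.
    return float(np.array_equal(g, target))

def hill_climb(score, steps=2000):
    g = rng.integers(0, 2, size=n)
    for _ in range(steps):
        i = rng.integers(n)
        candidate = g.copy()
        candidate[i] ^= 1                 # flip one variable, keep it if it helps
        if score(candidate) > score(g):
            g = candidate
    return score(g)

print(hill_climb(additive_score))   # climbs close to the best achievable value
print(hill_climb(brittle_score))    # almost certainly 0.0: single flips give no gradient
```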
We haven’t made much progress on optimizing humans, but that’s less because optimizing humans is hard and more because humans prefer using the resources that could’ve been used for optimizing humans for self-preservation instead.
In the evolutionary case, the answer is that this is out of distribution, so it’s not evolved to be robust to such changes.
“No way” is indeed an excessively strong phrasing, but it seems clear to me that pursuit of homeostasis is much more robust to perturbations than most other pursuits.
One thing I end up worrying about is that useful tricks get ignored due to a dynamic of:
1. A person tries to overextend the useful trick beyond its range of applicability such that it turns into a godzilla strategy.
2. Everyone starts associating the trick with the godzilla strategy.
3. People don’t consider using the trick within the range where it is actually applicable.
For instance, consider debate. Debate is not magic and there’s lots of things it can’t do. But (constructively understood) logical operators such as “for all” and “exists” can be given meaning using a technique called “game semantics”, and “debate” seems like a potential way to implement this in AI.
Can this do even a fraction of the things that people want debate to do? No. Can I think of anything that needs these game semantics? Not right now, no. But is it a tool that seems potentially powerful for the future? Yeah, I’d say so; it expands the range of things we can express, should we ever find a case where we want to express it, and so it is a good idea to be ready to deploy it.
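To make the game-semantics point concrete, here’s a minimal sketch (the names and the toy predicate are mine): “for all” is a move by a falsifier, “exists” is a move by a verifier, and the sentence counts as true exactly when the verifier can win against every falsifier move.

```python
def verifier_has_answer(x, domain, predicate):
    # "exists y": the verifier wins by exhibiting some y making the predicate true.
    return any(predicate(x, y) for y in domain)

def forall_exists(domain, predicate):
    # "for all x": the falsifier picks any x it likes; the sentence is true iff the
    # verifier has a winning answer to every such move.
    return all(verifier_has_answer(x, domain, predicate) for x in domain)

# Usage: "for every x in 0..9 there is a y in 0..9 with x + y == 9" is true,
# while the same statement with x + y == 20 is false.
print(forall_exists(range(10), lambda x, y: x + y == 9))   # True
print(forall_exists(range(10), lambda x, y: x + y == 20))  # False
```

The constructive flavor is that the verifier’s side is an actual strategy (a function from the falsifier’s x to a witnessing y), which is the sort of object one side of a debate could be asked to produce.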
My comment here is a bit narrow, but re
A lot of people get surprised at how quickly and easily I intuit linear algebra things, and I think a major factor is this. For linear algebra, you can think of vector spaces as being the equivalent of types/units. E.g. the rotation map in a diagonalization maps between the original vector space and the direct sum of the eigenspaces. Sort of.
It’s always the first question I ask when I see a matrix or a tensor—what spaces does it map between?
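As a toy version of this habit (entirely my own sketch; the class and the space labels are made up), you can wrap matrices with the spaces they map between, so that mismatched compositions fail loudly and a diagonalization reads as a round trip through eigenbasis coordinates:

```python
# Track which space a linear map goes from and to, and check it under composition.
import numpy as np

class LinearMap:
    def __init__(self, matrix: np.ndarray, domain: str, codomain: str):
        self.matrix, self.domain, self.codomain = matrix, domain, codomain

    def __matmul__(self, other: "LinearMap") -> "LinearMap":
        # (self @ other) applies `other` first, so other's codomain must equal self's domain.
        assert other.codomain == self.domain, f"{other.codomain} != {self.domain}"
        return LinearMap(self.matrix @ other.matrix, other.domain, self.codomain)

# Diagonalization A = P D P^{-1}: P maps eigenbasis coordinates into the original space,
# D acts within eigenbasis coordinates, and P^{-1} maps back.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)
P = LinearMap(eigvecs, "eigen-coords", "original")
D = LinearMap(np.diag(eigvals), "eigen-coords", "eigen-coords")
P_inv = LinearMap(eigvecs.T, "original", "eigen-coords")  # orthogonal, so inverse = transpose

A_typed = P @ D @ P_inv                       # original -> original
print(A_typed.domain, "->", A_typed.codomain)
print(np.allclose(A_typed.matrix, A))         # True
```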