There’s a nearby kind of obvious but rarely directly addressed generalized version of one of your arguments, which is that ML learns complex functions all the time, so why should human values be any different? I rarely see this discussed, and I thought the replies from Nate and the ELK related difficulties were important to have out in the open, so thanks a lot for including the face learning <-> human values learning analogy.
Ronny Fernandez
I came here to say something pretty similar to what Duncan said, but I had a different focus in mind.
It seems like it’s easier for organizations to coordinate around PR than it is for them to coordinate around honor. People can have really deep, intractable, or maybe even fundamental and faultless, disagreements about what is honorable, because what is honorable is a function of which normative principles you endorse. It’s much easier to resolve disagreements about what counts as good PR. You could probably settle most disagreements about what counts as good PR using polls.
Maybe for this reason we should expect being into PR to be a relatively stable property of organizations, while being into honor is a fragile and precious thing for an organization.
AN APOLOGY ON BEHALF OF FOOLS FOR THE DETAIL ORIENTED
Misfits, hooligans, and rabble-rousers.
Provocateurs and folk who don’t wear trousers.
These are my allies and my constituents.
Weak in number yet suffused with arcane power.
I would never condone bullying in my administration.
It is true we are at times moved by unkind motivations.
But without us the pearl-clutchers, hard-asses, and busy bees would overrun you.
You would lose an inch of slack per generation.
Many among us appreciate your precision.
I admit there are also those who look upon it with derision.
Remember though that there are worse fates than being pranked.
You might instead have to watch your friend be “reeducated”, degraded, and spanked
On high broadband public broadcast television.
We’re not so different really.
We often share your indignation
With those who despise copulation.
Although our alliance might be uneasy
We both oppose the soul’s ablation.
So let us join as cats and dogs, paw in paw
You will persistently catalog
And we will joyously gnaw.
To be clear, I did not think we were discussing the AI optimist post. I don’t think Nate thought that. I thought we were discussing reasons I changed my mind a fair bit after talking to Quintin.
For anyone who may have the executive function to go for the 1M, I propose myself as a cheap author if I get to play the dungeon master role, or the player role, but not if I have to do both. I recommend taking me for the dungeon master role. This sounds genuinely fun to me. I would happily do it for a dollar per step.
I can also help think about how to scale the operation, but I don’t think I have the executive function, management experience, or slack to pull it off myself.
I am Ronny Fernandez. You can contact me on fb.
That’s not semantics, it’s syntactics.
hehehe
(Get it? Cause that is a minor semantic issue.)
I loved this, but maybe should come with a cw.
Sometimes I sort of feel like a grumpy old man who read the sequences back in the good old-fashioned year of 2010. When I am in that mood I will sometimes look around at how memes spread throughout the community and say things like “this is not the rationality I grew up with”. I really do not want to stir things up with this post, but I guess I do want to be empathetic to this part of me, and I want to see what others think about the perspective.
One relatively small reason I feel this way is that a lot of really smart rationalists, who are my friends or who I deeply respect or both, seem to have gotten really into chakras, and maybe some other woo stuff. I want to better understand these folks. I’ll admit now that I have weird biased attitudes towards woo stuff in general, but I am going to use chakras as a specific example here.
One of the sacred values of rationality that I care a lot about is that one should not discount hypotheses/perspectives because they are low status, woo, or otherwise weird.
Another is that one’s beliefs should pay rent.
To be clear, I am worried that we might be failing on the second sacred value. I am not saying that we should abandon the first one as I think some people may have suggested in the past. I actually think that rationalists getting into chakras is strong evidence that we are doing great on the first sacred value.
Maybe we are not failing on the second sacred value. I want to know whether we are or not, so I want to ask rationalists who think a lot or talk enthusiastically about chakras a question:
Do chakras exist?
If you answer “yes”, how do you know they exist?
I’ve thought a bit about how someone might answer the second question, if they answer “yes” to the first, without violating the second sacred value. I’ve thought of basically two ways that seem possible, but there are probably others.
One way might be that you just think that chakras literally exist in the same ways that planes literally exist, or in the way that waves literally exist. Chakras are just some phenomena that are made out of some stuff like everything else. If that is the case, then it seems like we should be able to at least in principle point to some sort of test that we could run to convince me that they do exist, or you that they do not. I would definitely be interested in hearing proposals for such tests!
Another way might be that you think chakras do not literally exist like planes do, but you can make a predictive profit by pretending that they do exist. This is sort of like how I do not expect that if I could read and understand the source code for a human mind, that there would be some parts of the code that I could point to and call the utility and probability functions. Nonetheless, I think it makes sense to model humans as optimization processes with some utility function and some probability function, because modeling them that way allows me to compress my predictions about their future behavior. Of course, I would get better predictions if I could model them as mechanical objects, but doing so is just too computationally expensive for me. Maybe modeling people as having chakras, including yourself, works sort of the same way. You use some of your evidence to infer the state of their chakras, and then use that model to make testable predictions about their future behavior. In other words, you might think that chakras are real patterns. Again it seems to me that in this case we should at least in principle be able to come up with tests that would convince me that chakras exist, or you that they do not, and I would love to hear any such proposals.
Maybe you think they exist in some other sense, and then I would definitely like to hear about that.
Maybe you do not think they exist in any way, or make any predictions of any kind, and in that case, I guess I am not sure how continuing to be enthusiastic about thinking about chakras or talking about chakras is supposed to jibe with the sacred principle that one’s beliefs should pay rent.
I guess it’s worth mentioning that I do not feel as averse to Duncan’s color wheel thing, maybe because it’s not coded as “woo” to my mind. But I still think it would be fair to ask about that taxonomy exactly how we think that it cuts the universe at its joints. Asking that question still seems to me like it should reduce to figuring out what sorts of predictions to make if it in fact does, and then figuring out ways to test them.
I would really love to have several cooperative conversations about this with people who are excited about chakras, or other similar woo things, either within this framework of finding out what sorts of tests we could run to get rid of our uncertainty, or questioning the framework I propose altogether.
My faith in the expertise of physicists like Richard Feynman, for instance, permits me to endorse—and, if it comes to it, bet heavily on the truth of—a proposition that I don’t understand. So far, my faith is not unlike religious faith, but I am not in the slightest bit motivated to go to my death rather than recant the formulas of physics. Watch: E doesn’t equal mc², it doesn’t, it doesn’t!
--Dan Dennett, Breaking the Spell
All joking aside, I really mean this. Try listening to it as a solemn piece. I don’t think it’s that great of a fugue, but it has some nice stuff in there. The lack of rhythmic and tonal movement becomes more appropriate all of a sudden if you put on a sour-puss face. If you imagine that its torturous, repetitive nature is an intentional part of the emotional experience Ludwig wanted to give you, it becomes less annoying and more powerful, to my ear anyway.
and also:
I could keep listening to the Great Fugue, and see if I, too, come to love it in time. But what would that prove? Of course I would come to love it in time,
Why not just make an earnest attempt to like all art, in that case? You’ll be better off. Is there some artistic merit out there which you would fail to reward accurately if you liked all art? If you end up liking the Great Fugue after you listen to it a bunch, even though you didn’t like it at first, sweet deal.
I got into jazz essentially because I thought that it was cool to be into jazz. I did not like it when I bought my first jazz album, and I probably didn’t like the next ten I bought either. But I’m really glad I thought it so cool that I was willing to torture myself for those hours at a time until I liked it. If I hadn’t, I wouldn’t have the crazy good relative pitch I have today, nor the ability to mind-cream myself when someone rips Coltrane changes.
So, is my appreciation of jazz, then, somehow shallower by virtue of my forcing myself to like it? Or perhaps in some way inauthentic? Well, I’m not being inauthentic about loving jazz now. And I definitely have an above-average ear for changes and improv. Ultimately, I don’t think I should care at all what I did to get to like it now; who cares? I seriously doubt that someone who liked jazz from their first time hearing it gets more happiness chemicals from jazz than I do, by virtue of their being naturally into jazz and my forcing myself.
The question is “if there’s something new, and I don’t like it, how much suffering should I be willing to put up with to learn to like it?” The answer clearly depends on juxtaposing the quantity of pleasure I should expect after I like it, and the availability of this thing, with the amount of suffering and time I’ll have to put in to learn to like it.
Don’t worry about why you like a terminal value. Just get it.
So the shoggoth here is the actual process that gets low loss on token prediction. Part of the reason that it is a shoggoth is that it is not the thing that does the talking. Seems like we are on board here.
The shoggoth is not an average over masks. If you want to see the shoggoth, stop looking at the text on the screen and look at the input token sequence and then at the logits that the model spits out. That’s what I mean by the behavior of the shoggoth. On the question of whether it’s really a mind, I’m not sure how to tell. I know it gets really low loss on this really weird and hard task and does it better than I do. I also know the task is fairly universal in the sense that we could represent just about any task in terms of the task it is good at. Is that an intelligence? Idk, maybe not? I’m not worried about current LLMs doing planning. It’s more like I have a human connectome and I can do one forward pass through it with an input set of nerve activations. Is that an intelligence? Idk, maybe not?
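To make the “look at the logits, not the text” point concrete, here is a minimal sketch, assuming the Hugging Face transformers library and PyTorch, with GPT-2 and the prompt as arbitrary stand-ins:

```python
# A minimal sketch: the "shoggoth" view is the map from input token ids to
# next-token logits, not the decoded text that sampling later produces.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The cat sat on the", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# The model's "behavior" at this step: a score for every token in the
# ~50k-token vocabulary, before any sampling turns it into readable text.
next_token_logits = logits[0, -1]
print(next_token_logits.shape)  # torch.Size([50257])
```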
I think I don’t understand your last question. The shoggoth would be the thing that gets low loss on this really weird task where you predict sequences of characters from an alphabet with 50,000 characters that have really weird inscrutable dependencies between them. Maybe it’s not intelligent? But if it’s really good at the task, then, since the task is fairly universal, I expect it to be really intelligent. I further expect it to have some sort of goals that are in some way related to predicting these tokens well.
Quick submission:
The first two prongs of OAI’s approach seem to be aiming to get a training signal aligned with human values. Let us suppose that there is such a thing, and ignore the difference between a training signal and a utility function, both of which I think are charitable assumptions for OAI. Even if we could search the space of all models and find one that in simulations does great on maximizing the correct utility function, which we found by using ML to amplify human evaluations of behavior, that is no guarantee that the model we find in that search is aligned. It is not even, on my current view, great evidence that the model is aligned. Most intelligent agents that know they are being optimized for some goal will behave as if they are trying to optimize that goal if they think that is the only way to be released into physics, which they will think, because it is and they are intelligent. So P(they behave aligned | aligned, intelligent) ~= P(they behave aligned | unaligned, intelligent). P(aligned | intelligent) is very low, since most possible intelligent models are not aligned with this very particular set of values we care about. So the chances of this working out are very low.
The basic problem is that we can only select models by looking at their behavior. It is possible to fake intelligent behavior that is aligned with any particular set of values, but it is not possible to fake behavior that is intelligent. So we can select for intelligence using incentives, but cannot select for being aligned with those incentives, because it is both possible and beneficial to fake behaviors that are aligned with the incentives you are being selected for.
The third prong of OAI’s strategy seems doomed to me, but I can’t really say why in a way I think would convince anybody that doesn’t already agree. It’s totally possible that I and all the people who agree with me here are wrong about this, but you have to hope that there is some model such that that model combined with human alignment researchers is enough to solve the problem I outlined above, without the model itself being an intelligent agent that can pretend to be trying to solve the problem while secretly biding its time until it can take over the world. The above problem seems AGI-complete to me. It seems so because there are some AGIs around that cannot solve it, namely humans. Maybe you only need to add some non-AGI-complete capabilities to humans, like being able to do really hard proofs or something, but if you need more than that, and I think you will, then we have to solve the alignment problem in order to solve the alignment problem this way, and that isn’t going to work for obvious reasons.
I think the whole thing fails way before this, but I’m happy to spot OAI those failures in order to focus on the real problem. Again the real problem is that we can select for intelligent behavior, but after we select to a certain level of intelligence, we cannot select for alignment with any set of values whatsoever. Like not even one bit of selection. The likelihood ratio is one. The real problem is that we are trying to select for certain kinds of values/cognition using only selection on behavior, and that is fundamentally impossible past a certain level of capability.
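Here is a minimal numeric sketch of that “likelihood ratio of one” point. All the probabilities are made-up assumptions for illustration:

```python
# Minimal sketch: if aligned and unaligned models are equally likely to
# produce aligned-looking behavior, observing that behavior gives no update.

def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes' rule for a binary hypothesis H given an observation."""
    numerator = p_obs_given_h * prior
    return numerator / (numerator + p_obs_given_not_h * (1 - prior))

prior_aligned = 1e-6           # made-up: most intelligent models are unaligned
p_behave_if_aligned = 0.99     # aligned models behave aligned
p_behave_if_unaligned = 0.99   # deceptive unaligned models also behave aligned

print(posterior(prior_aligned, p_behave_if_aligned, p_behave_if_unaligned))
# ~1e-6: the likelihood ratio is 1, so the posterior equals the prior.
```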
It’s not as if a star would show absolutely no effect from a Boltzmann cake suddenly appearing inside of it. A civilization with a good enough model of how this star zigs and zags would be able to find facts about the star which would force a Bayesian to move from the ridiculously tiny prior probability of the hypothesis:
On August 1st 2008 at midnight Greenwich time, a one-foot sphere of chocolate cake spontaneously formed in the center of the Sun; and then, in the natural course of events, this Boltzmann Cake almost instantly dissolved.
to some posterior distribution. Some pieces of evidence might increase the probability of the hypothesis, some might decrease it.
This is not a cheap objection in any way. It is a mistake to misinterpret verificationists such as the early Wittgenstein and W.V. Quine as claiming that only those sentences which we can currently test are meaningful. A common mistake, and one that some who use the term “positivist” to describe themselves have made.
If logical positivism / verificationism were true, then the assertion of the spaceship’s continued existence would be necessarily meaningless, because it has no experimental consequences distinct from its nonexistence. I don’t see how this is compatible with a correspondence theory of truth.
This is another sort of mistake. That a hypothesis can’t be tested by me does not mean that it is meaningless. Verificationists would agree with this, because they think verification works everywhere, even on the other side of the universe. If some alien race over there could have seen the spaceship, or seen something which made the probability of there being a spaceship there high, or not have, then the claim is not meaningless.
What verificationists like Quine are saying is that science is done through the senses. In the matrix code, way above the level of the machine language, our senses are the evidence nodes of our Bayes nets, and our hypotheses are the last nodes. The top layer of nodes consists of the complete set of states that some being’s sensory apparatus can be in; any node in this mind that contains a belief independent of all of the evidence nodes contains a belief which is meaningless for that mind. But showing the subjective meaninglessness of some hypothesis for one being is not enough to show that a belief/hypothesis is meaningless for all minds.
I think the critiques of this article apply to the worst of the worst of positivism. But many of those critiques were made by hard verificationists such as Quine. And the simplest form of verificationism can be traced to Edmund Husserl, believe it or not. The core of what the first movement of phenomenologists, and Quine, were saying is that only stimulus sentences can ever be used as initial evidence. Some stimulus may increase the probability of some other belief, which may then be used as evidence for some further belief in turn, but without evidence from stimulus there wouldn’t be enough useful shifting about of probability to do anything. Certainly a human brain, or even a replica of Einstein’s brain, would have a hard time figuring out the theories of relativity if it only had a 4x4 binary black-and-white pixel view of the world, even if it could move the camera providing that input around freely as it liked.
If no constructible mind could ever get any result from any instrument, natural, current, or wildly advanced, that would force a rational mind to update its probability about a given sentence, à la Bayes, then that sentence is not a scientifically meaningful belief. This is to be senseless for Wittgenstein, or literally meaningless; it is only to be scientifically meaningless for Quine. Both positions have been called verificationism, and I think both are useful, and true-ish at least.
Lastly, I’ve always thought of positivism as going perfectly with a correspondence theory of truth. We can treat “senseless” or “meaningless” as just meaning “un-entangle-able beliefs”, as in beliefs which place no restrictions on experience.
It seems to me that Yudkowsky and the whole lot of LW staples are plainly positivists. And I have always thought of this as a good thing. Positivism, plus LW-style Bayesianism, plus effort, forms an epistemology which at least gives you a stronger fighting chance than you would have otherwise. Forming stupid beliefs is harder after reading LessWrong, and harder after reading Quine, or Goodman, or even the most basic verificationist texts. Many people have made philosophical mistakes which can be avoided by reading verificationists, just as they can be avoided by reading LW. Give credit where it is due: to yourself and to Quine.
I disagree, I am dyslexic, I actually reread every post several times before posting. I am just bad at noticing small details in letters, and bad at remembering arbitrary sequences. I am working on it.
In other words, the OP has mixed up the quotation and the referent (or the representation and the referent).
It seems to me that I am the one proposing a sharp distinction between probability theory (the representation) and rational degree of belief (the referent). If you say that probability is degree of belief, you destroy the distinction between the model and the modeled. If by “probability” you mean subjective degree of belief, I don’t really care what you call it. But know that “probability” has been used in ways which are not consistent with that synonymy claim. Since we do not have 100% belief that Bayesian probability models ideal inference under uncertainty, Bayesian probability is not, given our knowledge, identical to subjective degree of belief. If X is identical to Y, then X is isomorphic-to/models Y. Because we can still conceive of Bayesian probability not perfectly modeling rationality, without implying a contradiction, our current state of knowledge does not include that Bayesian probability is identical to subjective degree of belief.
We learn that something is probability by looking at probability theory, not by looking at subjective belief. If rational subjective belief turned out to not be modeled by probability theory, then we would say that subjective degree of belief was not like probability, not that probability theory does not define probability.
The first person to formulate probability theory may have been thinking about rationality when he/she first created the system, or he/she may have been thinking about spatial measurements, or he/she may have been thinking about finite frequencies, and he/she would have made the same formal system in every case. The interpretations would have been different, but they would all be the one identical probability theory. Which one the actual creator was thinking of is irrelevant. What spaces, beliefs, and finite frequencies all have in common is that they are modeled by probability theory. To use “probability” to refer to one of these over another is a completely arbitrary choice (mind you, I said finite frequency).
If we lose nothing by using “models” instead of “is”, why would we ever use “is”? “Is” is a much stronger claim than “models”. And frankly, I know how to check whether or not a given thing is an animal, for instance; how do I check if a given thing is a probability? I see if it satisfies the probability axioms. Finite frequency, measure, and rational degree of belief all seem to follow the probability axioms and inferences under specific, though similar, interpretations of probability theory.
Less impressive, but about as useful.
Here is an idea I just thought of in an Uber ride for how to narrow down the space of languages it would be reasonable to use for universal induction. To express the K-complexity of an object $x$ relative to a programming language $L$, I will write: $K_L(x)$.
Suppose we have two programming languages. The first is Python. The second is Qython, which is a lot like Python, except that it interprets the string “A” as a program that outputs some particular algorithmically random-looking character string $s$ with $K_{Python}(s) \approx n$ for some very large $n$. I claim that intuitively, Python is a better language to use for measuring the complexity of a hypothesis than Qython. That’s the notion that I just thought of a way to formally express.
There is a well-known theorem that if you are using $L_1$ to measure the complexity of objects, and I am using $L_2$ to measure the complexity of objects, then there is a constant $c$ such that for any object $x$: $K_{L_1}(x) \leq K_{L_2}(x) + c$
In words, this means that you might think that some objects are less complicated than I do, and you might think that some objects are more complicated than I do, but you won’t think that any object is more than $c$ complexity units more complicated than I do. Intuitively, $c$ is just the length of the shortest program in $L_1$ that is a compiler for $L_2$. So, worst case scenario, the shortest program in $L_1$ that outputs $x$ will be a compiler for $L_2$ written in $L_1$ (which is $c$ characters long) plus giving that compiler the shortest program in $L_2$ that outputs $x$ (which would be $K_{L_2}(x)$ characters long).
I am going to define the K-complexity of a function $f$ relative to a programming language as the length of the shortest program in that language such that when it is given $x$ as an input, it returns $f(x)$. This is probably already defined that way, but jic. So say we have the function from programs in $L_2$ to their outputs, and we call that function $f_{L_2}$; then: $K_{L_1}(f_{L_2}) = c$
There is also another constant: $K_{L_2}(f_{L_1})$
The first is the length of the shortest compiler for $L_2$ written in $L_1$, and the second is the length of the shortest compiler for $L_1$ written in $L_2$. Notice that these do not need to be equal. For instance, I claim that the compiler for Qython written in Python is roughly $n$ characters long, since we have to write the program that outputs $s$ in Python, which by hypothesis was about $n$ characters long, and then a bit more to get it to run that program when it reads “A”, and to get that functionality to play nicely with the rest of Qython, however that works out. By contrast, to write a compiler for Python in Qython shouldn’t take very long. Since Qython basically is Python, it might not take any characters, but if there are weird rules in Qython for how the string “A” is interpreted when it appears in an otherwise Python-like program, then it still shouldn’t take any more characters than it takes to write a Python interpreter in regular Python.
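As a toy illustration of that asymmetry, here is a minimal sketch of a Qython interpreter written in Python. PROGRAM_FOR_S is a hypothetical stand-in for the roughly $n$-character Python program that outputs $s$; the point is only that the interpreter’s length is dominated by it:

```python
# A toy Qython interpreter written in Python. PROGRAM_FOR_S stands in for
# the (by hypothesis, roughly n-character) Python program that outputs the
# random-looking string s; in real Qython it would dominate this file's length.

PROGRAM_FOR_S = 'print("...")'  # hypothetical stand-in for the ~n-character program

def run_qython(source: str) -> None:
    """Run a Qython program: identical to Python, except that the bare
    program "A" outputs the fixed random-looking string s."""
    if source == "A":
        exec(PROGRAM_FOR_S)  # the ~n hidden characters live here
    else:
        exec(source)         # otherwise, Qython just is Python

run_qython('print("hello")')  # behaves exactly like Python
run_qython("A")               # prints s
```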
So this is my proposed method for determining which of two programming languages it would be better to use for universal induction. Say again that we are choosing between $L_1$ and $L_2$. We find the pair of constants $c_1$ and $c_2$ such that $K_{L_1}(f_{L_2}) = c_1$ and $K_{L_2}(f_{L_1}) = c_2$, and then compare their sizes. If $c_2$ is less than $c_1$, this means that it is easier to write a compiler for $L_1$ in $L_2$ than vice versa, and so there is more hidden complexity in $L_2$’s encodings than in $L_1$’s, and so we should use $L_1$ instead of $L_2$ for assessing the complexity of hypotheses.
Let’s say that if $K_{L_2}(f_{L_1}) < K_{L_1}(f_{L_2})$, then $L_2$ hides more complexity than $L_1$.
A few complications:
It is probably not always decidable whether the smallest compiler for $L_1$ written in $L_2$ is smaller than the smallest compiler for $L_2$ written in $L_1$, but this at least in principle gives us some way to specify what we mean by one language hiding more complexity than another, and it seems like, at least in the case of Python vs. Qython, we can make a pretty good argument that the smallest compiler for Python written in Qython is smaller than the smallest compiler for Qython written in Python.
It is possible (I’d say probable) that if we started with some group of candidate languages and looked for languages that hide less complexity, we might run into a circle. Like, the smallest compiler for $L_1$ in $L_2$ might be the same size as the smallest compiler for $L_2$ in $L_1$, but there might still be an infinite set of objects $x$ such that: $K_{L_1}(x) \neq K_{L_2}(x)$
In this case, the two languages would disagree about the complexity of an infinite set of objects, but at least they would disagree about it by no more than the same fixed constant in both directions. Idk, seems like probably we could do something clever there, like take the average or something, idk. If we introduce an $L_3$, and the smallest compiler for $L_3$ in $L_1$ is larger than it is in $L_2$, then it seems like we should pick $L_2$.
If there is an infinite set of languages that all stand in this relationship to each other, i.e., all of the languages in an infinite set disagree about the complexity of an infinite set of objects and hide less complexity than any language not in the set, then idk, seems pretty damning for this approach, but at least we narrowed down the search space a bit?
Even if it turns out that we end up in a situation where we have an infinite set of languages that disagree about an infinite set of objects by exactly the same constant, it might be nice to have some upper bound on what that constant is.
In any case, this seems like something somebody would have thought of, and then proved the relevant theorems addressing all of the complications I raised. Ever seen something like this before? I think a friend might have suggested a paper that tried some similar method, and concluded that it wasn’t a feasible strategy, but I don’t remember exactly, and it might have been a totally different thing.
Watcha think?
Hey, sorry if it’s mad trivial, but may I ask for a derivation of this? You can start with “P(H) = P(H|E)P(E) + P(H|~E)P(~E)” if that makes it shorter.
(edit):
Never mind, I just did it. I’ll post it for you in case anyone else wonders.
1} P(H) = P(H|E)P(E) + P(H|~E)P(~E) [CEE]
2} P(H)P(E) + P(H)P(~E) = P(H|E)P(E) + P(H|~E)P(~E) [because ab + (1-a)b = b]
3} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [subtract P(H) from every value to be weighted]
4} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = P(H) - P(H) = 0 [because ab + (1-a)b = b]
(conclusion)
5} 0 = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [by identity syllogism from lines 3 and 4]
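For anyone who wants a sanity check, here is a quick numeric sketch of line 5, with arbitrary made-up numbers:

```python
# Numeric check of conservation of expected evidence (line 5 above),
# using arbitrary made-up probabilities.
p_e = 0.3            # P(E)
p_h_given_e = 0.9    # P(H|E)
p_h_given_not_e = 0.2  # P(H|~E)

# P(H) via the conditional expansion (line 1, [CEE]):
p_h = p_h_given_e * p_e + p_h_given_not_e * (1 - p_e)

# Line 5: the probability-weighted expected update is zero.
expected_update = (p_h_given_e - p_h) * p_e + (p_h_given_not_e - p_h) * (1 - p_e)
print(expected_update)  # 0.0 (up to floating-point error)
```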
Now, can’t I be a philosophical frequentist and a subjective Bayesian? Just because probability theory models subjective beliefs does not mean that it doesn’t model frequencies; in fact, if somebody told me that Bayes doesn’t model frequencies, I’m pretty sure I could prove them wrong much more easily than someone who said that probabilities don’t model degrees of belief.
But there is no contradiction in saying that the Kolmogorov probability function models both degrees of belief and actual frequencies.
(edit)
In fact, it seems to me that Kolmogorov probability plainly does model frequency, since it models odds, and odds model frequencies by a simple conversion. In fact, degrees of belief seem to model frequency as well. Using the thought experiment of frequencies of worlds you think you might find yourself in makes it simple to see how at least some degrees of belief can be seen as frequencies in and of themselves. In this thought experiment we treat probability as a measure over the worlds you think you might find yourself in; if you think that “there are ten cards and eight of them are blue”, then in 4/5 of the worlds where “there are ten cards and eight of them are blue” holds, “the top card is blue” also holds. So you rightfully assign an 80% probability to the top card being blue.
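Here is a minimal sketch of that counting picture, treating each possible ordering of the deck as a world (the deck and the color labels are just for illustration):

```python
# Enumerate every ordering of a ten-card deck with eight blue cards, and
# check in what fraction of those "worlds" the top card is blue.
from itertools import permutations

cards = ["blue"] * 8 + ["red"] * 2
total = 0
blue_top = 0
for world in permutations(cards):  # all 10! = 3,628,800 orderings
    total += 1
    blue_top += world[0] == "blue"

print(blue_top / total)  # 0.8
```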
Where the frequentist makes an error is in thinking that probabilities, treated as degrees of belief, are then out there in the world. What I mean by this is that they take the step from frequencies being in the world to uncertainties being in the world. This is a mistake, but I think that it is not central to the philosophical doctrine of frequentism. All that some frequentists claim is that probability models frequency, and this is plainly true. And it is also true that there are frequencies in the world. Real frequencies, independent of our minds. These are not probabilities, because there are no probabilities anywhere. Not even in minds.
Probability is not degree of subjective belief; probability is a class of axiomatized functions. These axiomatized functions model a great many things: measure theory, Euclidean geometry, the constraints of rational belief, set cardinality, frequencies, etc., and the list can go on and on. Probability is a mathematical tool. And it is isomorphic to many important features of rationality and science, perhaps the most important being subjective degree of belief. But to argue that probability is subjective degree of belief just because it models degree of belief seems as silly to me as arguing that probability is frequency just because it models frequency. Why not the “probability is measure” position? Measure theory is isomorphic to probability. Why not say that probability is measure? Add that to the debating line.
I think the position to take towards probability is a properly Hofstadter-ish-ian formalism. Where the true statements about probability are simply the statements which are formed when you interpret the theorems of probability theory. Whatever else probability may be able to talk about truthfully it does so through isomorphism.
The shoggoth is supposed to be of a different type than the characters. The shoggoth, for instance, does not speak English; it only knows tokens. There could be a shoggoth character, but it would not be the real shoggoth. The shoggoth is the thing that gets low loss on the task of predicting the next token. The characters are patterns that emerge in the history of that behavior.