# zulupineapple

Karma: 382
• Maybe I should just let you tell me what framework you are even using in the first place.

I’m looking at the Savage theory from your own https://plato.stanford.edu/entries/decision-theory/ and I see U(f) = ∑_i u(f(s_i)) P(s_i), so at least they have no problem with the domains (O and S) being different. Now I see the confusion is that to you Omega=S (and also O=S), but to me Omega=dom(u)=O.

Furthermore, if O={o0,o1}, then I can group the terms into u(o0)P(“we’re in a state where f evaluates to o0”) + u(o1)P(“we’re in a state where f evaluates to o1”). I’m just moving all of the complexity out of EU and into P, which I assume works by some magic (e.g. LI) that doesn’t involve literally iterating over every possible state in S.
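A minimal sketch of that regrouping, assuming a tiny finite S so the two sums can be compared directly. The states, outcomes, and numbers are hypothetical toy values, and `outcome_probabilities` stands in for whatever mechanism actually supplies P over outcomes; here it just iterates over S, which is exactly the step a real agent would want to avoid.

```python
from collections import defaultdict

def expected_utility_over_states(f, u, P):
    """Savage-style sum over states: sum of u(f(s)) * P(s)."""
    return sum(u[f[s]] * p for s, p in P.items())

def outcome_probabilities(f, P):
    """P("we're in a state where f evaluates to o") for each outcome o."""
    probs = defaultdict(float)
    for s, p in P.items():
        probs[f[s]] += p
    return dict(probs)

def expected_utility_over_outcomes(u, outcome_probs):
    """Grouped sum over outcomes: sum of u(o) * P(f evaluates to o)."""
    return sum(u[o] * p for o, p in outcome_probs.items())

# Hypothetical toy data: three states, two outcomes.
P = {"s0": 0.5, "s1": 0.25, "s2": 0.25}   # probabilities over S
f = {"s0": "o0", "s1": "o1", "s2": "o0"}  # the act f maps states to outcomes
u = {"o0": 1.0, "o1": 4.0}                # utility is defined on O only

eu_states = expected_utility_over_states(f, u, P)
eu_outcomes = expected_utility_over_outcomes(u, outcome_probabilities(f, P))
```

Both sums agree; the grouped form only needs u on the two outcomes plus two probabilities.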

We can either start with a basic set of “worlds” (eg, ) and define our “propositions” or “events” as sets of worlds <...>

That’s just math speak, you can define a lot of things as a lot of other things, but that doesn’t mean that the agent is going to be literally iterating over infinite sets of infinite bit strings and evaluating something on each of them.

By the way, I might not see any more replies to this.

• A classical probability distribution over Ω with a utility function understood as a random variable can easily be converted to the Jeffrey-Bolker framework, by taking the JB algebra as the sigma-algebra, and V as the expected value of U.

Ok, you’re saying that JB is just a set of axioms, and U already satisfies those axioms. And in this construction “event” really is a subset of Omega, and “updates” are just updates of P, right? Then of course U is not more general, I had the impression that JB is a more distinct and specific thing.
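To make that reading concrete, here is a small sketch assuming a finite Omega, with events as plain Python sets of worlds and V(E) taken to be the conditional expectation of U given E. The worlds, probabilities, and utilities are hypothetical toy values, not anything from the post.

```python
def V(event, P, U):
    """Conditional expected utility of an event (a set of worlds): E[U | event]."""
    mass = sum(P[w] for w in event)
    if mass == 0:
        raise ValueError("V is undefined for probability-zero events in this sketch")
    return sum(P[w] * U[w] for w in event) / mass

def update(P, event):
    """Condition P on an event; the event is just a subset of Omega."""
    mass = sum(P[w] for w in event)
    return {w: (P[w] / mass if w in event else 0.0) for w in P}

# Hypothetical toy model: four equally likely worlds.
P = {"w0": 0.25, "w1": 0.25, "w2": 0.25, "w3": 0.25}
U = {"w0": 0.0, "w1": 1.0, "w2": 2.0, "w3": 3.0}

A = {"w0"}
B = {"w2", "w3"}

# Averaging property for disjoint events:
# V(A or B) = (P(A) V(A) + P(B) V(B)) / (P(A) + P(B))
p_a = sum(P[w] for w in A)
p_b = sum(P[w] for w in B)
mixture = (p_a * V(A, P, U) + p_b * V(B, P, U)) / (p_a + p_b)
```

In this construction V inherits the averaging property from ordinary conditional expectation, which is why nothing beyond P and U is needed.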

Regarding the other direction, my sense is that you will have a very hard time writing down these updates, and when it works, the code will look a lot like one with a utility function. But, again, the example in “Updates Are Computable” isn’t detailed enough for me to argue anything. Although now that I look at it, it does look a lot like the U(p)=1-p(“never press the button”).

events (ie, propositions in the agent’s internal language)

I think you should include this explanation of events in the post.

construct ‘worlds’ as maximal specifications of which propositions are true/false

It remains totally unclear to me why you demand the world to be such a thing.

I’m not sure why you say Omega can be the domain of U but not the entire ontology.

My point is that if U has two output values, then it only needs two possible inputs. Maybe you’re saying that if |dom(U)|=2, then there is no point in having |dom(P)|>2, and maybe you’re right, but I feel no need to make such claims. Even if the domains are different, they are not unrelated, Omega is still in some way contained in the ontology.

I agree that we can put even more stringent (and realistic) requirements on the computational power of the agent

We could and I think we should. I have no idea why we’re talking math, and not writing code for some toy agents in some toy simulation. Math has a tendency to sweep all kinds of infinite and intractable problems under the rug.

<...> then I think the Jeffrey-Bolker setup is a reasonable formalization.

Jeffrey is a reasonable formalization, it was never my point to say that it isn’t. My point is only that U is also reasonable, and possibly equivalent or more general. That there is no “case against” it. Although, if you find Jeffrey more elegant or comfortable, there is nothing wrong with that.

do you believe that any plausible utility function on bit-strings can be re-represented as a computable function (perhaps on some other representation, rather than bit-strings)?

I don’t know what “plausible” means, but no, that sounds like a very high bar. I believe that if there is at least one U that produces an intelligent agent, then utility functions are interesting and worth considering. Of course I believe that there are many such “good” functions, but I would not claim that I can describe the set of all of them. At the same time, I don’t see why any “good” utility function should be uncomputable.

I think there is a good reason to imagine that the agent structures its ontology around its perceptions. The agent cannot observe whether-the-button-is-ever-pressed; it can only observe, on a given day, whether the button has been pressed on that day. |Omega|=2 is too small to even represent such perceptions.

I agree with the first sentence, however Omega is merely the domain of U, it does not need to be the entire ontology. In this case Omega={“button has been pressed”, “button has not been pressed”} and P(“button has been pressed” | “I’m pressing the button”)~1. Obviously, there is also no problem with extending Omega with the perceptions, all the way up to |Omega|=4, or with adding some clocks.

We could expand the scenario so that every “day” is represented by an n-bit string.

If you want to force the agent to remember the entire history of the world, then you’ll run out of storage space before you need to worry about computability. A real agent would have to start forgetting days, or keep some compressed summary of that history. It seems to me that Jeffrey would “update” the daily utilities into total expected utility; in that case, U can do something similar.
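As a rough illustration of the "compressed summary" idea, the sketch below keeps a fixed-size record instead of the full list of n-bit days. The `daily_utility` rule (count the 1 bits) is a made-up placeholder, and summing daily utilities is only one possible way to compress; neither comes from the post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Summary:
    """Fixed-size memory: the agent never stores the raw history."""
    days: int = 0
    total_utility: float = 0.0

def daily_utility(day_bits: str) -> float:
    """Hypothetical per-day utility: the number of 1 bits in that day's string."""
    return float(day_bits.count("1"))

def absorb(summary: Summary, day_bits: str) -> Summary:
    """Fold one more day into the summary, then forget the day itself."""
    return Summary(summary.days + 1,
                   summary.total_utility + daily_utility(day_bits))

summary = Summary()
for day in ["101", "000", "111"]:  # hypothetical 3-bit days
    summary = absorb(summary, day)
```

Storage stays constant however many days pass, at the cost of only being able to express utilities that depend on the history through this summary.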

I can always “extend” a world with an extra, new fact which I had not previously included. IE, agents never “finish” imagining worlds; more detail can always be added

You defined U at the very beginning, so there is no need to send these new facts to U, it doesn’t care. Instead, you are describing a problem with P, and it’s a hard problem, but Jeffrey also uses P, so that doesn’t solve it.

> … set our model to be a list of “events” we’ve observed …
I didn’t understand this part.

If you “evaluate events”, then events have some sort of bit representation in the agent, right? I don’t clearly see the events in your “Updates Are Computable” example, so I can’t say much and I may be confused, but I have a strong feeling that you could define U as a function on those bits, and get the same agent.

This is an interesting alternative, which I have never seen spelled out in axiomatic foundations.

The point would be to set U(p) = p(“button has been pressed”) and then decide to “press the button” by evaluating U(P conditioned on “I’m pressing the button”) * P(“I’m pressing the button” | “press the button”), where P is the agent’s current belief, and p is a variable of the same type as P.
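A toy sketch of this proposal, assuming beliefs are stored as a small joint table over two facts, (pressing, pressed). The table, the `execution_reliability` number standing in for P(“I’m pressing the button” | “press the button”), and all helper names are hypothetical.

```python
def prob(belief, predicate):
    """Probability the belief assigns to the worlds satisfying predicate."""
    return sum(p for w, p in belief.items() if predicate(w))

def condition(belief, predicate):
    """Return the belief conditioned on predicate (same type as the input)."""
    mass = prob(belief, predicate)
    if mass == 0:
        raise ValueError("cannot condition on a probability-zero event")
    return {w: (p / mass if predicate(w) else 0.0) for w, p in belief.items()}

def U(p):
    """Utility takes a belief p and returns p("button has been pressed")."""
    return prob(p, lambda w: w[1])

# Worlds are (pressing, pressed) pairs; the numbers are made up.
P = {
    (True, True): 0.45,
    (True, False): 0.05,
    (False, True): 0.0,
    (False, False): 0.5,
}

# Hypothetical stand-in for P("I'm pressing the button" | "press the button").
execution_reliability = {"press the button": 0.75}

score = U(condition(P, lambda w: w[0])) * execution_reliability["press the button"]
baseline = U(P)
```

Here `score` exceeds `baseline`, so this toy agent would choose to press. Note that U never looks at worlds directly, only at a belief of the same type as P.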

• If you actually do want to work on AI risk, but something is preventing you, you can just say “personal reasons”, I’m not going to ask for details.

I understand that my style is annoying to some. Unfortunately, I have not observed polite and friendly people getting interesting answers, so I’ll have to remain like that.

• OK, there are many people writing explanations, but if all of them are rehashing the same points from the Superintelligence book, then there is not much value in that (and I’m tired of reading the same things over and over). Of course you don’t need new arguments or new evidence, but it’s still strange if there aren’t any.

Anyone who has read this FAQ and others, but isn’t a believer yet, will have some specific objections. But I don’t think everyone’s objections are unique, a better FAQ should be able to cover them, if their refutations exist to begin with.

Also, are you yourself working on AI risk? If not, why not? Is this not the most important problem of our time? Would EY not say that you should work on it? Could it be that you and he actually have wildly different estimates of P(AI doom), despite agreeing on the arguments?

As for Raemon, you’re right, I probably misunderstood why he’s unhappy with newer explanations.

• Stampy seems pretty shallow, even more so than this FAQ. Is that what you meant by it not filling “this exact niche”?

By the way, I come from AGI safety from first principles, where I found your comment linking to this. Notably, that sequence says “My underlying argument is that agency is not just an emergent property of highly intelligent systems, but rather a set of capabilities which need to be developed during training, and which won’t arise without selection for it.” which is reasonable and seems an order of magnitude more conservative than this FAQ, which doesn’t really touch the question of agency at all.

• I’m talking specifically about discussions on LW. Of course in reality Alice ignores Bob’s comment 90% of the time, and that’s a problem in its own right. It would be ideal if people who have distinct information would choose to exchange that information.

I picked a specific and reasonably grounded topic, “x-risk”, or “the probability that we all die in the next 10 years”, which is one number, so not hard to compare, unless you want to break it down by cause of death. In contrived philosophical discussions, it can certainly be hard to determine who agrees on what, but I have a hunch that this is the least of the problems in those discussions.

A lot of things have zero practical impact, and that’s also a problem in its own right. It seems to me that we’re barely ever having “is working on this problem going to have practical impact?” type of discussions.

• I want neither. I observe that Raemon cannot find an up-to-date introduction that he’s happy with, and I point out that this is really weird. What I want is an explanation for this bizarre situation.

Is your position that Raemon is blind, and good, convincing explanations are actually abundant? If so, I’d like to see them, it doesn’t matter where from.

• “The world is full of adversarial relationships” is pretty much the weakest possible argument and is not going to convince anyone.

Are you saying that the MIRI website has convincing introductory explanations of AI risk, the kind that Raemon wishes he had? Surely he would have found them already? If there aren’t any, then, again, why not?

# Why We Disagree

25 Oct 2023 10:50 UTC
7 points
• If our relationship to them is adversarial, we will lose. But you also need to argue that this relationship will (likely) be adversarial.

Also, I’m not asking you to make the case here, I’m asking why the case is not being made on the front page of LW and on every other platform. Would that not help with advocacy and recruitment? No idea what “keeping up with current events” means.

• I certainly don’t evaluate my U on quarks. Omega is not the set of worlds, it is the set of world models, and we are the ones who decide what that model should be. In the “procrastination” example you intentionally picked a bad model, so it proves nothing (if the world only has one button we care about, then maybe |Omega|=2 and everything is perfectly computable).

Further on, it seems to me that if we set our model to be a list of “events” we’ve observed, then we get the exact thing you’re talking about. Although you’re imprecise and inconsistent about what an event is, how it’s represented, how many there are, so I’m not sure if that’s supposed to make anything more tractable.

In general, asking questions about the domain of U (and P!) is a good idea, and something that all introductions to Utility lack. But the ease with which you abandon a perfectly good formalism is concerning. LI is cool, and it doesn’t use U, but that’s not an argument against U, at best you can say that U was not as useful as you’d hoped.

My own take is that the domain of U is the type of P. That is, U is evaluated on possible functions P. P certainly represents everything the agent cares about in the world, and it’s also already small and efficient enough to be stored and updated in the agent, so this solution creates no new problems.

• Seems like a red flag. How can there not be a more up-to-date one? Is advocacy and recruitment not a goal of AI-risk people? Are they instrumentally irrational? What is preventing you from writing such a post right now?

Most importantly, could it be that people struggle to write a good case for AI-risk, because the case for it is actually pretty weak, when you think about it?

• The link is broken. I was only able to find the article here, with the wayback machine.

• In the examples, sometimes the problem is people having different goals for the discussion, sometimes it is having different beliefs about what kinds of discussions work, and sometimes it might be about almost object-level beliefs. If “frame” refers to all of that, then it’s way too broad and not a useful concept. If your goal is to enumerate and classify the different goals and different beliefs people can have regarding discussions, that’s great, but possibly too broad to make any progress.

My own frustration with this topic is lack of real data. Apart from “FOOM Debate”, the conversations in your post are all fake. To continue your analogy in another comment, this is like doing zoology by only ever drawing cartoons of animals, without ever actually collecting or analyzing specimens. Good zoologists would collect many real discussions, annotate them, classify them, debate about those classifications, etc. They may also tamper with ongoing discussions. You may be doing some of that privately, but doing it publicly would be better. Unfortunately there seem to be norms against that.

• That’s what I think every time I hear “history repeats itself”. I wish Scott had considered the idea.

The biggest claim Turchin is making seems to be about the variance of the time intervals between “bad” periods. Random walk would imply that it is high, and “cycles” would imply that it is low.
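One simple way to put a number on that is the coefficient of variation of the gaps between “bad” periods. The sketch below uses made-up onset years purely for illustration; it is not real historical data and not Turchin’s method.

```python
import statistics

def intervals(onset_years):
    """Gaps between consecutive onsets of "bad" periods."""
    return [b - a for a, b in zip(onset_years, onset_years[1:])]

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0 means perfectly regular)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical onset years, invented for this example.
cyclic_onsets = [0, 50, 100, 150, 200]     # perfectly periodic
irregular_onsets = [0, 10, 100, 120, 200]  # same average gap, very uneven

cv_cyclic = coefficient_of_variation(intervals(cyclic_onsets))
cv_irregular = coefficient_of_variation(intervals(irregular_onsets))
```

Both series have a mean gap of 50 years, so the mean alone cannot tell them apart; the spread of the gaps is what separates “cycles” from something noisier.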

• For example, say I wanted to know how good/enjoyable a specific movie would be.

My point is that “goodness” is not a thing in the territory. At best it is a label for a set of specific measures (ratings, revenue, awards, etc). In that case, why not just work with those specific measures? Vague questions have the benefit of being short and easy to remember, but beyond that I see only problems. Motivated agents will do their best to interpret the vagueness in a way that suits them.

Is your goal to find a method to generate specific interpretations and procedures of measurement for vague properties like this one? Like a Schelling point for formalizing language? Why do you feel that can be done in a useful way? I’m asking for an intuition pump.

Certainly there is some vagueness, but it seems that we manage to live with it. I’m not proposing anything that prediction markets aren’t already doing.

• “What is the relative effectiveness of AI safety research vs. bio risk research?”

If you had a precise definition of “effectiveness” this shouldn’t be a problem. E.g. if you had predictions for “will humans go extinct in the next 100 years?” and “will we go extinct in the next 100 years, if we invest 1M into AI risk research?” and “will we go extinct, if we invest 1M in bio risk research?”, then you should be able to make decisions with that. And these questions should work fine in existing forecasting platforms. Their long term and conditional nature are problems, of course, but I don’t think that can be helped.
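A sketch of how those three forecasts could be turned into a decision, assuming “effectiveness” is defined as the drop in extinction probability per 1M invested. Every probability below is a made-up placeholder, not an actual forecast.

```python
def risk_reduction(baseline, conditional):
    """Absolute drop in P(extinction) attributed to the investment."""
    return baseline - conditional

def most_effective(baseline, conditional_forecasts):
    """Pick the option whose conditional forecast lowers the risk the most."""
    return max(conditional_forecasts,
               key=lambda name: risk_reduction(baseline, conditional_forecasts[name]))

# Placeholder numbers for illustration only.
p_extinction = 0.10  # "will humans go extinct in the next 100 years?"
p_extinction_given_investment = {
    "ai_risk_research": 0.08,   # "...if we invest 1M into AI risk research?"
    "bio_risk_research": 0.09,  # "...if we invest 1M in bio risk research?"
}

reductions = {name: risk_reduction(p_extinction, p)
              for name, p in p_extinction_given_investment.items()}
choice = most_effective(p_extinction, p_extinction_given_investment)
```

The comparison only needs the three forecast values, which is why a precise definition of “effectiveness” reduces the question to ordinary (if long-term and conditional) forecasting questions.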

“How much value has this organization created?”

That’s not a forecast. But if you asked “How much value will this organization create next year?” along with a clear measure of “value”, then again, I don’t see much of a problem. And, although clearly defining value can be tedious (and prone to errors), I don’t think that problem can be avoided. Different people value different things, that can’t be helped.

One solution attempt would be to have an “expert panel” assess these questions

Why would you do that? What’s wrong with the usual prediction markets? Of course, they’re expensive (require many participants), but I don’t think a group of experts can be made to work well without a market-like mechanism. Is your project about making such markets more efficient?

# Look at the Shape of Your Utility Distribution

30 Aug 2019 23:27 UTC
15 points