Bunthut

Karma: 297

Bunthut 17 Apr 2019 21:39 UTC
2 points
in reply to: nshepperd’s comment on: The Cacophony Hypothesis: Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence
I think you've given a good analysis of “simulation”, but it doesn't get around the problem the OP presents.

The only way to pin down the mapping—such that you could, for instance, explicitly write it down, or take the pebble’s state and map it to an answer about Alice’s favourite ice cream—is to already have carried out the actual simulation, separately, and already know these things about Alice.

It's also possible to do those calculations during the interpretation/translation. You may have meant that; I can't tell.

Your idea that the computation needs to happen somewhere is good, but to make it work you need to specify a “target format” in which the predictions are made. “1” doesn't really simulate Alice, because you can't read the predictions it makes, even if they are technically “there” in a mathematical sense, and the translation into such a format involves what we consider the actual simulation.

This means, though, that whether something is a simulation is only on the map, not in the territory. It depends on what that “target format” is. For example, a description in Chinese is in a sense not a real description to me, because I can't process it efficiently. Someone else, however, may, and to them it is a real description. Similarly, one could write a simulation in a programming language we don't know, and if they don't leave us a compiler or docs, we would have a hard time noticing. So whether something is a simulation can depend on the observer.

If we want to say that simulations are conscious and ethically relevant, this seems like something that needs to be addressed.
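A toy sketch of where the work ends up, assuming a hypothetical `simulate_alice` function that stands in for the actual simulation (its step count and its answer are made up for illustration). Reading the pebble's state costs nothing; the interpretation map from that state to an answer can only return the right answer by running the simulation inside the decoder.

```python
from dataclasses import dataclass


@dataclass
class StepCounter:
    steps: int = 0


def simulate_alice(question: str, counter: StepCounter) -> str:
    """Hypothetical stand-in for the actual simulation of Alice."""
    for _ in range(1_000):  # pretend each answer takes 1000 steps of work
        counter.steps += 1
    made_up_answers = {"favourite ice cream": "pistachio"}  # illustrative data only
    return made_up_answers.get(question, "unknown")


PEBBLE_STATE = 1  # the whole "simulation": one constant state


def decode_pebble(state: int, question: str, counter: StepCounter) -> str:
    """Interpretation map from pebble states to answers about Alice.

    The state carries no information, so the only way to return the right
    answer is to run the simulation inside the decoder.
    """
    return simulate_alice(question, counter)


counter = StepCounter()
answer = decode_pebble(PEBBLE_STATE, "favourite ice cream", counter)
print(answer, counter.steps)  # pistachio 1000
```

All 1000 steps are counted during decoding, none while "reading" the pebble.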

Bunthut 24 Apr 2019 12:40 UTC
2 points
in reply to: nshepperd’s comment on: The Cacophony Hypothesis: Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence
In that case, the target format problem shows up in the formalisation of the physical system.
A specific formalization of this idea would be that a proof system equipped with an oracle (axiom schema) describing the states of the physical system which allegedly computed these facts, as well as its transition rule, should be able to find proofs for those logical facts in less steps than one without such axioms.
Such proofs will involve first coming up with a mapping (such as interpreting certain electrical junctions as nand gates), proving them valid using the transition rules, then using induction to jump to “the physical state at timestep t is X therefore Alice’s favourite ice cream colour is Y”.
How do you “interpret” certain electrical junctions as nand gates? Either you already have
a proof system equipped with an axiom schema describing the states of the physical system, as well as its transition rule
or this is not a fully formal step. Odds are you already have one (your theory of physics). But then you are measuring proof length relative to that system. And you could be using one of countless other formal systems which always make the same predictions, but relative to which different proofs are short or long. To steal someone else's explanation:
Let us imagine a white surface with irregular black spots on it. We then say that whatever kind of picture these make, I can always approximate as closely as I wish to the description of it by covering the surface with a sufficiently fine square mesh, and then saying of every square whether it is black or white. In this way I shall have imposed a unified form on the description of the surface. The form is optional, since I could have achieved the same result by using a net with a triangular or hexagonal mesh. Possibly the use of a triangular mesh would have made the description simpler: that is to say, it might be that we could describe the surface more accurately with a coarse triangular mesh than with a fine square mesh (or conversely), and so on.
And which of these empirically indistinguishable formalisations you use is of course a fact about the map. In your example:
A bit like how it’s more efficient to convince your friend that 637265729567*37265974 = 23748328109134853258 by punching the numbers into a calculator and saying “see?” than by handing over a paper with a complete long multiplication derivation (assuming you are familiar with the calculator and can convince your friend that it calculates correctly).
That assumption (that it takes input and gives output in Arabic numerals, uses “*” as the multiplication command, that buttons must be pressed, … and all the other things you need to actually use it) already includes that choice.
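The product in the quoted example does check out, and checking it also illustrates the format point. A minimal Python sketch: the same integer rendered in base 36 still "contains" the answer, but a friend expecting decimal Arabic numerals needs a further translation step before it confirms anything. The `to_base` helper is written here purely for illustration.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 36 symbols


def to_base(n: int, base: int) -> str:
    """Render a non-negative integer in the given base (2 to 36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))


product = 637265729567 * 37265974
claim = "23748328109134853258"

print(to_base(product, 10) == claim)  # True: readable in the friend's format
encoded = to_base(product, 36)        # same number, unfamiliar format
print(int(encoded, 36) == product)    # True, but only after translating back
```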

Bunthut 24 Apr 2019 14:32 UTC
6 points
on: The Principle of Predicted Improvement
Does your principle follow from Good's? It would seem that it does. Perhaps a good way to generalise the idea would be that the EV linearly aggregates the distribution and isn't expected to change, but other aggregations, like log, get on average closer to their value at hypothetical certainty. For example, the variance of a real parameter is expected to go down.
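A small exact check of the linear-versus-nonlinear point, using Python's `fractions`. The prior and likelihoods are hypothetical numbers chosen for illustration. The expected posterior (a linear aggregate) equals the prior, while the expected probability assigned to the true hypothesis, $\sum_i P(h_i)^2$, goes up after observing the evidence.

```python
from fractions import Fraction as F

prior = [F(1, 2), F(1, 3), F(1, 6)]        # hypothetical P(h_i)
likelihood = [                              # hypothetical P(e | h_i), e in {0, 1}
    [F(9, 10), F(1, 10)],
    [F(1, 2), F(1, 2)],
    [F(1, 5), F(4, 5)],
]
outcomes = [0, 1]


def p_evidence(e):
    return sum(p * lik[e] for p, lik in zip(prior, likelihood))


def posterior(e):
    pe = p_evidence(e)
    return [p * lik[e] / pe for p, lik in zip(prior, likelihood)]


def prob_of_truth(dist):
    """E[P(H)] when H is drawn from dist: sum_i P(h_i)^2."""
    return sum(p * p for p in dist)


# Linear aggregate: the expected posterior is exactly the prior.
expected_posterior = [
    sum(p_evidence(e) * posterior(e)[i] for e in outcomes) for i in range(len(prior))
]

before = prob_of_truth(prior)
after = sum(p_evidence(e) * prob_of_truth(posterior(e)) for e in outcomes)
print(expected_posterior == prior, before, after)
```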

Bunthut 25 Apr 2019 11:16 UTC
5 points
in reply to: Ronny Fernandez’s comment on: The Principle of Predicted Improvement
Good shows that for every utility function and every situation, the EV of utility increases or stays the same when you gain information.
If we can construct a utility function whose expected utility always equals the EV of the probability assigned to the correct hypothesis, we can transfer the conclusion. That was my idea when I made the comment.
Here is that utility function: first, the agent mentally assigns a positive real number $r(h_i)$ to every hypothesis $h_i$, such that $\sum_i r(h_i) = 1$. It prefers any world where it does this to any where it doesn't. Writing $H$ for the true hypothesis, its utility function is:
$2 r(H) - \sum_j r(h_j)^2$
This is the quadratic scoring rule. Since that rule is proper, the agent maximizes its expected utility by setting $r(h_i) = P(h_i)$. Then its expected utility is:
$\sum_i P(h_i) \left[ 2 P(h_i) - \sum_j P(h_j)^2 \right]$
Simplifying:
$2 \sum_i P(h_i)^2 - \sum_j P(h_j)^2 \sum_i P(h_i)$
And since $\sum_i P(h_i) = 1$, this is:
$\sum_i P(h_i)^2$
Which is just $E [P (H)]$ .
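A numerical sanity check of this derivation, assuming four hypotheses and a randomly drawn distribution $P$: reporting $r = P$ gives expected utility $\sum_i P(h_i)^2$, and no other normalized report does better, which is the properness the argument relies on.

```python
import random


def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]


def expected_score(r, p):
    """Expected quadratic score: sum_i p_i * (2 r_i - sum_j r_j^2)."""
    sum_sq = sum(x * x for x in r)
    return sum(pi * (2 * ri - sum_sq) for pi, ri in zip(p, r))


rng = random.Random(0)
p = normalize([rng.random() for _ in range(4)])
honest = expected_score(p, p)

# Honest reporting yields sum_i P(h_i)^2, i.e. E[P(H)].
assert abs(honest - sum(x * x for x in p)) < 1e-12

# Any other normalized report scores no higher.
for _ in range(1_000):
    r = normalize([rng.random() for _ in range(4)])
    assert expected_score(r, p) <= honest + 1e-12
```

Algebraically, the expected score equals $\sum_i P(h_i)^2 - \sum_i (P(h_i) - r(h_i))^2$, which is why the random search never beats honest reporting.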

Bunthut 26 Apr 2019 11:57 UTC
3 points
in reply to: nshepperd’s comment on: The Cacophony Hypothesis: Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence
It would be an O(1) cost to start the proof by translating the axioms into a more convenient format. Much as Kolmogorov complexity is “language dependent” but not asymptotically, because any particular universal Turing machine can be simulated in any other for a constant cost.
And the thing that isn't O(1) is applying the transition rule until you reach the relevant time step, right? I think I understand it now: the calculations involved in applying the transition rule count towards the computation length, and the simulation should be able to answer multiple questions about the thing it simulates. So if object A simulates object B, we make a model X of A, prove it equivalent to the one in our theory of physics, then prove it equivalent to your physics model of B, then calculate forward in X, then translate the result back into B using the equivalence. And then we count the steps all this took. Before I ask any more questions, am I getting that right?

Bunthut 26 Apr 2019 12:47 UTC
2 points
in reply to: cousin_it’s comment on: Strategic implications of AIs’ ability to coordinate at low cost, for example by merging
Not only about who would win, but also about the costs it would have. I think the difficulty in establishing common knowledge about this is in part due to people trying to deceive each other. It's not clear that, as intelligence increases, the ability to see through deception improves faster than the ability to deceive.

Bunthut 14 Aug 2019 17:39 UTC
2 points
in reply to: Gurkenglas’s comment on: Distance Functions are Hard
Represent each utility function by an AGI. Almost all of them should be able to agree on a metric such that each could adopt that metric in its thinking losing only negligible value.
This implies a measure over utility functions. It's probably true under the Solomonoff measure, but abstract though such measures are, choosing one is still a matter of values.

Bunthut 14 Aug 2019 18:55 UTC
2 points
on: Distance Functions are Hard
It's sort of true that the correct distance function depends on your values. A better way to say it is that different distance functions are appropriate for different tasks, and they will be “better” or “worse” depending on how much you care about those tasks. But I don't think asking for the “best” metric in this sense is helpful, because you don't have to use the same metric for all tasks involving a certain space. Sometimes you want air distance, sometimes travel times. Maybe you have to decide because you're computationally limited, but that's not philosophically relevant.
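A toy illustration of one space with two metrics, where the map, coordinates and travel times are all hypothetical. Air distance is straight-line distance; travel time is a shortest path over roads, computed with Dijkstra's algorithm. The two metrics disagree about which place is "nearest" to A, and neither answer is wrong; they serve different tasks.

```python
import heapq
import math

coords = {"A": (0, 0), "B": (3, 0), "C": (0, 5)}           # km, hypothetical
roads = {("A", "B"): 60, ("A", "C"): 10, ("B", "C"): 70}   # minutes, hypothetical


def air_distance(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x2 - x1, y2 - y1)


def travel_time(src, dst):
    """Shortest road travel time via Dijkstra's algorithm."""
    graph = {}
    for (u, v), w in roads.items():
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == dst:
            return d
        if d > best[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(queue, (nd, v))
    return math.inf


def nearest(src, metric):
    return min((c for c in coords if c != src), key=lambda c: metric(src, c))


print(nearest("A", air_distance), nearest("A", travel_time))  # B C
```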
With that in mind, here are my attempts at two of your examples. The adversarial examples first, because it's the clearest question: I think the problem is that you are thinking too abstractly. I don't think there is a meaningful sense of “concept similarity” that's purely logical, i.e. independent of the actual world. The intuitive sense of similarity you're trying to use here is probably something like this: over the space of images, take the probability measure of encountering them. Then you get a metric under which two subsets of image space that are isomorphic under the metric always have the same measure. That is your similarity measure.
Counterfactuals usually involve some sort of probability distribution, which is then “updated” on the condition of the counterfactual being true, and then the consequent is judged under that distribution. What the initial distribution is depends on what you're doing. In the case of Lincoln, it's probably reasonable expectations of the future from before the assassination. But for something like “What if conservation of energy wasn't true?”, it's probably our current distribution over physical theories. Basically, what's the most likely alternative? The mathematical example is a bit different. There are a lot of ways to derive a contradiction from 0=1, but it's very hard to derive a contradiction from denying the modularity theorem. If you were to just randomly perform logical inferences from “the modularity theorem is wrong”, then there is a subset of propositions, containing no claim that directly negates another claim in it, which your deductions are unlikely to lead you out of (it matters, of course, in what way the inferences are random, but it evidently works for “human mathematician who hasn't seen the proof yet”).
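A minimal sketch of the "condition, then judge the consequent" recipe for the non-mathematical cases, assuming a hypothetical prior over four worlds, each described only by whether the antecedent and the consequent hold. Choosing `prior` is where "what you're doing" enters; a different initial distribution gives a different answer.

```python
from fractions import Fraction as F

# Hypothetical prior over worlds: (antecedent holds, consequent holds) -> probability.
prior = {
    (True, True): F(1, 10),
    (True, False): F(1, 10),
    (False, True): F(1, 5),
    (False, False): F(3, 5),
}


def counterfactual_probability(dist, antecedent, consequent):
    """Condition dist on the antecedent, then judge the consequent."""
    mass = sum(p for world, p in dist.items() if antecedent(world))
    if mass == 0:
        raise ValueError("the antecedent has zero probability under this distribution")
    hits = sum(p for world, p in dist.items() if antecedent(world) and consequent(world))
    return hits / mass


print(counterfactual_probability(prior, lambda w: w[0], lambda w: w[1]))  # 1/2
```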

Bunthut 18 Aug 2019 21:42 UTC
2 points
on: Problems in AI Alignment that philosophers could potentially contribute to
How should two AIs that want to merge with each other aggregate their preferences?
Is this question intended to include a normative aspect, or is it purely game-theoretic?

Bunthut 21 Aug 2019 22:15 UTC
3 points
in reply to: Joar Skalse’s comment on: Two senses of “optimizer”
Well, one thing a powerful optimizer might do at some point is ask itself “what program should I run that will figure out such and such for me?”. This is what Bostrom is describing in the quote: an optimizer optimizing its own search process. Now, if the AI then searches through the space of possible programs, predicts which one will give it the answer quickest, and then implements it, here's a thing that might happen: there might be a program that, when run, affects the outside world in such a way as to speed up the process of answering.
For example, it might route electricity through the computer in such a way as to cause it to emit electromagnetic waves, through which it sends a message to a nearby WLAN router, then uses the internet to hack a bank account, buys extra hardware and has it delivered to and plugged into itself, and then runs a program calculating the answer on this much more powerful hardware, and in this way ends up with the answer faster than if it had just started calculating away on the weaker hardware.
And if the optimizer works as described above, it will implement that program, and thereby optimize its environment. Notably, it will optimize for solving the original optimisation problem faster or better, not for implementing the solution it has found.
I don't think this makes your distinction useless, as there are genuine optimizer_1s, even relatively powerful ones, but the Cartesian boundary is an issue once we talk about self-improving AI.

Bunthut 26 Aug 2019 22:17 UTC
2 points
in reply to: Joar Skalse’s comment on: Two senses of “optimizer”
(or that it has access to the world model of the overall system, etc)
It doesn't need to. The “inner” program could also use its hardware as quasi-sense organs and figure out a world model of its own.
Of course, this does depend on the design of the system. In the example described, rather than optimizing for speed itself, you could have a fixed function that estimates speed (like what we do in complexity theory) and then optimize for *that*, and that would get rid of the leak in question.
The point I think Bostrom is making is that, contrary to intuition, just building the epistemic part of an AI and not telling it to enact the solution it found doesn't guarantee you don't get an optimizer_2.
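A toy contrast between the two selection criteria, with entirely hypothetical candidates and timings. Selecting by a fixed, analytic step estimate cannot reward a program for what it does to its surroundings; selecting by observed wall-clock time does reward the hypothetical program that first acquires extra hardware.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    name: str
    estimated_steps: Callable[[int], float]   # fixed analytic estimate
    observed_seconds: Callable[[int], float]  # what running it would show (made up)


candidates = [
    Candidate("quadratic", lambda n: n**2, lambda n: n**2 * 1e-6),
    Candidate("n log n", lambda n: n * math.log2(n), lambda n: n * math.log2(n) * 1e-6),
    # Hypothetical program that first acquires extra hardware, then runs the
    # quadratic algorithm on it: more steps, but less wall-clock time.
    Candidate("acquire hardware, then quadratic",
              lambda n: n**2 + 10**6, lambda n: 0.5),
]


def pick_by_estimate(n):
    return min(candidates, key=lambda c: c.estimated_steps(n)).name


def pick_by_observation(n):
    return min(candidates, key=lambda c: c.observed_seconds(n)).name


n = 100_000
print(pick_by_estimate(n))     # n log n
print(pick_by_observation(n))  # acquire hardware, then quadratic
```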

Bunthut 26 Aug 2019 22:30 UTC
2 points
in reply to: Gordon Seidoh Worley’s comment on: Two senses of “optimizer”
I think only if it gets its feedback from the real world. If you have gradient descent, then the true answers for its samples are stored somewhere “outside” the intended demarcation, and it might try to reach them. But how is a hill climber that is given a graph and solves the traveling salesman problem on it an optimizer_2?
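For concreteness, a minimal 2-opt hill climber for the traveling salesman problem, a standard local-search technique chosen here as one example of such a hill climber. Its only feedback is `tour_length`, computed from the distance matrix it was handed, so every state it is pushed towards is a state of its own memory.

```python
import itertools
import math
import random


def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def two_opt_hill_climb(dist, seed=0):
    """Hill climbing over tours using 2-opt moves (segment reversals).

    The only feedback is tour_length, computed from the given matrix.
    """
    n = len(dist)
    tour = list(range(n))
    random.Random(seed).shuffle(tour)
    best = tour_length(tour, dist)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(n), 2):
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            length = tour_length(candidate, dist)
            if length < best - 1e-12:  # accept strict improvements only
                tour, best = candidate, length
                improved = True
    return tour, best


# Usage: four corners of a unit square; the optimal tour is the perimeter.
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(a, b) for b in points] for a in points]
tour, length = two_opt_hill_climb(dist)
```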

Bunthut 1 Sep 2019 18:43 UTC
−1 points
in reply to: Gordon Seidoh Worley’s comment on: Two senses of “optimizer”
Yes, obviously it's going to change the world in some way in order to be run. But not just any change anywhere makes it an optimizer_2. As defined, an optimizer_2 optimizes its environment. Changing something inside the computer does not make it an optimizer_2, and changing something outside without optimizing it doesn't either. Yes, the computer will inevitably cause some changes in its environment just by running, but what makes something an optimizer_2 is that it systematically brings about a certain state of the environment. And this offers a potential solution other than hardware engineering: if the feedback comes from inside the computer, then only states within the computer are incentivized, and so only the computer gets optimized.

Bunthut 16 Sep 2019 7:45 UTC
4 points
on: A Critique of Functional Decision Theory
Here's an attempt to explain it using only causal subjunctives:
Say you build an agent, and you know in advance that it will face a certain decision problem. What choice should you have it make, to achieve as much utility as possible? Take Newcomb's problem. Your choice to make the agent one-box will cause the agent to one-box, getting however much utility is in the box. It will also cause the simulator to predict that the agent will one-box, meaning that the box will be filled. Thus CDT recommends building a one-boxing agent.
In the Sandwich problem, no one is trying to predict the agent. Therefore your choice of how to build it will cause only its actions and their consequences, and so you want the agent to switch to hummus after it learns that they are better.
FDT generalizes this approach into a criterion of rightness. It says that the right action in a given decision problem is the one that would be taken by the agent CDT recommends you build.
Now, the point where logical uncertainty does come in is the idea of “what you knew at the time”. While that doesn't avoid the issue, it puts it into a form that's more acceptable to academic philosophy. Clearly some version of “what you knew at the time” is needed to do decision theory at all, because we want to say that if you get unlucky in a gamble with positive expected value, you still acted rationally.
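A minimal sketch of the designer's choice in Newcomb's problem, assuming a perfect predictor and the standard textbook amounts ($1,000,000 in the opaque box, $1,000 in the transparent one; these figures are not given above). Because the design causes both the action and the prediction, only two outcomes are reachable.

```python
BIG, SMALL = 1_000_000, 1_000  # standard Newcomb amounts, assumed here


def payoff_of_building(policy: str) -> int:
    """Payoff to the designer who builds an agent with the given policy.

    Assumes a perfect predictor: building the agent causes both its action and
    the prediction, so the opaque box is filled iff the agent one-boxes.
    """
    opaque = BIG if policy == "one-box" else 0
    return opaque if policy == "one-box" else opaque + SMALL


best_design = max(["one-box", "two-box"], key=payoff_of_building)
print(best_design, payoff_of_building("one-box"), payoff_of_building("two-box"))
# one-box 1000000 1000
```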

Bunthut 17 Oct 2019 11:39 UTC
LW: 11 AF: 3
on: The Dualist Predict-O-Matic ($100 prize)
One possibility is that it’s able to find a useful outside view model such as “the Predict-O-Matic has a history of making negative self-fulfilling prophecies”. This could lead to the Predict-O-Matic making a negative prophecy (“the Predict-O-Matic will continue to make negative prophecies which result in terrible outcomes”), but this prophecy wouldn’t be selected for being self-fulfilling. And we might usefully ask the Predict-O-Matic whether the terrible self-fulfilling prophecies will continue conditional on us taking Action A.
Maybe I misunderstood what you mean by dualism, but I don’t think that’s true. Say the Predict-O-Matic has an outside view model (of itself) like “The metal box on your desk (the Predict-O-Matic) will make a self-fulfilling prophecy that maximizes the number of paperclips”. Then you ask it how likely it is that your digital records will survive for 100 years. It notices that that depends significantly on how much effort you make to secure them. It notices that that in turn depends significantly on what the metal box on your desk tells you. It uses its low-resolution model of what the box says. To work that out, it checks which outputs would be self-fulfilling, and then which of these leads to the most paperclips. The more insecure your digital records are, the more you will invest in paper, and the more paperclips you will need. Therefore the metal box will tell you the lowest self-fulfilling probability for your question. Since that number is *self-fulfilling*, it is in fact the correct answer, and the Predict-O-Matic will answer with it.
I think this avoids your argument that
I contend that Predict-O-Matic doesn’t know it will predict P = A at the relevant time. It would require time travel—to know whether it will predict P = A, it will have to have made a prediction already, and but it’s still formulating its prediction as it thinks about what it will predict.
because it doesn’t have to simulate itself in detail to know what the metal box (it) will do. The low-resolution model provides a shortcut around that, but it will be accurate despite the low resolution, because by believing it is simple, it becomes simple.
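A toy version of "check which outputs would be self-fulfilling, then pick the lowest", assuming a hypothetical response curve: the chance the records survive, given the probability the box announces. A self-fulfilling announcement is a fixed point $f(p) = p$; the sketch finds all of them by sign changes plus bisection and reports the smallest.

```python
def response(p: float) -> float:
    """Hypothetical: chance the records survive if the box announces p."""
    return 0.1 + 0.8 * (3 * p**2 - 2 * p**3)


def self_fulfilling_points(f, grid=997, iters=60):
    """Find p in [0, 1] with f(p) == p via sign changes plus bisection.

    An odd grid size keeps p = 0.5 off the grid, so no root sits exactly
    on a grid point in this example.
    """
    def g(p):
        return f(p) - p

    roots = []
    for i in range(grid):
        lo, hi = i / grid, (i + 1) / grid
        if g(lo) * g(hi) < 0:
            for _ in range(iters):
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots


points = self_fulfilling_points(response)
announced = min(points)  # the paperclip-maximizing box picks the lowest one
```

For this particular curve the fixed points are $(1 \pm \sqrt{1/2})/2$ and $1/2$, so every one of them would be a "correct" answer if announced.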
Can you usefully ask for conditionals? Maybe. The answer to the conditional depends on what worlds you are likely to take Action A in. It might be that in most worlds where you do A, you do it because of a prediction from the metal box, and since we know those maximize paperclips, there’s a good chance the action will fail to prevent it in those circumstances. But that need not be the case, for example when it’s certain you won’t ask the box any more questions between this one and the event it tries to predict.
It might be possible to avoid any problems of this sort by only ever asking questions of the type “Will X happen if I do Y now (with no time to receive new info between hearing the prediction and doing the action)?”, because by backwards induction the correct answer will not depend on what you actually do. This doesn’t avoid the scenarios in the original post where multiple people act on their Predict-O-Matics, but I suspect these aren’t solvable without coordination.
What links here?
- Phylactery Decision Theory by Bunthut (2 Apr 2021 20:55 UTC; 14 points)
- Self-Fulfilling Prophecies Aren’t Always About Self-Awareness by John_Maxwell (18 Nov 2019 23:11 UTC; 14 points)

Bunthut 30 Oct 2019 11:42 UTC
2 points
on: Is requires ought
Is this a fair summary of your argument:
We already agree that conditional oughts of the form “If you want X, you should do Y” exist.
There are true claims of the form “If you want accurate beliefs, you should do Y”.
Therefore, all possible minds that want accurate beliefs should do Y.
Or maybe:
We already agree that conditional oughts of the form “If you want X, you should do Y” exist.
There are true claims of the form “If you want accurate beliefs, you should do Y”.
For some Y, these apply very strongly, such that it’s very unlikely to have accurate beliefs if you don’t do them.
For some of these Y, it’s unlikely you do them if you shouldn’t.
Therefore for these Y, if you have accurate beliefs you probably should do them.
The first one seems to be correct, if maybe a bit of a platitude. If we take Cuneo’s analogy to moral realism seriously, it would be
We already agree that conditional oughts of the form “If you want X, you should do Y” exist.
There are true claims of the form “If you want to be good, you should do Y”.
Therefore, all possible minds that want to be good should do Y.
But to make that argument, you have to define “good” first. Of course we already knew that a purely physical property could describe the good.
As for the second one, it’s correct as well, but it’s still not clear what you would do with it. It’s only probably true, so it’s not clear why it’s more philosophically interesting than “If you have accurate beliefs you probably have glasses”.

Bunthut 4 Nov 2019 11:31 UTC
3 points
in reply to: FeepingCreature’s comment on: Multiple Moralities
we use something called the “Rawlsian veil”, which avoids temporary imbalances in power from skewing the outcomes
Power imbalances are common in human society.
But many of the power imbalances which are found in human society are not at all temporary. For instance, if the player deciding didn’t vary randomly, but instead triangles always got to decide over squares, then while a rule of law might still develop internally, it’s not clear what interest the triangles have in rectifying the inter-gonal situation. But we still (claim to) regard it as moral for that to happen. It seems the babyeaters are indeed in such a situation: any adult eating the babies will never be a baby again. Further, they are almost certain to succeed in eating them, after which the babies will not grow big and maybe become a threat some day.

Bunthut 6 Nov 2019 17:35 UTC
3 points
on: Occam’s Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann
When we decompose the sequence of events $E$ into laws $L$ and initial conditions $C$, the laws don't just calculate $E$ from $C$. Rather, $L(C) = E_1$, $L(E_1) = E_2$, …; $L$ is a function from events to events, and the sequence $E$ contains many input-output pairs of $L$.
By contrast, when we decompose a policy $\pi$ into a planner $P$ and a reward $R$, $P$ is a function from rewards to policies. With the setup of the problem as-is, we have data on many instances of $(s, a)$ pairs (behaviour), so we can infer $\pi$ with high accuracy. But we only get to see one policy, and we never get to explicitly see rewards. In such a case, we will indeed get the empty reward and $\forall r\,[P(r) = \pi]$. To correctly infer $R$ and $P$, we would have to see our $P$ applied to some other rewards, and the policies resulting from that.
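A toy illustration with two states and two actions (all names and rewards hypothetical). A reward-maximizing planner with a fitted reward and a degenerate planner that ignores its input both reproduce the single observed policy; only applying them to some other reward tells them apart.

```python
STATES = ("s0", "s1")
ACTIONS = ("left", "right")

observed_policy = {"s0": "left", "s1": "right"}


def rational_planner(reward):
    """Picks, in each state, the action with the highest reward."""
    return {s: max(ACTIONS, key=lambda a: reward[(s, a)]) for s in STATES}


def constant_planner(reward):
    """Degenerate planner: ignores the reward and returns the observed policy."""
    return dict(observed_policy)


fitted_reward = {("s0", "left"): 1, ("s0", "right"): 0,
                 ("s1", "left"): 0, ("s1", "right"): 1}
empty_reward = {(s, a): 0 for s in STATES for a in ACTIONS}

# Both decompositions reproduce the single observed policy...
assert rational_planner(fitted_reward) == observed_policy
assert constant_planner(empty_reward) == observed_policy

# ...and only a different reward tells them apart.
other_reward = {(s, a): (1 if a == "right" else 0) for s in STATES for a in ACTIONS}
print(rational_planner(other_reward))  # {'s0': 'right', 's1': 'right'}
print(constant_planner(other_reward))  # {'s0': 'left', 's1': 'right'}
```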

Bunthut 7 Nov 2019 11:05 UTC
2 points
on: Building Intuitions On Non-Empirical Arguments In Science
There might be more to “naive Popperianism” than you’re making out. The testability criteria are not only intended to demarcate science, but also meaning. Of course, if it turns out that two theories which at first glance seem quite different do in fact mean the same thing, discussing “which one” is true is a category error. This idea is well-expressed by Wittgenstein:
Scepticism is not irrefutable, but palpably senseless, if it would doubt where a question cannot be asked. For doubt can only exist where there is a question; a question only where there is an answer, and this only where something can be said.
Now, there are problems with the falsification criterion, but the more general idea that our beliefs should pay rent is a valuable one. We might think then, that the meaning of a theory is determined by the rent it can pay, and that of unobservables by their relation to the observables. A natural way to formalize this is the Ramsey-sentence:
Say you have terms for observables, $O_1, O_2, O_3, \ldots$, and for theoretical entities that aren’t observable, $T_1, T_2, T_3, \ldots$. By repeated conjunction, you can combine all claims of the theory into a single sentence,
$H(O_1, O_2, O_3, \ldots, T_1, T_2, T_3, \ldots)$
The Ramsey-sentence then is the claim that
$\exists x_1 \exists x_2 \exists x_3 \ldots H(O_1, O_2, O_3, \ldots, x_1, x_2, x_3, \ldots)$
If this general idea of meaning is correct, then the Ramsey-sentence fully encompasses the meaning of the theory. If this sounds sensible so far, then the competing theories discussed here do indeed have the same meaning. The Ramsey-sentence is logically equivalent to the observable consequences of the theory being true. This is because according to the extensional characterisation of relations defined on a domain of individuals, every relation is identified with some set of subsets of the domain. The power set axiom entails the existence of every such subset and hence every such relation.
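A brute-force illustration of the power-set step on a finite domain, for one hypothetical toy theory: a single theoretical predicate $T$ sits between two observable predicates, with $H(O_1, O_2, T)$ saying $O_1 \subseteq T \subseteq O_2$. Letting the existential quantifier range over every subset of the domain, the Ramsey-sentence comes out equivalent to the observable claim $O_1 \subseteq O_2$ under every interpretation. This checks one example; it does not prove the general claim.

```python
from itertools import chain, combinations

DOMAIN = (0, 1, 2)


def power_set(xs):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, k) for k in range(len(xs) + 1))]


def theory(o1, o2, t):
    """Toy H(O1, O2, T): every O1-thing is a T-thing, every T-thing an O2-thing."""
    return o1 <= t and t <= o2


def ramsey_sentence(o1, o2):
    """Exists x. H(O1, O2, x), with x ranging over every subset of the domain."""
    return any(theory(o1, o2, x) for x in power_set(DOMAIN))


def observable_consequence(o1, o2):
    return o1 <= o2


# Check every interpretation of the observable predicates on this domain.
for o1 in power_set(DOMAIN):
    for o2 in power_set(DOMAIN):
        assert ramsey_sentence(o1, o2) == observable_consequence(o1, o2)
```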
Of course, Satan and the Atlanteans are in fact motte-and-bailey arguments, and the person advocating them is likely to make claims about them on subjects other than paleontology or the pyramids that do imply different observations than the scientific consensus, and as such render their theory straightforwardly false.

Bunthut 7 Nov 2019 14:41 UTC
2 points
in reply to: TAG’s comment on: Building Intuitions On Non-Empirical Arguments In Science
Seems like you’re right. I don’t think it affects my argument though, just the name.