Taken.
Then there’s another half—when the wrongness of something is missed because it does not (technically, by an approximate dictionary definition) fall into a pre-existing category in the ‘Wrong Cluster’. Examples: forced consent, dishonesty that’s ‘technically not lying’, and extortion that’s ‘technically not stealing’ all getting a free ride.
So we have a general ‘linguistic ethical determinism’ (better name anybody?) fallacy, wherein something is considered wrong if and only if it comes under an existing Category of Wrong according to a pedantic definition. (This is itself, of course, a corollary of human obsession with linguistic categories, which I gather is covered in A Human’s Guide to Words.)
“First, consider that the following formula detects 2-ness” Consider changing this to something like “First, consider that the following formula detects 2-ness among the numbers as we want them to be”? It wasn’t immediately obvious to me that the starred chain’s ‘-1*’ didn’t satisfy the equation.
Er, also, you might want to have only one of the interlocutors beginning sentences with “Er” lest we lose track of which is supposed to be current-you. ;)
But yeah, a nice exposition!
Disclaimer: I am not familiar with the formalities of Turing machines, and am quite possibly talking out of my ass, and probably not thinking along the same lines as Eliezer here. But it might be possible to salvage the ideas into something more formal/correct.
Consider a model containing exactly the natural numbers and the starred chain. Then we might have a Turing machine which starts at 0 and 0*, halts if it is fed 0*, and continues to the successor otherwise. Then it never halts on the natural chain, but halts immediately on the starred chain. Here, a Turing machine presumably operates on every chain in a model meeting the first-order Peano axioms.
So in general, it might be meaningful to talk of a Turing machine acting within a model containing chains, which is closed on every given chain (e.g. it can’t jump from 0 to 0*), and which could therefore be said to be associated with a ‘halt time’ function, h, which maps each chain (or each chain’s zero, if you like) to a nonnegative number in that chain which is the halting time on that chain. So in my above example, we might leave h(0) undefined, because the machine never halts on the naturals, and h(0*)=0*, because it halts immediately on that chain. This would then completely define the halting time over chains. (In fact, we could probably drop closure if we wanted to.)
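A toy rendering of this in code (my own encoding, not a formal Turing machine): represent an element of a chain as a tag for the chain plus an offset from that chain’s zero, so ('N', 0) is 0 and ('*', 0) is 0*.

```python
def successor(element):
    chain, k = element
    return (chain, k + 1)

def run(start, max_steps=10):
    """Run the machine described above from a chain's zero."""
    current = start
    for step in range(max_steps):
        if current == ('*', 0):       # halts if fed 0*
            return step               # halt time on this chain
        current = successor(current)  # otherwise move to the successor
    return None                       # no halt within the step budget

print(run(('N', 0)))  # None: never halts on the natural chain, so h(0) is undefined
print(run(('*', 0)))  # 0: halts immediately on the starred chain, i.e. h(0*) = 0*
```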
Or T and F for ‘is true’ and ‘is false’, or the ⊤ and upside-down ⊤ (⊥) often used for (tautological) truth and (tautological) falsity, for true and false.
Is Julian Barbour’s book suitable for this list, or does it, say, require too big a detour through detailed physics?
Whether it’s scope insensitivity/defensible or not can be resolved by clarifying two things:
1) Jordan_2010's utility function
2) The purpose of disgust/{upset and outrage}

Say disgust is a feeling that arises only in response to certain types of ignorance, and one which serves terminal values by neurochemically compelling one to reduce the ignorance, and so to increase awareness in such a way as to increase one’s utility.
Then disgust ‘would make sense at’ ignorance, and not at the terminal bad itself.
Eliezer gave another example: It might not be effective (‘unlikely to change the universe’s mind’) to be upset and outraged at matters of fact, and might be effective to be so at people with the power to reduce the utility-eating facts.
It might be that Jordan_2010 initially seemed to be suffering scope insensitivity due to a different initial sense of ‘disgust’, such as a general dismay that compels one to action. In that case, ceteris paribus, the terminal value should cause much more disgust, because it is the worse thing, and this general sense of disgust is more dense on terminal values than on instrumental values. Then after reading Eliezer’s comment mentioning upset and outrage, your sense of disgust/etc. changed to something more like what I mentioned earlier in this comment.
Upvoted. This article makes an extremely important point.
I use correct/incorrect to refer to ‘prediction and outcome coincided’/‘made what turned out to be the most favourable choice’/etc.
I use right/wrong to refer to ‘made the best prediction’/‘chose the outcome that, as far as could be told, would be most favourable’/etc.
Smith was correct and wrong.
Although one might expect ambiguity to be a problem with these terms (e.g. ‘right’ becoming overloaded to the point of equivocation), in my experience it hasn’t been one once they have been explained.
The thesis of the right/correct distinction is defying the data.
The antithesis is regret of rationality, i.e. predictably losing due to a flaw in a model. This is a hazard that arises from devotion to a theory or from undervaluing data, which lead to insisting one is still right even as the defeats pile up.
As for the No True Fencing Victory thingy: that’s simply insufficient correspondence between one’s internal understanding and one’s specification (e.g. one visualises winning in a fifteen-point whitewash, specifies merely ‘winning’, and is saved from the jaws of defeat by a technicality). Such cases are cases of being too imprecise or inaccurate. I generally lean very heavily towards ‘a win is a win’, because anything else often seems to stem from an unrealistic expectation of perfect specification.
It’s not clear that the two can be reconciled. It’s also not clear that the two can’t be reconciled.
Suppose for simplicity there are just hedons and dolors into which every utilitarian reaction can be resolved and which are independent. Then every event occupies a point in a plane. Now, ordering real numbers (hedons with no dolorific part or dolors with no hedonic part) is easy and more or less unambiguous. However, it’s not immediately obvious whether there’s a useful way to specify an order over all events. A zero hedon, one dolor event clearly precedes a one hedon, zero dolor event in the suck--->win ordering. But what about a one hedon, one dolor event vs. a zero hedon, zero dolor event?
It might seem that we can simply take the signed difference of the parts (so in that last example, 1-1=0-0, and the events are ‘equal’), but the stipulation of independence seems to forbid such arithmetic (like subtracting apples from oranges).
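(A sketch of the tempting order, to make the fungibility worry concrete; the function name is my own:)

```python
def signed_difference(event):
    # Score an event (h, d) by h - d. This treats one hedon as exactly
    # cancelling one dolor, which is just the apples-for-oranges
    # arithmetic that independence seems to forbid.
    h, d = event
    return h - d

print(signed_difference((1, 1)) == signed_difference((0, 0)))  # True: 'equal'
```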
Orders on the complex numbers that have been used for varying applications (assuming this has been done) might shed some light on the matter.
Clearly a CEV over all complex (i.e. consisting of exactly a possibly-zero hedonic part and possibly-zero dolorific part) utilities would afford comparison between any two events, but this doesn’t seem to help much at this point.
Beyond knowledge of the physical basis of pleasure and pain, brain scans of humans experiencing masochistic pleasure might be a particularly efficient insight generator here. Even if, say, pure pleasure and pure pain appear very different on an MRI, it might be possible to reduce them to a common unit of utilitarian experience that affords direct comparison. On the other hand, we might have to conclude that there are actually millions of incommensurable ‘axes’ of utilitarian experience.
This post is excellent. Part of this is the extensive use of clear examples and the talking through of anticipated sticking points, objections, and mistakes, and its motivating, exploratory approach (not plucked out of thin vacuum).
For example, if we have decided that we would be indifferent between a tasty sandwich and a 1⁄500 chance of being a whale for tomorrow, and that we’d be indifferent between a tasty sandwich and a 30% chance of sun instead of the usual rain, then we should also be indifferent between a certain sunny day and a 1⁄150 chance of being a whale.
I think you didn’t specify strong enough premises to justify this deduction; I think you didn’t rule out cases where your utility function would depend on probability and outcome in such a way that simply multiplying is invalid. I might have missed it.
Edit: D’oh! Never mind. This is the whole point of an Expected Utility theorem...
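(For reference, the multiplication the quoted step relies on, in my notation, normalising the baseline rainy day to zero utility:)

$$0.3\,u(\text{sun}) = u(\text{sandwich}) = \tfrac{1}{500}\,u(\text{whale}) \;\Rightarrow\; u(\text{sun}) = \tfrac{1}{150}\,u(\text{whale}),$$

so indifference between a certain sunny day and a 1/150 chance of being a whale follows, given that multiplying through is valid.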
...{50%: sunny+sandwich; 50%: baseline} and {50%: sunny; 50%: sandwich}, and other such bets. (We need a better solution for rendering probability distributions in prose.)
I doubt that significantly better compression is possible. I expect that communicating the outcome and the probability, uncompressed, is necessary, so compression much beyond what you did seems doubtful; yours seems minimal with respect to those constraints. However, you might have been referring to clarity more generally.
I would avoid the use of some of the more grim examples in this context. Putting nonconsensual, violent sex, torture, and ruination of vulnerable people through mental manipulation alongside ice cream, a day as a whale, and a sunny day would overstep my flippant-empathetic-Gentle-depressing threshold, and it seems like it would be possible to come up with comparably effective examples that didn’t. Make of that what you will. (I encourage others to reply with their own assessment, particularly those who also felt (slightly) uncomfortable on this point, since I imagine their activation energy for saying so would be highest.)
That settled that quickly. Thanks.
Then I suppose the next question in this line would be: To what extent can we impose useful orders on R^2? (I’d need to study the proof in more detail, but it seems that the no-go theorem on C arises from its ring structure, so we have to drop it.) I’m thinking the place to start is specifying some obvious properties (e.g. an outcome with positive hedonic part and zero dolorific part always comes after the opposite, i.e. is better), though I’m not sure if there’d be enough of them to begin pinning something down.
Edit: Or, oppositely, chipping away at suspect ring axioms and keeping as much structure as possible. Though if it came down to case-checking axioms, it might explode.
Yeah, absolute value is the second-most obvious one, but I think it breaks down:
It seems that if we assume utility to be a function of exactly (i.e. no more and no less than) hedons and dolors in R^2, we might as well stipulate that each part is nonnegative, because it would then seem that any sense of dishedons must be captured by dolors and vice versa. So it seems that we may assume nonnegativity WLOG. Then given nonnegativity of components, we can actually compare outcomes with the same absolute value:
Given nonnegativity, we can simplify (I’m pretty sure, but even if not, I think a slightly modified argument still goes through) our metric from sqrt(h^2+d^2) (where h,d are the hedonic and dolorific parts) to just d+h. Now suppose that (h1,d1) and (h2,d2) are such that h1+d1=h2+d2. Then:
1) If h1<h2, then d1>d2, and so (h1,d1) is clearly worse than (h2,d2)
2) If h1=h2, then d1=d2, and the two are equipreferable
3) If h1>h2, then d1<d2, and so (h1,d1) is clearly better than (h2,d2)

So within equivalence classes there will be differing utilities.
Moreover, (0,2)<<(0,0)<<(2,0), but the LHS and RHS fall in the same equivalence class under absolute value. So the intervals of utility occupied by equivalence classes can overlap. (Where e.g. ‘A<<B’ means ‘B is preferable over A’.)
Hence absolute value seems incompatible with the requirements of a utility ordering.
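A quick sanity check of the overlap (plain code; ‘clearly_better’ is my own name for the componentwise comparison used above):

```python
def abs_value(event):
    # Taxicab form h + d; for these examples the Euclidean form
    # sqrt(h^2 + d^2) yields the same equivalence classes.
    h, d = event
    return h + d

def clearly_better(a, b):
    # a is clearly better than b: at least as many hedons, no more dolors,
    # and the two are not identical.
    (h1, d1), (h2, d2) = a, b
    return h1 >= h2 and d1 <= d2 and a != b

print(abs_value((0, 2)) == abs_value((2, 0)))  # True: same equivalence class
print(clearly_better((0, 0), (0, 2)))          # True: (0,2) << (0,0)
print(clearly_better((2, 0), (0, 0)))          # True: (0,0) << (2,0)
```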
~
The most obvious function of (h,d) to form equivalence classes is h minus d as in my earlier comment, but that seems to break down (if we assume every pair of elements in a given equivalence class has the same utility) by its reliance on fungibility of hedons and dolors. A ‘marginal dolor function’ that gives the dolor-worth of the next hedon after already having x hedons seems like it might fix this, but it also seems like it would be a step away from practicality.
(Began reading. Didn’t click through onto paper. In the post, got to where Bostrom is quoted thus:)
“For example, even after an expected-utility-maximizing agent had built 32 paperclips, it could use some extra resources to verify that it had indeed successfully built 32 paperclips meeting all the specifications (and, if necessary, to take corrective action).”
Yes, it could. But would it? If I write a script to, say, generate a multiset of the prime factors of a number by trial and error, and run it on a computer, the computer simply performs the actions encoded into the script.
Now suppose I appended code to my script to check that the elements of the multiset did indeed constitute a prime factorisation, by checking the primality of each element and checking that their product returned the original number. Then one might call what the updated script does (or we might say, what the script tells a computer to do) ‘checking’. All this means is that we’ve performed a test to increase our confidence in a proposed solution.
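For concreteness, a minimal version of the script I have in mind (a sketch of my own, not from the post):

```python
from math import prod

def prime_factors(n):
    # Build the multiset of prime factors by trial division.
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def is_prime(k):
    return k > 1 and all(k % d for d in range(2, int(k**0.5) + 1))

def check_factorisation(n, factors):
    # The appended 'checking' code: primality of each element, and that
    # the product recovers the original number.
    return all(is_prime(f) for f in factors) and prod(factors) == n

n = 360
fs = prime_factors(n)
print(fs, check_factorisation(n, fs))  # [2, 2, 2, 3, 3, 5] True
```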
But another sense of ‘checking’ emerges: Suppose I have someone check that some books are in some sort of alphanumeric order. I don’t tell them that the purpose is to shelve the books in a library correctly. In this situation, it seems that the statement ‘The helper checked that the books were in order’ is clearly true, but the statement ‘The helper checked that the books were ready to be shelved’ seems less intuitive.
It seems, then, that maybe saying that the script/computer itself checks the correctness of the prime factorisation was sloppy if we use the second sense of ‘checking’; I who wrote it was checking by using the script, but the script itself, lacking knowledge of what it was terminally checking, could not be said to be checking the factorisation, just as the sorting helper could not be said to be checking shelf-readiness even as they could be said to be checking order.
Checking is pretty much just applying tests to a proposed solution in order to reach a more reliable understanding of the plausibility of the solution. So unless the agent is ‘programmed’/wired to do such tests, it won’t necessarily do so. Also, if the programming/wiring is not good in terms of correspondence to the intended task (e.g. if my prime factorisation script fails to consider the multiplicity of prime factors or the 32-clipper is programmed with tokens that do not refer), the actions will be taken, not meet the intended target, and not be checked.
This is why algorithms have to be proved to work; throwing some steps together that seem sorta like the right idea won’t yield a knowably accurate method.
(Finished reading the post except solution.)
Go meta or go home. Even if the task is to achieve X with probability p, once this is translated into an algorithm that is performed, nothing special happens. For example, say I get ice cream if a robot flips a coin and it comes up heads. If I program the robot to ensure ice cream with p=0.5 by (the ‘by’ being necessary because without actually specifying the algorithm, I could be referring to any in a whole class of ways to program a robot to do this, only some of which work, and only some of which check) flipping a coin, the goal will be achieved immediately and no checking will take place.
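(The robot’s entire program, spelled out; note there is nowhere in it for checking to occur—the flip itself is the whole of ‘ensuring ice cream with p=0.5’:)

```python
import random

def robot():
    # Flip a coin; heads means ice cream. The 'p = 0.5' lives in the
    # procedure itself; once the flip happens, the goal is achieved.
    return "ice cream" if random.random() < 0.5 else "no ice cream"

print(robot())
```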
TL;DR: Taboo ‘check’ or ascertain its meaning by reduction to testing using suitable thought experiments, or by looking at brains to see what physical phenomena correspond to the act of checking.
(After reading solution, comments.)
Manfred: Elegantly put.
The difference between being ‘p certain’ and knowing that one’s p certain might be hard to grasp because we are so often aware of our impressions.
However, until you read this sentence, you didn’t know that you were certain five minutes ago that the Sun would not disappear three minutes ago; now that your mind is blown, your behaviour will be observably different to before this realisation, i.e. coming into knowledge of the certainty has made a measurable difference.
A belief does not necessitate a belief about that belief.
Also, not all checkers would check indefinitely. For example, a checker with a grasp of verification paradoxes might reach a point of maximal confidence and terminate. Meanwhile a checker that decided—or, more accurately, was programmed/wired—to flip coins as a test for 32-paper-clips-hood (with its doubt halving each time no matter the outcome) might never terminate.
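(Spelled out, with a flip cap so the demo actually terminates:)

```python
import random

# Each coin flip halves the checker's doubt regardless of the outcome, so
# doubt stays positive after any finite number of flips, and a 'stop at
# zero doubt' rule never fires. (The cap is only so the demo halts; real
# floats would also eventually underflow to zero, unlike the mathematics.)
doubt, flips = 1.0, 0
while doubt > 0 and flips < 50:
    random.choice(['heads', 'tails'])  # the 'test': outcome is ignored
    doubt /= 2
    flips += 1
print(flips, doubt)  # 50 flips in, doubt is tiny but still nonzero
```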
See also: (Meta) Poe’s Law.
Interesting post. The—I shall dub it—Schrödinger’s Racist example is very good, and it brings me cold-warm fuzzies of indignant recognition from my experiences of ‘Not really racist’/‘Just banter’/etc., even if that’s not for what you were aiming.
Edit after a brief a-Googling: Hah! Someone beat me to ‘Schrödinger’s racist’.
I think you unknowingly {submitted this comment prematurely}? :)
Have enough people at MetaMed been influenced sufficiently by (meatspace) LessWrong/think ‘similarly enough’ to LW rationality that we should precommit to updating by prespecified amounts on the effectiveness of LW rationality in response to its successes and failures?
This has a name and a Wikipedia article and a subreddit? Couldn’t be carvier at the joints of reality if it tried. Thank you! (Not sarcastic; I’ve been idly wondering about this for about four years.)
Whaaa...I didn’t know there were MIRIfolk in Bristol. I was thinking I might have to start a Bath one or a Bristol one myself next academic year, but I hadn’t seen Louie’s comment.
This is in the middle of exams for me too, but it’s quite probable I’ll still come.
The (say) real sine function is defined such that its domain and codomain are (subsets of) the reals. The reals are usually characterized as the complete ordered field. I have never come across units that—taken alone—satisfy the axioms of a complete ordered field, and having several units introduces problems such as how we would impose a meaningful order. So a sine function over unit-ed quantities is sufficiently non-obvious as to require a clarification of what would be meant by sin($1).

For example—switching over now to logarithms—if we treat $1 as the real multiplicative identity (i.e. the real number, unity) unit-multiplied by the unit $, and extrapolate one of the fundamental properties of logarithms—that log(ab)=log(a)+log(b)—we find that log($1)=log($)+log(1)=log($) (assuming we keep that log(1)=0). How are we to interpret log($)? Moreover, log($^2)=2log($). So if I log the square of a dollar, I obtain twice the log of a dollar. How are we to interpret this in the above context of utility?

Or an example from trigonometric functions: One characterization of the cosine and sine stipulates that cos^2+sin^2=1, so we would have that cos^2($1)+sin^2($1)=1. If this is the real unity, does this mean that the cosine function on dollars outputs a real number? Or if the RHS is $1, does this mean that the cosine function on dollars outputs a dollar^(1/2) value? Then consider that double, triple, etc. angles in the standard cosine function can be written as polynomials in the single-angle cosine. How would this translate?
So this is a case where the ‘burden of meaningfulness’ lies with proposing a meaningful interpretation (which now seems rather difficult), even though at first it seems obvious that there is a single reasonable way forward. The context of the functions needs to be considered; the sine function originated with plane geometry and was extended to the reals and then the complex numbers. Each of these was motivated by an (analytic) continuation into a bigger ‘domain’ that fit perfectly with existing understanding of that bigger domain; this doesn’t seem to be the case here.
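To make the burden concrete, here’s a toy sketch (my own hypothetical Quantity class, not a real units library, though I believe real ones such as pint similarly refuse transcendental functions on dimensioned arguments):

```python
import math

class Quantity:
    """A number tagged with a power of a single unit (dollars, say)."""
    def __init__(self, value, dollar_power=0):
        self.value = value
        self.dollar_power = dollar_power  # 0 = dimensionless, 1 = $, 2 = $^2, ...

    def __mul__(self, other):
        # Multiplication composes cleanly: values multiply, unit powers add.
        return Quantity(self.value * other.value,
                        self.dollar_power + other.dollar_power)

    def log(self):
        if self.dollar_power != 0:
            # log($1) = log($) + log(1) would force an interpretation of
            # log($), which we have no way to supply; refuse instead.
            raise TypeError("log is only defined on dimensionless quantities")
        return Quantity(math.log(self.value))

one_dollar = Quantity(1, dollar_power=1)
print((one_dollar * one_dollar).dollar_power)  # 2: a 'square dollar'
print(Quantity(2).log().value)                 # fine: dimensionless input
try:
    one_dollar.log()
except TypeError as e:
    print(e)  # log is only defined on dimensionless quantities
```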