Because that’s what it would do if it were us. https://en.m.wikipedia.org/wiki/Kerckhoffs’s_principle says we should not assume our security mechanisms are secret. One exception would be if you could define three indistinguishable contexts that the AI might find itself in, then select randomly between them; then you may assume that the AI also doesn’t know which it is in.

# Gurkenglas

Suppose we considered simulating some human for a while to get a single response. My math heuristics are throwing up the hypothesis that proving what the response would be is morally equivalent to actually running the simulation—it’s just another substrate. Thoughts? Implications? References?

It’s rather obvious if you’ve done programming or studied provability or read the sequences. The LessWrong crowd isn’t a good sample for testing the strength of this trap.

Then the map is just...the set of where you started from and where you ended up. That is, a and x, respectively.

This sounds like the map is {a,x}.

If you run out of steam before reaching adjunctions, I hope you can manage a post about adjunctions that assumes that you had finished all the previous posts.

You say that functions are the best maps because of those two properties, but they are simply the defining properties of a function. What makes these properties the best properties for a definition of maps to have?

Oh, damn it, I mixed up the designs. Edited.

Design ~~2~~ 1 may happen to reply “Convince the director to undecouple the AI design by telling him <convincing argument>.”, which could convince the operator who reads it and therefore fail as ~~3~~ 2 fails.

Design ~~2~~ 1 may also model distant superintelligences that break out of the box by predictably maximizing paperclips iff we draw a runic circle that, when printed as a plan, convinces the reader or hacks the computer.

That a construction is free doesn’t mean that you lose nothing. It means that if you’re going to do some construction anyway, you might as well use the free one, because the free one can get to any other. (Attainable utility anyone?)

Showing that your construction is free means that all you need to show as worthwhile is constructing any category from our quiver. Adjunctions are a fine reason, though I wish we could introduce adjunctions first and then show that we need categories to get them.

Math certainly has ambiguous generalizations. As the image hints, these are also studied in category theory. Usually, when you must select one, the one of interest is the least general one that holds for each of your objects of study. In the image, this is always unique. I’m guessing that’s why bicentric has a name. I’ll pass on the question of how often this turns out unique in general.

Not every way to model reality defines identity and composition. You can start with a category-without-those G (a quiver) and end up at a category C by defining C-arrows as chains of G-arrows (the quiver’s free category), but it doesn’t seem necessary or a priori likely to give new insights. Can you justify this choice of rules?
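For concreteness, here is a minimal sketch of the free-category construction, with a made-up two-arrow quiver (the arrow names and the `max_len` cutoff are illustrative, not from the post):

```python
from itertools import product

# A quiver: arrows that each remember a name, a source, and a target.
# (The quiver data here is made up for illustration.)
arrows = {("f", "a", "b"), ("g", "b", "c")}

def free_category_arrows(arrows, max_len=2):
    """Non-identity arrows of the free category on the quiver:
    composable chains of quiver arrows up to length max_len.
    (Identity arrows, one empty chain per node, are omitted.)"""
    chains = {(arr,) for arr in arrows}
    for _ in range(max_len - 1):
        new = set()
        for chain, arr in product(chains, arrows):
            if chain[-1][2] == arr[1]:  # chain's target == next arrow's source
                new.add(chain + (arr,))
        chains |= new
    return chains

paths = free_category_arrows(arrows)
# f then g composes because f's target b matches g's source b:
assert (("f", "a", "b"), ("g", "b", "c")) in paths
```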

Categories are what we call it when each arrow remembers its source and target. When they don’t, and you can compose anything, it’s called a monoid. The difference is the same as between static and dynamic type systems. The more powerful your system is, the less you can prove about it, so whenever we can, we express that particular arrows can’t be composed, using definitions of source and target.
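A minimal sketch of that difference (the arrow encoding is mine, not standard): composition checks source against target, so a category rejects compositions that a one-object monoid would have to allow.

```python
def compose(f, g):
    """g after f, defined only when f's target equals g's source."""
    if f["target"] != g["source"]:
        raise TypeError("arrows not composable")
    return {"source": f["source"], "target": g["target"],
            "run": lambda x: g["run"](f["run"](x))}

# Two arrows between typed objects (hypothetical examples):
double = {"source": "int", "target": "int", "run": lambda n: 2 * n}
show = {"source": "int", "target": "str", "run": str}

both = compose(double, show)        # int -> str, fine
assert both["run"](21) == "42"

# In a monoid everything composes; here the bookkeeping forbids nonsense:
try:
    compose(show, double)           # str -> int mismatch
except TypeError:
    pass
```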

is a different category

You mean object.

Every category containing O and P must address this question. In the usual category of math functions, if P has only those two pairs then the source object of P is exactly {4,5}, so O and P can’t be composed. In the category of relations, that is, arbitrary sets of pairs between the source and target sets, O and P would compose to the empty relation between letters and countries.
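A sketch of the relational case, with made-up stand-ins for O and P:

```python
# Relations as sets of pairs; hypothetical stand-ins for O and P.
O = {(1, "a"), (2, "b")}             # numbers -> letters
P = {(4, "france"), (5, "germany")}  # {4,5} -> countries

def compose_rel(R, S):
    """Relational composition: chain through any common middle element."""
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}

# As functions O and P don't compose (O's targets miss {4,5});
# as relations they compose to the empty relation:
assert compose_rel(O, P) == set()
```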

If your mapping contains those three pairs, then the arrow’s source object contains 1, A, B and cow, and the target object contains 5, 3, cat and france. Allowing or disallowing mixed types gives two different categories. Whether an arrow mixes types is, as far as I can tell what you mean, uniquely determined by whether its source or target object mixes types. In either case, to compose two arrows they must have a common middle object.

The baggage that comes with the words noun and verb is only for guiding the search for intuition and is to be discarded when it leads to confusion.

In all your interpretations of math/programming functions, there can be different arrows between the same objects. The input/output behavior is seen as part of the arrow. The objects are merely there to establish what kinds of arrows can be strung together because one produces, say, real numbers, and the other consumes them.

I started asking for a chess example because you implied that the reasoning in the top-level comment stops being sane in iterated games.

In a simple iteration of Troll bridge, whether we’re dumb is clear after the first time we cross the bridge. In a simple variation, the troll requires smartness even given past observations. In either case, the best worst-case utility bound requires never crossing the bridge, and A knows crossing blows A up. You seemed to expect more.

Suppose my chess skill varies by day. If my last few moves were dumb, I shouldn’t rely on my skill today. I don’t see why I shouldn’t deduce this ahead of time and, until I know I’m smart today, be extra careful around moves that to dumb players look extra good and are extra bad.

More concretely: Suppose that an unknown weighting of three subroutines approval-votes on my move: Timmy likes moving big pieces, Johnny likes playing good chess, and Spike tries to win in this meta. Suppose we start with move A, B or C available. A and B lead to a Johnny gambit that Timmy would ruin. Johnny thinks “If I play alone, A and B lead to 80% win probability and C to 75%. I approve exactly A and B.”. Timmy gives 0, 0.2 and 1 of his maximum vote to A, B and C. Spike wants the gambit to happen iff Spike and Johnny can outvote Timmy. Spike wants to vote for A and against B. How hard Spike votes for C trades off between his test’s false positive and false negative rates. If B wins, ruin is likely. Spike’s reasoning seems to require those hypothetical skill updates you don’t like.
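A toy of that vote (all scores and weights are made up; only the shape of the tradeoff matters):

```python
# Each subroutine approval-scores moves in [0, 1]; an unknown
# weighting combines them. All numbers here are hypothetical.
votes = {
    "Timmy":  {"A": 0.0, "B": 0.2, "C": 1.0},
    "Johnny": {"A": 1.0, "B": 1.0, "C": 0.0},
    "Spike":  {"A": 1.0, "B": 0.0, "C": 0.6},  # 0.6 trades off Spike's error rates
}

def winner(votes, weights):
    moves = ["A", "B", "C"]
    totals = {m: sum(weights[s] * votes[s][m] for s in votes) for m in moves}
    return max(totals, key=totals.get)

# With a strong Timmy, Spike's partial vote for C avoids the ruinous B:
assert winner(votes, {"Timmy": 2.0, "Johnny": 1.0, "Spike": 1.0}) == "C"
# With a weak Timmy, the gambit goes through:
assert winner(votes, {"Timmy": 0.4, "Johnny": 1.0, "Spike": 1.0}) == "A"
```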

If I’m a poor enough player that I merely have evidence, not proof, that the queen move mates in four, then the heuristic that queen sacrifices usually don’t work out is fine and I might use it in real life. If I can prove that queen sacrifices don’t work out, the reasoning is fine even for a proof-requiring agent. Can you give a chesslike game where some proof-requiring agent can prove from the rules and perhaps the player source codes that queen sacrifices don’t work out, and therefore scores worse than some other agent would have? (Perhaps through mechanisms as in Troll bridge.)

The charity could also do this itself, right? Take money, don’t spend some of it yet so it has something to spend tomorrow.

The same reasoning also says to take out loans to bunch the donation now rather than later; this aligns your future self with your present self, because paying off your loans is in your own best interest.

Participants were selected based on whether they seem unlikely to press the button, so whoever would have cared about future extortions being possible CDT-doesn’t need to, because they won’t be a part of it.

Keeping in mind that we are measuring bits of evidence tells us that, to give percentages, we must establish a baseline prior probability that we would assign absent any reasons.

Mostly you should be fine, just have heuristics for the anomalies near 0 and 1: if one reason pushes the probability to .5 and another to .6, then either the prior was noticeably far from zero, or getting only the second reason won’t move it noticeably either.
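In the odds form of Bayes, a bit of evidence doubles the odds, which makes both points computable; a sketch (the function name is mine):

```python
def apply_bits(p, bits):
    """Update probability p by `bits` bits of evidence:
    each bit doubles the odds p : (1 - p)."""
    odds = p / (1 - p) * 2 ** bits
    return odds / (1 + odds)

# One bit takes an even prior to 2:1 odds:
assert abs(apply_bits(0.5, 1) - 2 / 3) < 1e-9
# Near 1 the same bit barely moves the percentage (0.99 -> ~0.995),
# which is the anomaly the heuristics have to cover:
assert 0.994 < apply_bits(0.99, 1) < 0.996
```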

That obvious trick relies on the fact that we will only verify its prediction for strategies that it recommends. Here’s a protocol that doesn’t fall for it: a known number of gems are distributed among boxes; we can only open one box and want many gems. Ask for the distribution, select one gem at random from the answer, and open its box. For every gem it hides elsewhere, selecting it reveals the deception.
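A quick simulation of the protocol (the box counts and the deceptive claim are made up):

```python
import random

def audit(claim, truth, rng):
    """Pick one claimed gem uniformly and open its box; a box holding
    fewer gems than claimed exposes the deception."""
    gems = [box for box, count in claim.items() for _ in range(count)]
    box = rng.choice(gems)
    return truth.get(box, 0) >= claim[box]  # False = caught lying

truth = {0: 3, 1: 0, 2: 0}  # three gems, all in box 0
claim = {0: 2, 1: 1, 2: 0}  # one gem claimed elsewhere

rng = random.Random(0)
caught = sum(not audit(claim, truth, rng) for _ in range(3000))
# Misplacing 1 of 3 gems is caught ~1/3 of the time:
assert 0.25 < caught / 3000 < 0.42
```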