I operate by Crocker’s rules.

I won’t deliberately, derisively spread something just because you tried to point out an infohazard.

Karma: 1,447

Suppose the bridge is safe iff there’s a proof that the bridge is safe. Then you would forbid the reasoning “Suppose I cross. I must have proven it’s safe. Then it’s safe, and I get 10. Let’s cross.”, which seems sane enough in the face of Löb.

Why not take the bilimit of these types? , I’m guessing. The MirrorBot mirror diverges and that’s fine. If you use a constructivist approach where every comes with the expression that defined it, you can define such strategies as “Cooperate iff PA proves that the opponent cooperates against me.”.

Ignoring cooperation problems, variance of approaches doesn’t necessarily increase variance of outcomes, like if everyone’s playing minesweeper in parallel and only the first win or loss matters.
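A toy model of the minesweeper point, as a sketch only. It assumes (this is not in the comment) that each player wins or loses at independent exponential rates, so that the first decisive event is a win with probability total win rate over total rate. The function name and the example rates are made up for illustration.

```python
from fractions import Fraction


def first_event_win_probability(win_rates, loss_rates):
    """Chance that the first decisive event across all players is a win.

    Assumption: every player wins and loses at independent exponential
    rates, so only the summed rates matter for which event comes first.
    """
    total_win = sum(win_rates)
    total_loss = sum(loss_rates)
    return total_win / (total_win + total_loss)


def outcome_variance(p_win):
    """Variance of a single win/loss outcome with win probability p_win."""
    return p_win * (1 - p_win)


# Everyone plays the same approach.
uniform = first_event_win_probability([Fraction(1)] * 4, [Fraction(1)] * 4)

# Approaches differ widely between players, but the totals are unchanged.
varied = first_event_win_probability(
    [Fraction(1, 10), Fraction(19, 10), Fraction(1, 2), Fraction(3, 2)],
    [Fraction(19, 10), Fraction(1, 10), Fraction(3, 2), Fraction(1, 2)],
)

print(uniform, varied, outcome_variance(uniform), outcome_variance(varied))
```

In this model the spread of individual approaches drops out entirely; other models of "first result matters" need not behave this cleanly.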

One trick I thought of for thinking about high-dimensional spaces is to put multiple dimensions on the same axis: Consider the vectors in R² from the origin onto the unit circle. Lengthen each into an axis, each going infinitely forward and backward, each sharing all its points of R² with one other, all of them intersecting at 0. Embed this in R³, then continuously rotate the tip of each axis into the new dimension, forming a double cone centered at 0. Rotate them further until all tips touch, forming a single axis that contains the information of two dimensions.

You can now have an axis contain the information of any R-vector space, and visualize up to 3 at a time. Of course, not all mental operations that worked in R³ still work.
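The rotation step can be checked numerically. This is a minimal sketch, assuming the tilt is parametrized by an angle phi between each axis and the original plane; the names tilted_axis_direction, theta and phi are mine, not part of the construction above.

```python
import math


def tilted_axis_direction(theta, phi):
    """Unit direction of the axis at planar angle theta after tilting it
    by phi out of the plane, toward the new third dimension."""
    return (
        math.cos(phi) * math.cos(theta),
        math.cos(phi) * math.sin(theta),
        math.sin(phi),
    )


# Eight sample directions on the unit circle.
thetas = [2 * math.pi * k / 8 for k in range(8)]

flat = [tilted_axis_direction(t, 0.0) for t in thetas]            # still in the plane
cone = [tilted_axis_direction(t, math.pi / 4) for t in thetas]    # halfway: a cone
merged = [tilted_axis_direction(t, math.pi / 2) for t in thetas]  # all tips touch

for tip in merged:
    print(tuple(round(c, 12) for c in tip))
```

At phi = 0 the tips sit on the unit circle, for intermediate phi they lie on a circle at height sin(phi), which together with the backward halves of the axes gives the double cone, and at phi = π/2 every tip lands on the same point of the new axis.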

[APPRENTICE] Advanced math or AI alignment. I’m bad at getting homework done and good at grokking things quickly, so the day-to-day should look like pair programming or tutoring.

[MENTOR] See https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math. The first session has the highest leverage, but if my calendar doesn’t end up booked (there is one slot in the next two weeks booked out of like 50), more time per person makes sense. My specialization is pattern-matching to correctly predict where a piece of math is going if it’s good. When you science that art you get applied category theory.

Break the minute hand off your wristwatch. Maybe some of the hour hand too.

Seems like less of a market niche, but link it!

It’s not my one trick, of course, but it illustrates my usefulness. It’s more maintainable not just because it is shorter but also because it has decades of theory behind it. Drawing the connection unlocks inspiration from entire branches of math. And the speedups from standing on the shoulders of giants go far beyond the constant factors from vectorized instructions.

subpartitions

So you’re doing category theory after all! :)

Sure, I’ll try it. I don’t expect to be an order-of-magnitude power multiplier in that case, though.

I wouldn’t relate it to humans. In just about any basic machine learning setting, (train, test) has aspect 2. In fact, what you describe speaks in favor of modeling preferences using something other than utility functions, where aspect 3 is ruled out.

Re your natural example, I would expect that as one shifts from 90% doing the best for the worst off + 10% attainable utility preservation to 100% the former, average welfare goes down.

Take into account that the AI that interprets need not be the same as the network being interpreted.

Why do you think that a mere autocomplete engine could not do interpretability work? It has been demonstrated to write comments for code and code for specs.

The error correction needs to be present in the original network because I also do some of the converting of the network into English. The only reason I don’t do everything myself is that it takes too long. The proportion can be higher at the topmost levels because there are fewer tasks there. The error correction doesn’t let it completely ignore what I would do at the low levels because on the 1% I can compare its outputs to mine, so they need to at least superficially look similar.

If we find that there’s a bunch of redundancy, we can check whether there is any way to cut it down that would suddenly identify a bunch of mesa optimization. So the mesaoptimizer would have to take over the network entirely or trick its brethren as well as us.

Translating to a tree of natural language descriptions and back lets you:

- use different instances of GPT at each step, so that any coded messages would have to rely on the common training set rather than shared internals.

- do a random subset of the tasks yourself, so that any subversion of the target system would have to be based on subtle biases, perhaps with a small number of outright lies that it hopes we don’t implement manually, rather than the description being fabricated from whole cloth.

Having the thing write papers is merely an existence proof of embedded agency being irrelevant except for deconfusion.

Intelligent agents causally responsible for your existence.

What do you mean you can think of this, I told this to you :D

Was the conversation in chapter one after this or is Tracey going back and forth between the two, as a Tracey does?

You, me, the Three Broomsticks, a bottle of fizzix and a nuclear missile?

Channelling Luna, I see.

If you try to model yourself as a uniformly randomly selected observer somewhere in the Solomonoff prior, that doesn’t work because there isn’t a uniform distribution on the infinite naturals. When you weight them by each universe’s probability but allow a universe to specify the number of observers in it, you still diverge because the number goes up uncomputably fast while the probability goes only exponentially down. In the end, probabilities are what decision theories use to weight their utility expectations. Therefore I suggest we start from a definition of how much we care about what happens to each of the forever infinite copies of us throughout the multiverse. It is consistent to have 80% of the utility functions in one’s aggregate say that whatever happens to the overwhelming majority of selves in universes of description length >1000, it can only vary utility by at most 0.2.
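The divergence can be written out as follows. This is a sketch in my own notation, none of which appears above: p ranges over programs (universes), |p| is description length, N(p) is the number of observers in universe p, and BB is a Busy Beaver-style function. It assumes that for each n some program of length at most n contains at least BB(n) observers.

```latex
% Total observer weight when each universe is weighted by its prior probability
Z \;=\; \sum_{p} 2^{-|p|}\, N(p)
  \;\ge\; \sum_{n} 2^{-n}\, \mathrm{BB}(n)
  \;=\; \infty ,
% because BB eventually dominates every computable function, in particular 2^n,
% so the terms 2^{-n} BB(n) do not even tend to 0.
```

Since Z is infinite, dividing by it does not yield a probability distribution over observers, which is why the weighting has to come from somewhere else, such as a bounded measure of care.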

Call the index set of X IX. Call the partition into empty parts indexed by S 0S. We have 0 ⊣ I ⊣ D ⊣ ⊔ ⊣ T.

None of our three adjunction strings can be extended further. Let’s apply the construction that gave us histories at the other 5 ends. Niceness is implicit.

- The right construction of TS->X is the terminal S->S’ with a TS’->X: The image of ⊔(TS->X).

- The left construction of X->0S is the initial S’->S with a X->0S’: The image of I(X->0S).

- The left construction of B->FX is the initial X’->X with a B->FX’: The image of ∨(B->FX).

- The right construction of Δ1•->S is the terminal •->• with a Δ1•->S: The image of Δ•(Δ1•->S).

- The left construction of S->Δ∅• is absurd, but can still be written as the image of Δ•(S->Δ∅•).

- The history of ∨B->X is the terminal B->B’ with a ∨B’->X: Breaks the pattern! F(∨B->X) does not have the information to determine the history.

In fact, ⊔T, I0, ∨F, Δ•Δ1 and Δ•Δ∅ are all identity, only F∨ isn’t.

She left her wand on the bar and she didn’t immediately lose? Moody must have expected her to have another wand, given that she left her wand on the bar theatrically.

Suppose the bridge is safe iff A() would decide to cross?