Yeah, I think I can make peace with that. Another way to think of it is that we can keep the reward structure of the original Newcomb’s problem, but instead of saying “Omega is almost always right” we add another person Bob (maybe the mad scientist who built Omega) who’s willing to pay you a billion dollars if you prove Omega wrong. Then minimaxing indeed leads to one-boxing. Though I guess the remaining question is why minimaxing is the right thing to do. And if randomizing is allowed, the idea of Omega predicting how you’ll randomize seems a bit dodgy as well.
cousin_it
I’m not sure your definition has much to do with consciousness, as it would also be satisfied by an AI that runs on an Intel processor and whose utility function says all AIs should run on Intel processors.
But in Newcomb’s problem, the agent’s reward in case of wrong prediction is already defined. For example, if the agent one-boxes but the predictor predicted two-boxing, the reward should be zero. If you change that to +infinity, aren’t you open to the charge of formalizing the wrong problem?
Sorry, I wrote some nonsense in another comment and then deleted it. I guess the point is that UDT (which I agree with) recommends nonequilibrium behavior in this case.
I just figured out why Newcomb’s problem feels “slippery” when we try to describe it as a two-player game. If we treat the predictor as a player with some payoffs, then the strategy pair {two-box, predict two-boxing} will be a Nash equilibrium, because each strategy is the best response to the other. That’s true no matter which payoff matrix we choose for the predictor, as long as correct predictions lead to higher payoffs.
Let’s change the problem a bit: assume you’re starting with nonzero capital c, so the formula becomes (bt+c)e^(−at). If c>b/a, the derivative of that formula at t=0 is negative, so you need to stop immediately. That shows the decision to stop doesn’t depend only on a and b, but also on current capital. So basically “at each moment in time you face the exact same problem” is wrong. The naive solution is the right one: you should stop when c=b/a, which means t=1/a in the original problem.
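A quick numeric sanity check of both claims, with arbitrary example values for a and b:

```python
import numpy as np

a, b = 0.5, 2.0  # arbitrary example rates (decay rate a, earning rate b)

def value(t, c=0.0):
    """The formula above: capital (b*t + c) discounted by e^(-a*t)."""
    return (b * t + c) * np.exp(-a * t)

ts = np.linspace(0, 10, 100001)

# With zero starting capital the optimum is at t = 1/a.
t_star = ts[np.argmax(value(ts))]
assert abs(t_star - 1 / a) < 1e-3

# With c > b/a the value is already decreasing at t = 0, so stop immediately.
c = b / a + 1.0
assert value(0.01, c) < value(0.0, c)
```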
Agreed—I’ve seen, and made, quite a few such changes as well. After each big upheaval it’s worth spending some time grabbing the low hanging fruit. My only gripe is that I don’t think this type of change is sufficient over a project’s lifetime. Deeper product change has a way of becoming necessary.
I agree that A/B tests aren’t evil, and are often useful. All I’m saying is, sometimes they give ammo to dark pattern thinking in the minds of people within your company, and reversing that process isn’t easy.
Another ethical consideration is that most A/B tests aren’t aimed to help the user, but to improve metrics that matter to the company, like engagement or conversion rate. All the sketchy stuff you see on the web—sign up for your free account, install our mobile app, allow notifications, autoplay videos, social buttons, fixed headers and footers, animated ads—was probably justified by A/B testing at some point.
I think incremental change is a bit overrated. Sure, if you have something that performs so well that chasing 1% improvements is worth it, then go for it. But don’t keep tweaking forever: you’ll get most of the gains in the first few months, and they will total about +20%, or maybe +50% if you’re a hero.
If your current thing doesn’t perform so well, it’s more cost-effective to look for big things that could bring +100% or +1000%. A/B tests are useful for that too, but need to be done differently:

Come up with a big thing that could have big impact. For example, shortform.

Identify the assumptions behind that thing. For example, “users will write shortform” or “users will engage with others’ shortform”.

Come up with cheap ways to test these assumptions. For example, “check the engagement on existing posts that are similar to shortform” or “suggest to some power users that they should make shortform posts and see how much engagement they get”. At this step you may end up looking at metrics, looking at competitors, or running cheap A/B tests.

Based on the previous steps, change your mind about which thing you want to build, and repeat these steps until you’re pretty sure it will succeed.

Build the thing.

I’m not sure that works. Imagine there are two theories about a certain button:

Pressing the button gives you a dollar and does nothing else

Pressing the button gives you a dollar and also causes an unobservable person to suffer
These theories give the same observations, but recommend different actions, so to me they seem to have different meanings.

Some qualities are separable (the mass of an object is the sum of masses of its parts) and others aren’t (the enjoyability of a book isn’t the sum of enjoyabilities of its pages). I’m not sure human value is separable. At least some of my values seem to be about relationships between people, not just individual experiences.
Maybe a bit off topic, but to me a future filled with robotaxis, automated cashiers, drone delivery and other such things would feel kind of scary and alien. I’ve always lived in cities where you need constant human interaction to get by, and for myself I’d like things to stay that way or even more so.
(12/?)
The uncertainty principle, or something like it.
When you measure a qubit cos(φ)|0⟩ + sin(φ)|1⟩, the result has variance (1 − cos(4φ))/4. (I’m skipping over the trig calculations here and below.) If you have a choice between either measuring the qubit, or rotating it by an angle ψ and then measuring it, then the sum of variances of the two operations is at least (1 − cos(2ψ))/2. In particular, if ψ = π/4, the sum of variances is at least 1/2.
So no matter what state a qubit is in, and how precisely you know its state, there are two possible measurements such that the sum of variances of their results is at least 1/2. The reason is that the bases corresponding to these measurements are at an angle to each other, not aligned. Apparently in the real world, an object’s “position” and “momentum” are two such measurements, so there’s a limit on their joint precision.
You can also carry out one measurement and then the other, but that doesn’t help: after the first measurement you have variance in the second, and after the second you have fresh variance in the first. This lets you get an infinite stream of fair coin flips from a single qubit: start with |0⟩, rotate by π/4, measure, rotate by π/4, measure...
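Here’s a minimal simulation of that coin-flip trick, with the state as a plain NumPy vector and measurement collapse done by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate the qubit by pi/4

state = np.array([1.0, 0.0])  # start in |0>
flips = []
for _ in range(10000):
    state = R @ state
    p1 = state[1] ** 2        # probability of measuring 1
    outcome = int(rng.random() < p1)
    flips.append(outcome)
    # collapse to the measured basis state
    state = np.array([0.0, 1.0]) if outcome else np.array([1.0, 0.0])

print(sum(flips) / len(flips))  # close to 0.5
```

Whichever way a measurement comes out, the collapsed state sits at 45 degrees to the next measurement basis after the rotation, so every flip is 50/50.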
(11/?)
Superdense coding.
Alice is told two classical bits of information and sends a qubit to Bob, who can then recover the two classical bits. Again, it relies on Alice and Bob sharing a prepared pair beforehand. It’s the opposite of quantum teleportation, where Alice sends two classical bits and Bob can recover a qubit.
First let’s talk about bases. This is the usual basis: |00⟩, |01⟩, |10⟩, |11⟩. This is the Bell basis: (|00⟩ + |11⟩)/√2, (|00⟩ − |11⟩)/√2, (|10⟩ + |01⟩)/√2, (|10⟩ − |01⟩)/√2. Check for yourself that any two of these vectors are orthogonal. To “measure a state in a different basis” means to apply the rotation taking one basis to the other, then measure. For example, if you have a state and you know that it’s one of the Bell basis states, you can figure out which one by rotating into the usual basis and measuring.
One cool thing about the Bell basis is that you can change any basis vector into any other basis vector by operations on the first qubit only. For example, rotating the first qubit by 90 degrees turns (|00⟩ + |11⟩)/√2 into (|10⟩ − |01⟩)/√2. Flipping the first qubit gives (|10⟩ + |01⟩)/√2, and so on.
Now superdense coding is easy. Alice and Bob start by sharing two halves of a Bell state. Then depending on which two classical bits Alice needs to send, she either leaves the state alone or rotates it into one of the other three basis states, operating only on her qubit. Then she sends her qubit to Bob, who now has both qubits and can rotate them into the usual basis and measure them, recovering the two classical bits.
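The whole protocol fits in a few lines of NumPy. This sketch represents Alice’s four possible operations as 2×2 matrices acting on her qubit only, and Bob’s measurement as projection onto the Bell basis:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])   # flip the qubit
Z = np.array([[1, 0], [0, -1]])  # phase flip

bell0 = np.array([1, 0, 0, 1]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)

# The Bell basis as rows; check that the rows are orthonormal
B = np.array([[1, 0, 0, 1],
              [1, 0, 0, -1],
              [0, 1, 1, 0],
              [0, 1, -1, 0]]) / np.sqrt(2)
assert np.allclose(B @ B.T, np.eye(4))

# Alice's operation on her (first) qubit for each 2-bit message
ops = {(0, 0): I, (0, 1): X, (1, 0): Z, (1, 1): Z @ X}

decoded = {}
for bits, U in ops.items():
    state = np.kron(U, I) @ bell0        # Alice acts on her half only
    probs = (B @ state) ** 2             # Bob measures in the Bell basis
    assert np.isclose(probs.max(), 1.0)  # the outcome is deterministic
    decoded[bits] = int(np.argmax(probs))

assert len(set(decoded.values())) == 4   # four messages, four distinct outcomes
```

Each of Alice’s four operations lands the shared state exactly on one Bell basis vector, so Bob’s measurement recovers the two bits with certainty.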
I think this AI design has a bigger problem. Imagine PAL is choosing whether to give Dave regular coffee or poisoned coffee that causes a lot of suffering. If PAL simulates both scenarios, that causes a lot of simulated suffering. Bostrom called this problem “mind crime”.
(10/?)
Quantum teleportation.
It’s not a very good name; maybe we should call it “sending qubits over a classical channel using a quantum one-time pad”. The idea is that Alice and Bob are far apart from each other and have a prepared pair of qubits, and Alice also has some qubit in unknown state. She can measure it in a clever way and send some classical bits to Bob, which allows him to recreate the unknown qubit. Both qubits of the prepared pair are spent in the process.
Let’s say the unknown qubit is a|0⟩ + b|1⟩, and the prepared pair is (|00⟩ + |11⟩)/√2. So the whole state is (a|0⟩ + b|1⟩) ⊗ (|00⟩ + |11⟩)/√2 = (a|000⟩ + a|011⟩ + b|100⟩ + b|111⟩)/√2. Alice can access the first two qubits and Bob can access the third.

Alice flips the second qubit depending on the first, leading to (a|000⟩ + a|011⟩ + b|110⟩ + b|101⟩)/√2.

Alice applies a combined reflection and rotation to the first qubit, which is known as the Hadamard gate: |0⟩ becomes (|0⟩ + |1⟩)/√2, while |1⟩ becomes (|0⟩ − |1⟩)/√2. So the whole state becomes (a|000⟩ + a|100⟩ + a|011⟩ + a|111⟩ + b|010⟩ − b|110⟩ + b|001⟩ − b|101⟩)/2.

We can rewrite that as (|00⟩ ⊗ (a|0⟩ + b|1⟩) + |01⟩ ⊗ (a|1⟩ + b|0⟩) + |10⟩ ⊗ (a|0⟩ − b|1⟩) + |11⟩ ⊗ (a|1⟩ − b|0⟩))/2.

Notice something funny? In the above state, measuring the first two qubits gives you exactly enough information to know which operation to apply to the third qubit to get a|0⟩ + b|1⟩. So if Alice measures both her qubits and sends the two classical bits to Bob, he can reconstruct the unknown qubit on his end. And if Carol eavesdrops on the message without having access to Bob’s qubit, she won’t learn anything, because the four possible messages have probability 1/4 each, regardless of a and b.
If the unknown qubit was entangled with something else, that also survives the transfer, though proving that requires a bit more calculation.

(9/?)
Black box functions and Grover’s algorithm.
Say we want our quantum computer to make calls to a black box function f(x) from bit strings to bits. There are a couple of ways to formalize that. Phase oracle: |x⟩ becomes −|x⟩ if f(x), otherwise |x⟩. Binary oracle: |x⟩ ⊗ |b⟩ becomes |x⟩ ⊗ |f(x) xor b⟩. These definitions only say what happens to basis vectors, and are extended to other vectors by linearity.
We can convert from a binary oracle to a phase oracle by using an extra qubit. Start with the state |x⟩ ⊗ (|0⟩ − |1⟩)/√2 and apply the binary oracle; we end up with (1/√2) ⋅ (|x⟩|1⟩ − |x⟩|0⟩ if f(x), otherwise |x⟩|0⟩ − |x⟩|1⟩). But that’s exactly (−|x⟩ if f(x), otherwise |x⟩) ⊗ (|0⟩ − |1⟩)/√2. So |x⟩ has been changed as if by a phase oracle, and the extra qubit is left unchanged.
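This conversion is easy to check numerically. Here f is a hypothetical black box that is true on a single 2-bit input, and the binary oracle is built explicitly as a permutation matrix:

```python
import numpy as np

N = 2
dim = 2 ** N
target = 0b10                       # hypothetical: f(x) = 1 only for this x

def f(x):
    return int(x == target)

# Binary oracle on N+1 qubits: |x>|b> -> |x>|b xor f(x)>
U = np.zeros((2 * dim, 2 * dim))
for x in range(dim):
    for b in (0, 1):
        U[2 * x + (b ^ f(x)), 2 * x + b] = 1.0

minus = np.array([1.0, -1.0]) / np.sqrt(2)   # the extra qubit (|0> - |1>)/sqrt(2)

for x in range(dim):
    ket = np.zeros(dim)
    ket[x] = 1.0
    out = U @ np.kron(ket, minus)
    # |x> picks up a (-1)^f(x) phase and the extra qubit is unchanged
    assert np.allclose(out, np.kron((-1) ** f(x) * ket, minus))
```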
Grover’s algorithm: we have a black box function from N-bit strings to bits, which returns true for only one possible input. We want to find that input. Classically that requires O(2^N) calls to the black box, because you can’t do much better than brute force. With a quantum algorithm we can get O(√(2^N)), which is much faster.
Let’s say X is the input we’re trying to find, which is one of the basis vectors of the space. Let’s also define a vector S = ((|0⟩ + |1⟩)/√2) ⊗ … ⊗ ((|0⟩ + |1⟩)/√2), which is the sum of all 2^N basis vectors with equal amplitudes.

Initialize state as S

Apply phase oracle, corresponding to reflection through the plane perpendicular to X

Apply “Grover operator”, corresponding to reflection through the plane perpendicular to S and then central symmetry around 0

Repeat steps 2 and 3 a bunch of times
To understand it geometrically, draw the 2D plane containing the vectors X and S. Their dot product is 1/√(2^N), which is a small positive number, so the angle between them is a bit less than 90 degrees. If our state starts out as S, the net effect of (2) and (3) is a slight rotation toward X, and each time we repeat them, we rotate again by the same angle. The angle is 2⋅arcsin(1/√(2^N)), which is approximately 2/√(2^N), so after (π/4)⋅√(2^N) iterations we’re pretty close to X. Then we can measure and get X with high probability.
I glossed over implementing the Grover operator, but that shouldn’t be too hard, since from the first half of this comment it should be kinda clear how to reflect a vector through a plane.
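The whole algorithm is a few lines of linear algebra if we just simulate the state vector directly. A sketch on an 8-bit search space (the target input is an arbitrary choice):

```python
import numpy as np

N = 8                                     # bits; search space of size 2^N
dim = 2 ** N
target = 123                              # the unique x with f(x) = 1 (arbitrary)

S = np.full(dim, 1 / np.sqrt(dim))        # uniform superposition over all inputs
state = S.copy()

for _ in range(int(np.pi / 4 * np.sqrt(dim))):
    state[target] *= -1                   # phase oracle: reflect through the plane perp. to X
    state = 2 * S * (S @ state) - state   # Grover operator: reflect through the plane
                                          # perp. to S, then negate
print(state[target] ** 2)                 # probability of measuring the target (close to 1)
```

With N = 8 that’s about 12 iterations instead of the ~256 classical calls, and the final amplitude on the target is nearly 1.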

It seems to me that your examples of B are mostly deontological, so it would be nice to have some C which represented virtue ethics as well.
This argument also proves that quantum computers can’t offer useful speedups, because otherwise evolution would’ve found them.
This argument also proves that Genghis Khan couldn’t have happened, because intelligence and power converge on caring about positive valence for all beings.
I think the orthogonality thesis is in good shape, if these are the strongest arguments we could find against it in almost a decade.