In the Parable of Predict-O-Matic, a subnetwork of the titular Predict-O-Matic becomes a mesa-optimiser and begins steering the future towards its own goals, independently of the rest of Predict-O-Matic. It does so in a way that sabotages the other subnetworks.

I am reminded of one specification problem that a run of Eurisko faced:

During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.

One thing I wondered is whether this could happen in humans, and if not, why it doesn’t. A simplified description of memory that I learned in a flash game is that “neural connections” are “strengthened” whenever they are “used”, which sounds sort of like gradients in RL if you don’t think about it too hard. Maybe the analogue of this would be some memory that “wants” you to remember it repeatedly at the expense of other memories. Trauma?

Tulpas??

Life 3.0 Liveblog/Review Thread

Prelude

The prologue begins with a short story called the Tale of the Omega Team. It’s a wish-fulfilment pseudo-isekai about a bunch of effective altruist tech people at not-Google, called the Omegas, who make an AGI and then use it to take over the world.

But a cybersecurity specialist on their team talked them out of the game plan [...] risk of Prometheus breaking out and seizing control of its own destiny [...] weren’t sure how its goals would evolve [...] go to great lengths to keep Prometheus confined

For some reason, the Omegas in the story claim that Prometheus (the AI) might be unsafe, and then proceed to have it write software which they run on their computers, let it produce long pieces of animated media, and let it send blueprints of technologies to scientists. There is a cybersecurity expert on the team who only just stops them from leaving the whole thing straight-up unboxed, and I do not envy her position.

(Prometheus is safe, it turns out, which I can tell because there are humans alive at the end of the story.)

[...] Omega-controlled [...] controlled by the Omegas [...] the Omegas harnessed Prometheus [...] the Omegas’ [...] the Omegas’ [...]

There’s also another odd thing: the story says that the Omegas are using Prometheus as a tool, when what’s clearly actually happening is that Prometheus is achieving its goals, with the Omegas being lumps of atoms that it pushes around according to its whims, as it has been doing since they switched it on.

All in all, I like it. It wouldn’t be out of place on r/rational. If wish-fulfilment pseudo-isekai ever does happen, AGI sweeping aside the previous social order will be how (a real AGI would come close to some of the capabilities I’ve seen those protagonists have), and fiction about more plausible robopocalypses (or roboutopias) is always great.

In Against Against Billionaire Philanthropy, Scott says:

The same is true of Google search. I examined the top ten search results for each donation, with broadly similar results: mostly negative for Zuckerberg and Bezos, mostly positive for Gates.

Here, Gates’ philanthropy is about malaria, Zuckerberg’s is about Newark schools, and Bezos’ is about preschools.

Also, as far as I can tell, Moskovitz’ philanthropy is generally viewed positively, though of course I would be in a bubble with respect to this. Also also, though I say this without really checking, it seems that people are pretty much all against the Sacklers’ donations to art galleries and museums.

Squinting at these data points, I can kind of see a trend: people favour philanthropy that’s buying utilons, and are opposed to philanthropy that’s buying status. They like billionaires funding global development more than they like billionaires funding local causes, and they like them funding art galleries for the rich least of all.

Which is basically what you’d expect if people were well-calibrated and correctly criticising those who need to be taken down a peg.

and they like them funding art galleries for the rich least of all.

What are these art galleries “for the rich”? Your link mentions the National Gallery, the Tate Gallery, the Smithsonian, the Louvre, the Guggenheim, the Sackler Museum at Harvard, the Metropolitan Museum of Art, and the American Museum of Natural History as recipients of Sackler money. All of them are open to everyone. The first three are free and the others charge in the region of $15-$25 (as do the National Gallery and the Tate Gallery for special exhibitions, but not the bulk of their displays). The hostility to Sackler money has nothing to do with “how dare they be billionaires”, but is because of the (allegedly) unethical practices of the pharmaceutical company that the Sacklers own and owe their fortune to. No-one had any problem with their donations before.

Which is basically what you’d expect if people were well-calibrated and correctly criticising those who need to be taken down a peg.

I see nothing correct in the ethics of the crab bucket.

The simplicity prior says that you should assign prior probability 2^-L to a description of length L. This sort of makes intuitive sense, since it’s what you’d get if you generated the description through a series of coinflips...

… except there are 2^L descriptions of length L, so the total prior probability you’re assigning is the sum over L of 2^L × 2^-L, which is the sum over L of 1. That diverges, so the prior can’t be normalised.

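You can kind of recover this by noticing that not all bitstrings correspond to an actual description, and for some encodings the valid ones are sparse enough for the prior to be normalised (I think the condition is that the fraction of length-L bitstrings that are “valid” has to shrink fast enough for those fractions to have a finite sum: a fraction of 1/L isn’t quite enough, since the harmonic series diverges, but 1/L^2 is)...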

...but if that’s the case, you’re being fairly information-inefficient, because you could compress descriptions further. So why are you judging simplicity using such a bad encoding, and why use 2^-L at all if it no longer properly corresponds to complexity? And other questions in this cluster.

I am confused (and maybe too hung up on something idiosyncratic to an intuitive description I heard).
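To see how the density of valid descriptions affects normalisability, here is a small sketch. The function name and the three example densities are hypothetical. It adds up the prior mass 2^-L over all valid descriptions up to some maximum length, where `valid_fraction(L)` is the fraction of the 2^L bitstrings of length L that count as descriptions. Because the 2^L and 2^-L cancel, length L contributes exactly `valid_fraction(L)`, so the total is finite exactly when those fractions have a finite sum.

```python
def total_prior_mass(valid_fraction, max_len):
    """Sum of 2^-L over all valid descriptions with lengths 1..max_len.

    valid_fraction(L) is the (hypothetical) fraction of the 2^L bitstrings of
    length L that are valid descriptions. Keep max_len <= 1000 or so, because
    2^L stops fitting in a float beyond L = 1023.
    """
    total = 0.0
    for length in range(1, max_len + 1):
        n_strings = 2 ** length
        n_valid = valid_fraction(length) * n_strings  # how many valid descriptions have this length
        total += n_valid / n_strings                  # each one gets prior weight 2^-L
    return total


def every_string(length):
    return 1.0                # every bitstring is a description: mass grows without bound


def harmonic(length):
    return 1.0 / length       # a 1/L fraction: mass grows like ln(max_len), so it still diverges


def inverse_square(length):
    return 1.0 / length ** 2  # a 1/L^2 fraction: mass converges to pi^2 / 6


for n in (10, 100, 1000):
    print(n, total_prior_mass(every_string, n),
          total_prior_mass(harmonic, n), total_prior_mass(inverse_square, n))
```

With every bitstring valid, the mass grows linearly in `max_len`. With a 1/L fraction it only grows logarithmically, but it still has no finite limit. The 1/L^2 case converges, and could then be rescaled to sum to 1.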

Was this meant to be a reply to my bit about the Solomonoff prior?

If so: in the algorithmic information literature, they usually fix the unnormalizability problem by talking about prefix Turing machines, which corresponds to only allowing TM descriptions that form a valid prefix code.
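The fact doing the work here is the Kraft inequality. For any prefix-free set of codewords, the sum of 2^-len(c) is at most 1, so giving each valid description weight 2^-L can never add up to more than 1. Below is a quick check with a made-up self-delimiting code: write the length in unary, then a 0, then the bits. This code is purely illustrative and is not the encoding any particular prefix Turing machine uses.

```python
from itertools import product


def self_delimiting(s):
    """Hypothetical example code: len(s) in unary, a '0' separator, then s itself."""
    return "1" * len(s) + "0" + s


def all_bitstrings(max_len):
    """Every bitstring of length 0..max_len, including the empty string."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def is_prefix_free(codewords):
    # After sorting, if some codeword is a prefix of another, it is also
    # a prefix of its immediate successor, so checking neighbours suffices.
    ordered = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def kraft_sum(codewords):
    """Total prior mass when each codeword c gets weight 2^-len(c)."""
    return sum(2.0 ** -len(c) for c in codewords)


codes = [self_delimiting(s) for s in all_bitstrings(10)]
print(is_prefix_free(codes), kraft_sum(codes))
print(is_prefix_free(list(all_bitstrings(3))))  # raw bitstrings are not prefix-free
```

Here the 2^n strings of length n get codewords of length 2n + 1, so each length contributes 2^-(n+1). The total approaches 1 from below as longer strings are included.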

But it is a good point that for steeper discounting rates, you don’t need to do that.
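One reading of "steeper discounting" is to weight each description of length L by c^-L for some c > 2 instead of 2^-L. The value c = 4 below is just a hypothetical example. Then the total over all 2^L bitstrings of each length is already finite, without restricting to a prefix code:

$$\sum_{L=1}^{\infty} 2^L \cdot 4^{-L} = \sum_{L=1}^{\infty} 2^{-L} = 1, \qquad \sum_{L=1}^{\infty} 2^L c^{-L} = \sum_{L=1}^{\infty} \left(\frac{2}{c}\right)^L = \frac{2}{c-2} \quad (c > 2).$$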

It was inspired by yours—when I read your post I remembered that there was this thing about Solomonoff induction that I was still confused about—though I wasn’t directly trying to answer your question so I made it its own thread.

Imagine two prediction markets, both with shares that give you $1 if they pay out and $0 otherwise.

One is predicting some event in the real world (and pays out if this event occurs within some timeframe) and has shares currently priced at $X.

The other is predicting the behaviour of the first prediction market. Specifically, it pays out if the price of the first prediction market exceeds an upper threshold $T before it goes below a lower threshold $R.

Is there anything that can be said in general about the price of the second prediction market? For example, it feels intuitively like if T >> X, but R is only a little bit smaller than X, then assigning a high price to shares of the second prediction market violates conservation of evidence—is this true, and can it be quantified?
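Here is one way to make the question concrete, under a strong and hypothetical assumption. Suppose the first market's price is a martingale: its expected future price equals its current price, as for an idealised efficient market with no risk premium. Suppose also that it moves in small steps, so it can't jump past either threshold. Then the optional stopping theorem gives P(T before R) = (X − R)/(T − R). That is small when T ≫ X and R is just below X. Even if the price can jump, a non-negative martingale started at X reaches T with probability at most X/T. The sketch below checks the first formula with a symmetric one-cent random walk; the function name and example values are illustrative.

```python
import random


def hit_upper_first(x, t, r, trials, seed=0):
    """Estimate the probability that the first market's price reaches t before r, starting from x.

    Prices are in whole cents. Hypothetical model: the price is a symmetric random
    walk moving one cent per step, which makes it a martingale that cannot jump
    past either threshold.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        price = x
        while r < price < t:
            price += 1 if rng.random() < 0.5 else -1
        if price >= t:
            hits += 1
    return hits / trials


x, t, r = 50, 90, 45  # X = $0.50, T = $0.90, R = $0.45 (made-up example values)
print(hit_upper_first(x, t, r, trials=5000), (x - r) / (t - r))
```

Under these assumptions, a high price for the second market in this example (anything far above 5/45 ≈ 0.11) would be inconsistent with the first market's current price. Real markets need not satisfy the martingale assumption exactly.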

Over the past few days I’ve been reading about reinforcement learning, because I understood how to make a neural network, say, recognise handwritten digits, but I wasn’t at all sure how that could be turned into getting a computer to play Atari games. So here’s what I’ve learned so far. Spinning Up’s Intro to RL probably explains this better.

(Brief summary, explained properly below: The agent is a neural network which runs in an environment and receives a reward. Each parameter in the neural network is increased in proportion to how much it increases the probability of making the agent do what it just did, and how good the outcome of what the agent just did was.)

Reinforcement learners play inside a game involving an agent and an environment. On turn $t$, the environment hands the agent an observation $o_t$, and the agent hands the environment an action $a_t$. For an agent acting in real time, there can be sixty turns a second; this is fine.

The environment has a transition function which takes an observation-action pair $(o_t, a_t)$ and responds with a probability distribution over observations $o_{t+1}$ on the next timestep; the agent has a policy that takes an observation $o_t$ and responds with a probability distribution over actions $a_t$ to take.

The policy is usually written as $\pi$, and the probability that $\pi$ outputs an action $a$ in response to an observation $o$ is $\pi(a \mid o)$. In practice, $\pi$ is usually a neural network that takes observations as input and has actions as output (using something like a softmax layer to give a probability distribution); the parameters of this neural network are $\theta$, and the corresponding policy is $\pi_\theta$.
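As a stand-in for the neural network, here is a sketch of about the simplest softmax policy there is. It uses a hypothetical lookup table `theta` that stores one logit per action for each observation, which is equivalent to a network with no hidden layers acting on a one-hot encoding of the observation. `policy_probs` gives $\pi_\theta(\cdot \mid o)$ and `sample_action` draws an action from it.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(theta, obs):
    """pi_theta(. | obs): softmax over the row of logits stored for this observation."""
    return softmax(theta[obs])


def sample_action(theta, obs, rng):
    """Draw an action a ~ pi_theta(. | obs)."""
    probs = policy_probs(theta, obs)
    return rng.choices(range(len(probs)), weights=probs)[0]


theta = [[0.0, 1.0, -1.0]]  # made-up logits: 1 observation, 3 actions
rng = random.Random(0)
print(policy_probs(theta, 0), sample_action(theta, 0, rng))
```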

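At the end of the game, the entire trajectory $\tau = o_1 a_1 o_2 a_2 \ldots o_T a_T$ is assigned a score, $R(\tau)$, measuring how well the agent has done. Since both the policy and the environment are random, the goal is to find the policy $\pi_\theta$ that maximises the expected value of this score.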

Since we’re using machine learning to maximise, we should be thinking of gradient-based optimisation: find the local direction in which to change the parameters $\theta$ in order to increase the expected value of $R$ by the greatest amount, and then move them slightly in that direction. (Strictly, this is gradient ascent, which is just gradient descent on $-R$.)

In other words, we want to find $\nabla_\theta \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$.

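Writing the expectation value as a sum over trajectories, this is $\nabla_\theta \sum_{\tau \in D} P(\tau \mid \theta) R(\tau) = \sum_{\tau \in D} \nabla_\theta P(\tau \mid \theta)\, R(\tau)$, where $P(\tau \mid \theta)$ is the probability of observing the trajectory $\tau$ if the agent follows the policy $\pi_\theta$, and $D$ is the space of possible trajectories. (The score $R(\tau)$ doesn’t depend on $\theta$, so the gradient only acts on the probability.)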

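The probability of seeing a specific trajectory is the product of the probabilities of each individual step on the trajectory happening, and is hence $P(\tau \mid \theta) = \prod_{t=1}^{T} \pi_\theta(a_t \mid o_t)\, E(o_t \mid o_{t-1}, a_{t-1})$. Here $E(o_{t+1} \mid o_t, a_t)$ is the probability that the environment outputs the observation $o_{t+1}$ in response to the observation-action pair $(o_t, a_t)$, and $E(o_1 \mid o_0, a_0)$ is shorthand for the probability of the starting observation $o_1$. Products are awkward to work with, but they can be turned into sums by taking the logarithm: $\ln P(\tau \mid \theta) = \sum_{t=1}^{T} \left[\ln \pi_\theta(a_t \mid o_t) + \ln E(o_t \mid o_{t-1}, a_{t-1})\right]$.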

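The gradient of this is $\nabla_\theta \ln P(\tau \mid \theta) = \sum_{t=1}^{T} \left[\nabla_\theta \ln \pi_\theta(a_t \mid o_t) + \nabla_\theta \ln E(o_t \mid o_{t-1}, a_{t-1})\right]$. But what the environment does is independent of $\theta$, so that entire term vanishes, and we have $\nabla_\theta \ln P(\tau \mid \theta) = \sum_{t=1}^{T} \nabla_\theta \ln \pi_\theta(a_t \mid o_t)$. The gradient of the log-policy is quite easy to find: our policy is just a neural network, so we can use back-propagation.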
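To check the claim that the environment terms drop out of the gradient, here is a sketch with a made-up environment (two observations, two actions) and a tabular softmax policy. All names and numbers are illustrative. For softmax logits, the gradient of $\ln \pi_\theta(a \mid o)$ with respect to the logits stored for $o$ is onehot($a$) − $\pi_\theta(\cdot \mid o)$, and zero for every other observation's logits. `score` adds those up along the trajectory. `finite_difference` numerically differentiates the full $\ln P(\tau \mid \theta)$, environment terms included. The two should agree.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy environment with 2 observations and 2 actions.
START = [0.7, 0.3]  # distribution of the first observation o_1
ENV = {             # ENV[(o_t, a_t)][o_next] = E(o_{t+1} | o_t, a_t)
    (0, 0): [0.9, 0.1],
    (0, 1): [0.2, 0.8],
    (1, 0): [0.6, 0.4],
    (1, 1): [0.1, 0.9],
}


def log_prob_trajectory(theta, traj):
    """ln P(tau | theta): policy terms plus environment terms, for traj = [(o_1, a_1), ...]."""
    total = 0.0
    prev = None
    for o, a in traj:
        env_prob = START[o] if prev is None else ENV[prev][o]
        total += math.log(env_prob) + math.log(softmax(theta[o])[a])
        prev = (o, a)
    return total


def score(theta, traj):
    """Analytic sum over t of grad ln pi_theta(a_t | o_t) with respect to the logits theta[o][b]."""
    grad = [[0.0] * len(row) for row in theta]
    for o, a in traj:
        probs = softmax(theta[o])
        for b, p in enumerate(probs):
            grad[o][b] += (1.0 if b == a else 0.0) - p  # onehot(a_t) - pi_theta(. | o_t)
    return grad


def finite_difference(theta, traj, eps=1e-6):
    """Central-difference gradient of the full ln P(tau | theta), environment terms included."""
    grad = [[0.0] * len(row) for row in theta]
    for o, row in enumerate(theta):
        for b in range(len(row)):
            up = [r[:] for r in theta]
            down = [r[:] for r in theta]
            up[o][b] += eps
            down[o][b] -= eps
            grad[o][b] = (log_prob_trajectory(up, traj) - log_prob_trajectory(down, traj)) / (2 * eps)
    return grad


theta = [[0.3, -0.2], [1.0, 0.5]]  # made-up logits: theta[observation][action]
traj = [(0, 1), (1, 1), (1, 0)]    # a made-up trajectory o_1 a_1 o_2 a_2 o_3 a_3
print(score(theta, traj))
print(finite_difference(theta, traj))
```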

Our expression for the gradient of the expectation value is in terms of the gradient of the probability, not the gradient of the logarithm of the probability, so we’d like to express one in terms of the other.

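Conveniently, the chain rule gives $\nabla_\theta \ln P(\tau \mid \theta) = \frac{1}{P(\tau \mid \theta)} \nabla_\theta P(\tau \mid \theta)$, so $\nabla_\theta P(\tau \mid \theta) = P(\tau \mid \theta)\, \nabla_\theta \ln P(\tau \mid \theta)$. Substituting this back into the original expression for the gradient gives

$$\sum_{\tau \in D} P(\tau \mid \theta)\, \nabla_\theta \ln P(\tau \mid \theta)\, R(\tau),$$

and substituting our expression for the gradient of the logarithm of the probability gives

$$\sum_{\tau \in D} P(\tau \mid \theta) \sum_{t=1}^{T} \nabla_\theta \ln \pi_\theta(a_t \mid o_t)\, R(\tau).$$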

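Notice that this is the definition of the expectation value of $\sum_{t=1}^{T} \nabla_\theta \ln \pi_\theta(a_t \mid o_t)\, R(\tau)$, so writing the sum as an expectation value again we get

$$\nabla_\theta \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=1}^{T} \nabla_\theta \ln \pi_\theta(a_t \mid o_t)\, R(\tau)\right].$$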

You can then estimate this expectation value by sampling a large number of trajectories (by running the agent in the environment many times), calculating the term inside the brackets for each one, and then averaging over all of the runs.

Neat!
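The sampling recipe can be sketched end to end on a toy setup. The environment, reward and parameters are all made up for illustration, and the code is plain REINFORCE without any reward transformations. `reinforce_gradient` averages $\sum_t \nabla_\theta \ln \pi_\theta(a_t \mid o_t) R(\tau)$ over sampled trajectories. As an independent check, `exact_gradient` enumerates all 16 two-step trajectories to compute $\mathbb{E}[R]$ exactly and differentiates that numerically.

```python
import itertools
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy environment: 2 observations, 2 actions, T = 2 steps.
START = [0.7, 0.3]
ENV = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8], (1, 0): [0.6, 0.4], (1, 1): [0.1, 0.9]}
T = 2


def reward(traj):
    """Made-up score R(tau): +1 for every step where the action equals the observation."""
    return sum(1.0 for o, a in traj if o == a)


def sample_trajectory(theta, rng):
    traj = []
    o = rng.choices((0, 1), weights=START)[0]
    for t in range(T):
        a = rng.choices((0, 1), weights=softmax(theta[o]))[0]
        traj.append((o, a))
        if t + 1 < T:
            o = rng.choices((0, 1), weights=ENV[(o, a)])[0]
    return traj


def reinforce_gradient(theta, n_samples, seed=0):
    """Monte Carlo estimate of E[ sum_t grad ln pi_theta(a_t | o_t) * R(tau) ]."""
    rng = random.Random(seed)
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_samples):
        traj = sample_trajectory(theta, rng)
        r = reward(traj)
        for o, a in traj:
            probs = softmax(theta[o])
            for b in (0, 1):
                grad[o][b] += ((1.0 if b == a else 0.0) - probs[b]) * r / n_samples
    return grad


def expected_reward(theta):
    """Exact E[R(tau)] by enumerating every trajectory (T = 2 is hard-coded here)."""
    total = 0.0
    for o1, a1, o2, a2 in itertools.product((0, 1), repeat=4):
        p = (START[o1] * softmax(theta[o1])[a1]
             * ENV[(o1, a1)][o2] * softmax(theta[o2])[a2])
        total += p * reward([(o1, a1), (o2, a2)])
    return total


def exact_gradient(theta, eps=1e-6):
    """Central-difference gradient of the exact expected reward, for comparison."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for o in (0, 1):
        for b in (0, 1):
            up = [row[:] for row in theta]
            down = [row[:] for row in theta]
            up[o][b] += eps
            down[o][b] -= eps
            grad[o][b] = (expected_reward(up) - expected_reward(down)) / (2 * eps)
    return grad


theta = [[0.3, -0.2], [1.0, 0.5]]  # made-up logits: theta[observation][action]
print(reinforce_gradient(theta, n_samples=20000))
print(exact_gradient(theta))
```

The two printed gradients should agree up to Monte Carlo noise, which shrinks roughly like one over the square root of `n_samples`.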

(More sophisticated RL algorithms apply various transformations to the reward to use information more efficiently, and use various optimisation tricks to converge on the optimal parameters faster.)

Here are three statements I believe with a probability of about 1/9:

The two 6-sided dice on my desk, when rolled, will add up to 5.

An AI system will kill at least 10% of humanity before the year 2100.

Starvation was a big concern in ancient Rome’s prime (claim borrowed from Elizabeth’s Epistemic Spot Check post).

Except I have some feeling that the “true probability” of the dice question is pretty much bang on exactly 1/9, but that the “true probability” of the Rome and AI xrisk questions could be quite far from 1/9, and to say the probability is precisely 1/9 seems… overconfident?
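The dice figure, at least, can be checked by counting, assuming the dice are fair. Exactly 4 of the 36 equally likely outcomes sum to 5. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_sum(target, sides=6):
    """Exact probability that two fair dice with `sides` faces add up to `target`."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    hits = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(hits, len(outcomes))


print(prob_sum(5))  # prints 1/9: (1,4), (2,3), (3,2), (4,1) out of 36
```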

From a straightforward Bayesian point of view, there is no true probability. It’s just my subjective degree of belief! I’d be willing to make a bet at 8:1 odds on any of these, but not at worse odds, and that’s all there really is to say on the matter. It’s the number I multiply by the utilities of the outcomes to make decisions.
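This sketch shows why 1/9 and 8:1 go together, assuming the usual convention that odds of k:1 against pay k per unit staked. At probability p the break-even odds are (1 − p)/p, and at exactly those odds the bet's expected profit is zero. The helper names are hypothetical.

```python
from fractions import Fraction


def fair_odds_against(p):
    """Break-even odds against an event with probability p, as (1 - p) / p."""
    return (1 - p) / p


def expected_profit(p, odds_against, stake=1):
    """Expected profit of staking `stake` on the event at `odds_against`-to-1:
    win odds_against * stake with probability p, lose the stake otherwise."""
    return p * odds_against * stake - (1 - p) * stake


p = Fraction(1, 9)
print(fair_odds_against(p))  # prints 8, i.e. 8:1 against
print(expected_profit(p, 8), expected_profit(p, 7))
```

At 8:1 the expected profit is exactly zero; at any shorter odds (such as 7:1) it turns negative, which is one way of cashing out "but not at worse odds".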

One thing you could do is imagine a set of hypotheses I have that involve randomness, along with a probability distribution over which of these hypotheses is the true one. Mapping each hypothesis to the probability it assigns to the outcome turns my probability distribution over hypotheses into a probability distribution over probabilities. This is sharply peaked around 1/9 for the dice rolls and spread widely around 1/9 for AI xrisk, as expected, so I can report 50% confidence intervals just fine. Except sensible hypotheses about historical facts probably wouldn’t involve randomness: either starvation was important or it wasn’t; that’s just a true thing that happens to exist in my past, maybe.
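Here is a sketch of the "probability distribution over probabilities" picture. It uses Beta distributions purely as a convenient hypothetical family, with made-up parameters. Both distributions have mean 1/9. The mean is what matters for betting, since by the law of total probability the overall probability of the outcome is the average of the probabilities the hypotheses assign. What differs is the spread, which is what a 50% interval reports.

```python
import random
import statistics


def interval_50(alpha, beta, n=50_000, seed=0):
    """Mean and central 50% interval of a Beta(alpha, beta) belief over probabilities, by sampling."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(n))
    return statistics.fmean(samples), samples[n // 4], samples[3 * n // 4]


# Hypothetical parameters: both have mean 1/9, with very different spreads.
dice_like = interval_50(100, 800)  # tightly concentrated around 1/9
xrisk_like = interval_50(0.5, 4)   # spread out widely, but still with mean 1/9
print(dice_like)
print(xrisk_like)
```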

I like jacobjacob’s interpretation of a probability distribution over probabilities as an estimate of what your subjective degree of belief would be if you thought about the problem for longer (e.g. 10 hours). The specific time horizon seems a bit artificial (extreme case: I’m going to chat with an expert historian in 10 hours and 1 minute), but it does work and gives the kind of results that make sense. The advantage of this is that you can quite straightforwardly test your calibration (there really is a ground truth): write down your 50% confidence interval, then actually do the 10 hours of research, and see how often the degree of belief you end up with lies inside the interval.
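The calibration test described here amounts to a simple hit rate. In this sketch the function and argument names are hypothetical. It takes the 50% intervals written down beforehand and the beliefs reached after the research. Over many questions, well-calibrated 50% intervals should contain the later belief about half the time.

```python
def interval_hit_rate(intervals, later_beliefs):
    """Fraction of questions where the belief reached after research lies inside the interval written beforehand.

    intervals: list of (low, high) pairs, one per question, written before researching.
    later_beliefs: the probability you ended up with after the (say) 10 hours of research.
    """
    if not intervals or len(intervals) != len(later_beliefs):
        raise ValueError("need at least one question and exactly one later belief per interval")
    hits = sum(low <= belief <= high for (low, high), belief in zip(intervals, later_beliefs))
    return hits / len(intervals)
```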
