My own most recent pet theory is that the process of branching is deeply linked to thermalization, so to find model systems we should look to things modeling the flow of heat/entropy—e.g. a system coupled to two heat baths at different temperatures.
It’s easy enough to get a single sensory datum — sample a classical state according to the Born probabilities, sample some coordinates, pretend that there’s an eyeball at those coordinates, record what it sees. But once we’ve done that, how do we get our next sense datum?
This doesn’t seem like it should be too hard—if you have some degrees of freedom which you take as representing your ‘eyeball’, and a preferred basis of ‘measurement states’ for that eyeball, repeatedly projecting onto that measurement basis will give sensible results for a sequence of measurements. Key here is that you don’t have to project e.g. all the electrons in the universe onto their position basis—just the eyeball DOF onto their preferred ‘measurement basis’ (which won’t look like projecting the electrons onto their position basis either), and then the relevant entangled DOF in the rest of the universe will automatically get projected onto a sensible ‘classical-like’ state. The key property of the universe’s evolution that would make this procedure sensible is non-interference between the ‘branches’ produced by successive measurements, i.e. if you project onto two different eyeball states at time 1, then at time 2, those states will be approximately non-interfering in the eyeball basis. This is formalized in the consistent histories approach to QM.
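Here’s a minimal numpy sketch of the kind of procedure I have in mind (all the names are made up, and it assumes you’re already handed the eyeball DOF’s preferred basis as a list of orthogonal projectors):

```python
import numpy as np

def sample_sense_data(psi0, step_unitary, eyeball_projectors, n_steps, rng=None):
    """Alternate unitary evolution of the full state with Born-rule sampling
    in the eyeball's preferred basis; return the sequence of outcomes."""
    rng = np.random.default_rng() if rng is None else rng
    psi = np.asarray(psi0, dtype=complex)
    outcomes = []
    for _ in range(n_steps):
        psi = step_unitary @ psi                           # evolve everything between looks
        probs = np.array([np.vdot(P @ psi, P @ psi).real for P in eyeball_projectors])
        k = rng.choice(len(probs), p=probs / probs.sum())  # Born rule over eyeball outcomes
        psi = eyeball_projectors[k] @ psi                  # project onto that outcome; the
        psi /= np.linalg.norm(psi)                         # entangled rest of the universe follows
        outcomes.append(k)
    return outcomes
```

In consistent-histories language, the non-interference condition is that the decoherence functional between distinct outcome sequences is approximately zero, which is what makes the sampled sequence behave like a single classical record rather than depending on which intermediate projections you happened to perform.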
What’s somewhat trickier is identifying the DOF that make a good ‘eyeball’ in the first place, and what the preferred basis should be. More broadly, it’s not even known which quantum theories will give rise to ‘classical-like’ states at all. The place to look to make progress here is probably the decoherence literature, along with quantum Darwinism and Jess Riedel’s work.
If you view the laws of physics as the minimal program capable of generating our observations, the Born rule is no more problematic than any other part of the laws of physics. If our universe were sampled according to a different rule, it would look completely different, just the same as if the terms in the Lagrangian were changed.
If a thing has two main distinct parts, it seems reasonable to say that the thing is half part-1 and half part-2. This does not necessarily imply that the parts are equally difficult to create, although that would be a reasonable prior if you didn’t know much about how the parts worked.
I mean, it’s not exactly provable from first principles, but using the architecture of AIXI as a heuristic for what a general intelligence will look like seems to make sense to me. ‘Do reinforcement learning on a learned world model’ is, I think, also what many people expect a GAI may in fact end up looking like, and saying that that’s half decision theory and half predictive model doesn’t seem too far off.
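To make the ‘half and half’ concrete, here’s a deliberately toy sketch of that shape (the class names and the crude random-rollout planner are purely my illustration, not any specific proposal):

```python
import random
from collections import defaultdict

class WorldModel:
    """Predictive half: a crude learned model of the environment's dynamics."""
    def __init__(self):
        self.transitions = defaultdict(list)    # (obs, action) -> list of (next_obs, reward)

    def update(self, obs, action, next_obs, reward):
        self.transitions[(obs, action)].append((next_obs, reward))

    def simulate(self, obs, action):
        seen = self.transitions[(obs, action)]
        return random.choice(seen) if seen else (obs, 0.0)   # guess 'nothing happens' if unseen

class Planner:
    """Decision-theory half: pick actions by rolling out futures in the model."""
    def __init__(self, model, actions, horizon=5, rollouts=50):
        self.model, self.actions = model, actions
        self.horizon, self.rollouts = horizon, rollouts

    def act(self, obs):
        def value(first_action):
            total = 0.0
            for _ in range(self.rollouts):
                o, a = obs, first_action
                for _ in range(self.horizon):
                    o, r = self.model.simulate(o, a)
                    total += r
                    a = random.choice(self.actions)
            return total
        return max(self.actions, key=value)      # act greedily on simulated returns
```

The WorldModel half is pure prediction; the Planner half is pure ‘what should I do given those predictions’, which is the decision-theory part.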
Is there any evidence that this is actually a general inductor, i.e. that as a prior it dominates some large class of functions? From skimming the paper it sounds like this could be interesting progress in ILP, but not necessarily groundbreaking or close to being a fully general inductor. At the moment I’d be more concerned about the transformer architecture potentially being used as (part of) a general inductor.
However, to think about Newcomb’s problem entails “casting yourself” as the agent and predictor both, with a theoretically unlimited amount of time to consider strategies for the agent to defeat the predictor, as well as for the predictor to defeat the agent.
I don’t think so. Newcomb’s problem is meant to be a simple situation where an agent must act in an environment more computationally powerful than itself. The perspective is very much meant to be that of the agent. If you think that figuring out how to act in an environment more powerful than yourself is uninteresting, you must be pretty bored, since that describes the situation all of us find ourselves in.
Are you claiming that the problem arises when the agent tries to predict its own behavior, or when the predictor tries to predict the agent’s behavior? Either way, I don’t think this makes Newcomb incoherent. Even if the agent can’t solve the halting problem in general, there are programs that can solve it in specific cases, including for themselves. And the predictor can be assumed to have greater computational resources than the agent, e.g. it can run for longer, or has a halting oracle if you really want the type of the agent to be ‘general Turing machine’, which means it can avoid self-reference paradoxes.
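As a toy illustration of the ‘predictor just has more resources’ point (a made-up setup: the agent is a step-bounded program, and the predictor simulates it with a larger step budget):

```python
def simulate(program, budget):
    """Run `program` (a generator that eventually yields its decision) for at
    most `budget` steps; return the decision, or None if the budget runs out."""
    for steps_used, value in enumerate(program()):
        if value is not None:
            return value
        if steps_used >= budget:
            return None            # out of compute, no prediction

def agent():
    for _ in range(1_000):         # the agent deliberates for ~1000 steps...
        yield None
    yield "one-box"                # ...then commits to a choice

PREDICTOR_BUDGET = 10_000          # more compute than the agent's whole runtime
print(simulate(agent, PREDICTOR_BUDGET))   # -> "one-box": prediction terminates, no paradox
print(simulate(agent, 500))                # -> None: anything with less compute than the
                                           #    agent's own runtime can't pull off the same trick
```

Since the predictor’s budget exceeds the agent’s whole runtime, its prediction always terminates; the agent can’t run the same trick in reverse without more compute than it has.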
Recognition code 927, I am a potato.
Ah, by ‘unitary’ I mean a unitary operator, that is, an operator which preserves the Hilbert measure. It’s an axiom of quantum mechanics that time evolution is represented by a unitary operator.
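Spelled out, that just means $U^\dagger U = I$, so $\|U\psi\|^2 = \langle\psi, U^\dagger U\psi\rangle = \|\psi\|^2$ for every state $\psi$: time evolution never changes the total measure.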
Fair point about the probable finitude of time (but wouldn’t it be better if our theory could handle the possibility of infinite time as well?)
The argument I made there was that we should consider observer-moments to be ‘real’ according to their Hilbert measure, since that is what we use to predict our own sense-experiences. This does imply that observer-weight will be preserved over time, since unitary evolution preserves the measure (as you say, this also proves it is conserved by splitting into branches, since you can consider that to be projecting onto different subspaces).
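Spelling out the branching part: if $\{P_i\}$ are orthogonal projectors onto the branch subspaces, with $\sum_i P_i = I$, then $\|\psi\|^2 = \sum_i \|P_i\psi\|^2$, so splitting a state into branches only redistributes the same total observer-weight across them.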
Even without unitarity, you shouldn’t expect the total amount of observer-weight to increase exponentially in time, since that would cause the total amount of observer-weight to diverge, giving undefined predictions.
Fair, although I do think your theory might be ultimately self-contradictory ;)
Instead of arguing that here, I’ll link an identical argument I had somewhere else and let you judge if I was persuasive.
Greg Egan.
I don’t think the branching factor of the simulation matters, since the weight of each individual branch decreases as the number of branches increases. The Born measure is conserved by branching.
A 100% chance at $1 million is less valuable than a 1% chance at $100 million
The opposite, right?
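(Spelling out the arithmetic: the expected values are identical, 1.00 × $1 million = 0.01 × $100 million = $1 million, so with any diminishing marginal utility of money the certain million comes out strictly ahead.)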
I think it is. The post is not intended to be a list of things the author believes, but rather a collection of high-level narratives that various people use when thinking about the impact of their decisions on the world. As such, you wouldn’t really expect extensive evidence supporting them, since the post isn’t trying to claim that they are correct.
Seconding shminux, found this explanation really helpful.
I’m not sure if this counts as a ‘distillation’, but I’d like to see a good overview/history of UDASSA/UDT as approaches to anthropics and metaphysics. I think this is probably the single most significant piece of intellectual progress produced by LW, besides the arguments for AI x-risk. And yet, most users seem to be unaware, judging by the periodic independent re-discoveries of some of the ideas.
(I guess people are familiar with UDT as an acausal decision theory, but I think the applications to anthropics and metaphysics are less well-known, and IMO more interesting)
Fair enough. Personally I enjoy GW’s pragmatic style.
Interesting (and funny!). I would appreciate more posts on this topic or other “gears-y rundown from a lawyer” type posts.