“Finally, note that one way to stop a search from creating an optimization daemon is to just not push it too hard.”—An “optimisation daemon” doesn’t have to try to optimise itself to the top. What about a “semi-optimisation daemon” that tries to just get within the appropriate range?
I’m confused: I’m claiming determinism, not indeterminism.
BTW, I published the draft, although fairness isn’t the main topic and only comes up towards the end.
“But to solve many x-risks we don’t probably need full-blown superintelligence, but just need a good global control system, something which combines ubiquitous surveillance and image recognition”—unlikely to happen in the foreseeable future
I’ve actually had similar thoughts myself about why developing AI sooner wouldn’t be that good. In most places, the barrier to human flourishing isn’t technology but governance.
Prevention of the creation of other potentially dangerous superintelligences
Solving existential risks in general
Further update: Do you want to cause good to be done, or do you want to be in a world where good is done? That’s basically what this question comes down to.
“It still doesn’t seem like defining a ‘fair’ class of problems is that useful”—discovering one class of fair problems led to CDT. Another led to TDT. This theoretical work is separate from the problem of producing pragmatic algorithms that deal with unfairness, but both approaches produce insights.
“This meta decision theory would itself be a decision theory that does well on both types of problems so such a decision theory ought to exist”—I currently have a draft post that allows some kinds of rewards based on algorithm internals to be considered fair and which basically does the whole meta-decision-theory thing (that section of the draft was written a few hours after I asked this question, which is why my views in it are slightly different).
I don’t quite understand the question, but “unfair” refers to the environment requiring the agent’s internals to be a particular way. I actually think it is possible to allow some internal requirements to be considered fair, and I discuss this in one of my draft posts. Nonetheless, the definition works as a first approximation.
“ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it?”—my intuition is the opposite, that doing well on such problems means doing poorly on others.
I already acknowledged in the real post that there exist problems that are unfair, so I don’t know why you think we disagree there.
“My thinking about this is that a problem is fair if it captures some aspect of some real world problem”—I would say that you have to accept that the real world can be unfair, but that doesn’t make real world problems “fair” in the sense gestured at in the FDT paper. Roughly, it is possible to define a broad class of problems such that you can have an algorithm that optimally handles all of them, for example if the reward only depends on your choice or predictions of your choice.
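To make that class a bit more concrete, here is a toy sketch (purely illustrative, not the FDT paper’s formalism; all names below are invented): if the payoff depends only on the chosen action and a prediction of that action, an agent can simply search over actions, and no problem in the class can reward or punish how it arrived at its choice.

```python
from typing import Callable, Iterable

# Toy model: a "fair" problem is one whose payoff depends only on the
# agent's action and on a prediction of that action, never on the
# agent's internals. (Illustrative sketch; the names are invented.)
FairProblem = Callable[[str, str], float]  # payoff(action, predicted_action)

def best_action(problem: FairProblem, actions: Iterable[str]) -> str:
    # Assume a perfect predictor, so the prediction matches the action taken.
    # An agent that just searches over actions like this handles every
    # problem in the class optimally, which is what makes the class "fair":
    # nothing in it can penalise how the agent arrived at its choice.
    return max(actions, key=lambda a: problem(a, a))

# Newcomb-like example expressed in this form.
def newcomb(action: str, predicted: str) -> float:
    big_box = 1_000_000 if predicted == "one-box" else 0
    return big_box + (1_000 if action == "two-box" else 0)

print(best_action(newcomb, ["one-box", "two-box"]))  # -> one-box
```

The point is only that the class is broad enough to include Newcomb-like problems while still admitting a uniformly optimal procedure.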
“It seems unsatisfactory that increased predictive power can harm an agent”—that’s just life when interacting with other agents. Indeed, in some games, exceeding a certain level of rationality provides an incentive for other players to take you out. That’s unfair, but that’s life.
It doesn’t really explain what happened though?
“probably AGI complete”—As I said, B is as powerful as A, so the idea is that both should be AGIs. If A or B could break out by itself, then there would be no need for a technique to decide whether or not to release A.
The concept of reachability lets you amplify a policy, select a policy worse than the amplification, then amplify this worse policy. The problem is that a policy that is worse may actually amplify better than a policy that is better. So I don’t find this definition very useful unless we also have an algorithm for selecting the optimal worse policy to amplify. Unfortunately, that problem doesn’t seem very tractable at all.
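A toy numerical sketch of that non-monotonicity (made-up values, not any particular amplification scheme): if amplification isn’t order-preserving on policy value, then selecting the better base policy can still leave you with the worse amplified policy.

```python
# Toy sketch of non-monotonic amplification (invented numbers, not any
# real amplification scheme). Each policy has a base value and an
# amplified value; "better before amplification" doesn't imply
# "better after amplification".
policies = {
    "p_better": {"base_value": 0.9, "amplified_value": 0.92},
    "p_worse":  {"base_value": 0.7, "amplified_value": 0.99},
}

best_base = max(policies, key=lambda p: policies[p]["base_value"])
best_amplified = max(policies, key=lambda p: policies[p]["amplified_value"])

# Selecting by base value picks p_better, but the policy that amplifies
# best is p_worse, so reachability only helps if we can also solve the
# (apparently intractable) problem of choosing which worse policy to amplify.
print(best_base, best_amplified)  # -> p_better p_worse
```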
There could be specific circumstances where you know that another team will release a misaligned AI next week, but most of the time there’s a decent chance that you could just make a few more tweaks before releasing.
1) “Subjectively distinguishable” needs to be clarified. It can mean either a) that a human receives enough information/experience to distinguish themselves from another person, or b) that a human will remember information/experience in enough detail to distinguish themselves from another person. The latter is more important for real-world anthropics problems and results in significantly more copies (see the toy sketch after these two points).
2) “In most areas, we are fine with ignoring the infinity and just soldiering on in our local area”—sure, but SSA is inherently non-local. It applies over the whole universe, not just the Hubble Volume. If we’re going to use an approximation to handle our inability to model infinities, we should be using a universe large enough to break your model, rather than a medium-sized one.
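As a rough illustration of the point in 1) about the memory-based reading producing more copies (a toy sketch with invented data): observers whose full experience streams differ can still share an identical remembered summary, so grouping by what is remembered yields much larger classes of subjectively indistinguishable copies.

```python
from collections import Counter

# Toy data (entirely invented): each observer has a full experience stream
# and a coarser remembered summary of that stream.
observers = [
    {"experienced": "red door, 14:02:31, birdsong", "remembered": "red door"},
    {"experienced": "red door, 14:02:32, silence",  "remembered": "red door"},
    {"experienced": "red door, 09:10:05, rain",     "remembered": "red door"},
    {"experienced": "blue door, 14:02:31, birdsong", "remembered": "blue door"},
]

# Reading (a): copies are observers sharing your full experience stream.
copies_a = Counter(o["experienced"] for o in observers)
# Reading (b): copies are observers sharing only what you would remember.
copies_b = Counter(o["remembered"] for o in observers)

print(max(copies_a.values()))  # 1: fine-grained streams rarely coincide
print(max(copies_b.values()))  # 3: coarse memories coincide far more often
```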
A considers A’ to be a different agent, so it won’t help A’ for nothing. But there could be some issues with acausal cooperation that I haven’t really thought about enough to have a strong opinion on.