They hired Edward Kmett, Haskell goliath.
Don’t forget OpenAI’s undisclosed research program, which according to recent leaks seems to be GPT-2 with more types of data.
And any other secret AI programs out there that are at less risk of leakage because the journalists don’t know where to snoop around. By Merlin, let’s all hope they’re staying in touch with MIRI and/or OpenAI to coordinate on things.
I expect many paths to lead there, though once things start happening it will all be over very fast, one way or the other, before another path has time to become relevant.
I don’t expect this world would survive its first accident. What would that even look like? An AI is rapidly approaching the short time window where its chances of taking over the world are between 1% and 99%, but it discounts utility by a factor of 10 per day, and so as it hits 10% it would rather try its hand than wait a day for the 90%, so we get a containable breakout?
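Spelling that out with the toy numbers above: with a 10× per-day discount, acting now at 10% is worth 0.10·U, while waiting a day for 90% is worth only 0.90·U / 10 = 0.09·U, so it strikes early. The accident is “containable” only in the sense that the attempt had a 90% chance of failing.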
The subagent problem remains: How do you prevent it from getting someone else to catastrophically maximize paperclips while leaving it at its own power level?
Two priors could indeed start out diverging such that you cannot reach one from the other with finite evidence. Strange loops help here:
One of the hypotheses the brain’s prior admits is that the universe runs on math. This hypothesis predicts what you’d get by having used a mathematical prior from day one. Natural philosophy (and, these days, peer pressure) will get most of us enough evidence to favor it, and then physicists’ experiments single out description length as the correct prior.
But the ways in which the brain’s prior diverges are still there, just suppressed by updating; and given evidence of magic we could update away again if math is bad enough at explaining it.
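As a toy odds calculation (numbers mine): suppose a lifetime of ordinary evidence has driven the odds to 10⁶:1 in favor of “the universe runs on math” over the brain’s native magical hypotheses. A single well-verified observation of magic to which math-world assigns probability 10⁻⁹ but magic-world assigns 10⁻² multiplies the odds by 10⁻⁷, leaving them at 10:1 against math. The suppressed parts of the prior are still there to absorb that update.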
Yes. Modelspace is huge and we’re only exploring a smidgen. The busy beaver sequence hints at how much you can do with a small number of parts and exponential luck. I think feeding a random number generator into a compiler could theoretically have spawned an AGI in the eighties. Given a memory tape, transformers (and much simpler architectures) are Turing-complete. Even if all my reasoning is wrong, can’t the model just be hardcoded to output instructions on how to write an AGI?
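As a toy rendering of “feed a random number generator into a compiler” (my sketch, not anything from the original argument): sample random SK-combinator terms and reduce them with a fuel limit. Almost all samples are junk, but nothing in the setup caps how capable the lucky ones can be.

```haskell
import System.Random (randomRIO)

-- Untyped SK combinator terms: a tiny, Turing-complete program space.
data Term = S | K | App Term Term deriving Show

-- Sample a random term with roughly the given size.
randomTerm :: Int -> IO Term
randomTerm n
  | n <= 1    = do b <- randomRIO (0, 1 :: Int)
                   pure (if b == 0 then S else K)
  | otherwise = do k <- randomRIO (1, n - 1)
                   App <$> randomTerm k <*> randomTerm (n - k)

-- One leftmost-outermost reduction step, if a redex exists.
step :: Term -> Maybe Term
step (App (App K x) _)         = Just x
step (App (App (App S f) g) x) = Just (App (App f x) (App g x))
step (App f x)                 = case step f of
  Just f' -> Just (App f' x)
  Nothing -> App f <$> step x
step _                         = Nothing

-- Reduce with a step budget; most random terms normalize quickly or run out of fuel.
reduce :: Int -> Term -> Term
reduce 0    t = t
reduce fuel t = maybe t (reduce (fuel - 1)) (step t)

main :: IO ()
main = do
  t <- randomTerm 20
  print t
  print (reduce 1000 t)
```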
I’m not convinced that utility aggregation can’t be objective.
We want to aggregate utilities because of altruism and because it’s good for everyone if everyone’s AI designs aggregate utilities. Altruism itself is an evolutionary adaptation with similar decision-theoretic grounding. Therefore if we use decision theory to derive utility aggregation from first principles, I expect a method to fall out for free.
Imagine that you find yourself in control of an AI with the power to seize the universe and use it as you command. Almost everyone, including you, prefers a certainty of an equal share of the universe to a lottery’s chance at your current position. Your decision theory happens to care not only about your current self, but also about the yous in timelines where you didn’t manage to get into this position. You can only benefit them acausally, by getting powerful people in those timelines to favor them. Therefore you look for people that had a good chance of getting into your position. You use your cosmic power to check their psychology for whether they would act as you are currently acting had they gotten into power, and if so, you go reasonably far to satisfy their values. This way, in the timeline where they are in power, you are also in a cushy position.
This scenario is fortunately not horrifying for those who never had a chance to get into your position, because chances are that someone you gave resources to directly or indirectly cares about them. How much everyone gets is now just a matter of acausal bargaining and of the shape of their utility’s returns in resources granted.
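A minimal sketch of the selection step, with made-up names; the actual split would come out of the bargaining, not this crude proportional rule:

```haskell
-- Hypothetical types for the scenario above.
data Candidate = Candidate
  { name         :: String
  , winChance    :: Double  -- how likely they were to end up controlling the AI instead of you
  , reciprocates :: Bool    -- does inspecting their psychology say they'd run this same policy?
  }

-- Keep only would-be reciprocators, then weight by their chance of having won.
shares :: [Candidate] -> [(String, Double)]
shares cs
  | null kept = []
  | otherwise = [ (name c, winChance c / total) | c <- kept ]
  where kept  = filter reciprocates cs
        total = sum (map winChance kept)

-- shares [Candidate "A" 0.3 True, Candidate "B" 0.2 False, Candidate "C" 0.1 True]
--   ==> [("A",0.75),("C",0.25)]
```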
It intuitively seems like you need merely make the interventions run at higher permissions/clearance than the hyperparameter optimizer.
What do I mean by that? In Haskell, so-called monad transformers can add features like nondeterminism and memory to a computation. The natural conflict that results (“Can I remember the other timelines?”) is resolved through the order in which the monad transformers are applied. (One ordering is represented as a function from an initial memory state to a list of timelines together with a single final memory state; the other as a function from an initial memory state to a list of timelines, each paired with its own final memory state.) Similarly, a decent type system should just not let the hyperparameter optimizer see the interventions.
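Concretely (my sketch in roughly standard mtl vocabulary; the second stack is the old-style `ListT (State s)` shape written out by hand, since that transformer is deprecated):

```haskell
import Control.Monad.State (State, StateT, runState, runStateT, get, put)
import Control.Monad.Trans.Class (lift)

-- Memory inside nondeterminism: every timeline carries its own copy of the counter.
branchLocal :: StateT Int [] Int
branchLocal = do
  lift [(), ()]   -- fork into two timelines
  n <- get
  put (n + 1)     -- bump this timeline's private memory
  get

-- Nondeterminism inside memory: one counter is threaded past all timelines,
-- so the second timeline "remembers" what the first did (the s -> ([a], s) shape).
sharedMemory :: State Int [Int]
sharedMemory = traverse (\_ -> do n <- get; put (n + 1); get) [(), ()]

main :: IO ()
main = do
  print (runStateT branchLocal 0)   -- [(1,1),(1,1)]
  print (runState sharedMemory 0)   -- ([1,2],2)
```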
What this might naively come out to is that the hyperparameter optimizer just does not return a defined result unless its training run is finished as it would have been without intervention. A cleverer way I could imagine it being implemented is that the whole thing runs on a dream engine, aka a neural net trained to imitate a CPU at variable resolution. After an intervention, the hyperparameter optimizer would be run to completion on its unchanged dataset at low resolution. For balance reasons, this may not extract any insightful hyperparameter updates from the tail of the calculation, but the intervention would remain hidden. The only thing we would have to prove impervious to the hyperparameter optimizer through ordinary means is the dream engine.
Have fun extracting grains of insight from these mad ramblings :P
Natural transformations can be composed (in two ways) - how does your formulation express this?
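For reference, here is how the two compositions look if a natural transformation is rendered as a rank-2 polymorphic function in Haskell (my rendering; your formulation may carve this up differently):

```haskell
{-# LANGUAGE RankNTypes, TypeOperators #-}

-- A natural transformation between functors f and g.
type f ~> g = forall a. f a -> g a

-- Vertical composition: f ~> g followed by g ~> h yields f ~> h.
vert :: (g ~> h) -> (f ~> g) -> (f ~> h)
vert beta alpha x = beta (alpha x)

-- Horizontal composition: combine f ~> f' and g ~> g' into a transformation
-- between the composite functors, g . f ~> g' . f'.
horiz :: Functor g => (f ~> f') -> (g ~> g') -> (forall a. g (f a) -> g' (f' a))
horiz alpha beta gfa = beta (fmap alpha gfa)
```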
But the pattern was already defined as [original category + copy + edges between them + path equivalences] :(
Now we just take our pattern and plug it into our pattern-matcher, as usual.
Presumably, the pattern is the query category. What is the target category? (not to be confused with the part of the pattern you called target—use different names?)
Sounds like my https://www.lesswrong.com/posts/yEa7kwoMpsBgaBCgb/towards-a-new-impact-measure#XPXRf9RghnsypQi3M :).
That seems silly, given the money on the line and that you can have your ML architecture take this into account.
decided to invest in a high-end studio
I didn’t catch that this was a lie until I clicked the link. The linked post is hard to understand—it seems to rely on the reader being similar enough to the author to guess at context. Rest assured that you are confusing someone.
So the valuation of any propositional consequence of A is going to be at least 1, with equality reached when it does as much of the work of proving bottom as it is possible to do in propositional calculus. Letting valuations go above 1 doesn’t seem like what you want?
Then that minimum does not make a good denominator because it’s always extremely small. It will pick ϕ to be as powerful as possible to make L small, i.e. set ϕ to ⊥. (If the denominator in the earlier version is defined at all, ⊥ is a propositional tautology given A.)
a magma [with] some distinguished element
min_{L,ϕ} (A, ϕ ⊢_L ⊥) where ϕ is a propositional tautology given A
Propositional tautology given A means A ⊢ ϕ, right? So ϕ = ⊥ would make L small.
An agent might care about (and acausally cooperate with) all versions of himself that “exist”. MWI posits more versions of himself. Imagine that he wants there to exist an artist like he could be, and a scientist like he could be—but the first 50% of universes that contain each are more important than the second 50%. Then in MWI, he could throw a quantum coin to decide what to dedicate himself to, while in CI this would sacrifice one of his dreams.
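Toy numbers (mine, just to make the diminishing returns concrete): value the first ½ of the measure containing a career at 2 per unit and anything beyond that at 1 per unit. Under MWI the quantum coin gives each career measure ½, for a total of 2·½ + 2·½ = 2. Under collapse the coin leaves one career at measure 1 and the other at 0, for a total of 2·½ + 1·½ = 1.5, whichever way it lands.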
“I have trouble getting myself to do the right thing; focusing on what selfish reasons I have to do it helps.” sounds entirely socially reasonable to me. Maybe that’s just because we here believe that picking and choosing which x=selfish arguments to listen to is not aligned with x=selfishness.
R_AUP penalizes an action whenever it changes the agent’s ability to attain other utilities. One thing an agent might do to leave that penalty at zero is to spawn a subagent, tell it to take over the world, and program it such that, if the agent ever tells the subagent it has been counterfactually switched to another reward function, the subagent gives the agent as much of that reward as the agent could have attained for itself had it never spawned the subagent.
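Schematically (my paraphrase of the AUP-style penalty, so treat the exact form as approximate):

Penalty(s, a) ≈ Σ_i |Q_{R_i}(s, a) - Q_{R_i}(s, ∅)|

The subagent’s standing commitment makes Q_{R_i}(s, a) track Q_{R_i}(s, ∅) for every auxiliary R_i, so each term stays near zero while the actual world-optimization happens one level removed.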
This modification of my approach came not because there is no surgery, but because the penalty is |Q(a) - Q(∅)| instead of |Q(a) - Q(destroy itself)|. Q_{R_i} is learned to be the answer to “How much utility could I attain if my utility function were surgically replaced with R_i?”, but it is only by accident that such a surgery might change the world’s future, because the agent didn’t refactor the interface away. If optimization pressure is put on this, it goes away.
If I’m missing the point too hard, feel free to command me to wait till the end of Reframing Impact so I don’t spend all my street cred keeping you talking :).
Assessing its ability to attain various utilities after an action requires that you surgically replace its utility function with a different one in a world it has impacted. How do you stop it from messing with the interface, such as by passing its power to a subagent to make your surgery do nothing?