I feel like… no, it is not very interesting; it seems pretty trivial? We (agents) have goals, we have relationships between them, like “priorities”, and we sometimes abandon low-priority goals in favor of higher-priority ones. We can also have meta-goals like “what should my system of goals look like”, “how do I abandon and adopt intermediate goals in a reasonable way”, and “how do I reflect on my goals”, and future superintelligent systems will probably have something like that. All of this seems to me to come as a package with the concept of a “goal”.
“A reasonably powerful world model” would correctly predict that being an FDT agent is better than being an EDT agent and would modify itself into an FDT agent, because there are several problems where both CDT and EDT fail in comparison with FDT (see the original FDT paper).
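To make that concrete, here is a minimal sketch of Parfit’s hitchhiker, one of the problems from that literature where both CDT and EDT do worse than FDT. The payoff numbers are illustrative assumptions of mine, not taken from the paper:

```python
# Toy Parfit's hitchhiker: a near-perfect predictor rescues you from the desert
# iff it predicts you will pay the fare once you reach town.
DEATH = -1_000_000  # stranded in the desert (illustrative number)
FARE = -1_000       # price asked once you are safely in town (illustrative)

def pays_in_town(theory: str) -> bool:
    # Once in town, paying has no further causal effect (CDT) and gives no
    # further evidence of rescue (EDT), so both refuse. FDT sticks to the
    # policy the predictor conditioned on, so it pays.
    return theory == "FDT"

def payoff(theory: str) -> int:
    rescued = pays_in_town(theory)  # the predictor foresees the agent's policy
    return FARE if rescued else DEATH

for theory in ("CDT", "EDT", "FDT"):
    print(theory, payoff(theory))  # CDT -1000000, EDT -1000000, FDT -1000
```

A world model that can run this comparison over its own decision procedure has an obvious incentive to self-modify toward the FDT-style policy.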
Like this?
What about things like fun, happiness, eudaimonia, meaning? I certainly think that, excluding brain damage or very advanced brainwashing, you are not going to eat babies or turn planets into paperclips.
In the real world, most variables correlate with each other. If you take the action that correlates most strongly with high utility, you are going to throw away a lot of resources.
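A toy illustration of that failure mode, with numbers I made up purely for the example:

```python
# A confounder ("rich") drives both yacht ownership and utility, so owning a
# yacht correlates with high utility even though buying one is a pure cost.
import random

random.seed(0)

def observe():
    rich = random.random() < 0.5
    owns_yacht = rich and random.random() < 0.8
    utility = (100 if rich else 10) - (5 if owns_yacht else 0)  # yacht is a net -5
    return owns_yacht, utility

samples = [observe() for _ in range(100_000)]
with_yacht = [u for y, u in samples if y]
without = [u for y, u in samples if not y]
print("mean utility | yacht:   ", sum(with_yacht) / len(with_yacht))  # ~95
print("mean utility | no yacht:", sum(without) / len(without))        # ~25
```

Acting on the correlation (“buy the yacht”) spends resources for a strictly negative causal effect, which is exactly the failure of naively choosing the action most correlated with utility.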
“Estimate overconfidence” implies that the estimate can come out as zero!
I think the problem here is distinguishing between terminal and instrumental goals? Most people probably don’t run an apple pie business because they have terminal goals about the apple pie business. They probably want money and status, want to be useful and to provide for their families, and I expect these goals to be very persistent and self-preserving.
How about: https://arxiv.org/abs/2102.04518
“Under current economic incentives and structure” we can have only “no alignment”. I was talking about rosy hypotheticals. My point was “either we are dead or we are sane enough to stop, find another way, and solve the problem fully”. Your scenario is not inside the set of realistic outcomes.
You are conflating two definitions of alignment: “notkilleveryoneism” and “ambitious CEV-style value alignment”. If you have only the first type of alignment, you don’t use it to produce good art; you use it for something like “augment human intelligence so we can solve the second type of alignment”. If your ASI is aligned in the second sense, it is going to deduce that humans wouldn’t like being coddled without the capability to develop their own culture, so it will probably just sprinkle inspiring examples of art here and there for us and develop various mind-boggling sources of beauty like telepathy and qualia-tuning.
I should note that while your attitude is understandable, the event “Roko said his confident predictions out loud” is actually good, because we can evaluate his overconfidence and update our models accordingly.
“The board definitely isn’t planning this” is not the same as “the board has zero probability of doing this”. It can be “the board would do this if you apply enough psychological pressure through the media”.
I think these are, judging from the available info, kind of two opposite stories? The problem with SBF was that nobody inside EA was in a position to tell him “you are an asshole who steals clients’ money, you are fired”.
More generally, any attempt to do something more effective will blow up a lot of things, because trying to do something more effective than business-as-usual is an out-of-distribution problem, and you can’t simply choose not to go outside the distribution.
This is so boring that it’s begging for a response of “Yes, We Have Noticed The Skulls”.
It’s a problem in the sense that you need to make your systems either weaker or very expensive (in terms of alignment tax; see, for example, davidad’s Open Agency Architecture) relative to unconstrained systems.
“A source close to Altman” means “Altman”, and I’m pretty sure that he is not a very trustworthy party at the moment.
The main problem I see here is the generality of some of the more powerful niches. For example, a nanotech-designer AI can start by thinking only about molecular structures, but eventually it stumbles upon situations like “I need to design a nanotech swarm that is aligned with its constructor’s goal”, or “what if I am a pile of computing matter that was created by other nanotech swarms (technically, all multicellular life is a multitude of nanotech swarms)?”, or “what if my goal is not aligned with the goal of the nanotech swarm that created me?”, etc.
Their Responsible AI team was in pretty bad shape after recent lay-offs. I think Facebook just decided to cut costs.
The trick with FDT is that FDT agents never receive the letter and never pay. FDT’s payoff is p * (-1,000,000), where p is the probability of infestation. EDT’s payoff is p * (-1,000,000) + (1 - p) * (-1,000), which seems to me to speak for itself.
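A minimal sketch of that arithmetic, with an illustrative p = 0.05 (my number, not from the thread):

```python
# Expected payoffs from the comment above, for infestation probability p.
def fdt_payoff(p: float) -> float:
    # FDT never pays, so its only loss is the termite damage itself.
    return p * -1_000_000

def edt_payoff(p: float) -> float:
    # EDT pays $1,000 in the no-infestation branch and still eats the full
    # damage when the house is infested.
    return p * -1_000_000 + (1 - p) * -1_000

p = 0.05  # illustrative assumption
print(f"FDT: {fdt_payoff(p):,.0f}")  # -50,000
print(f"EDT: {edt_payoff(p):,.0f}")  # -50,950
```

For any p < 1, the EDT agent pays an extra (1 - p) * 1,000 for nothing.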