CDT gives in to blackmail (such as the basilisk), whereas timeless decision theories do not.
My personal suspicion is that an AI being indifferent between a large class of outcomes matters little; it’s still going to absolutely ensure that it hits the Pareto frontier of its competing preferences.
Have you read / are you interested in reading Project Lawful? It eventually explores this topic in some depth—though mostly after a million words of other stuff.
I think “existential risk” is a bad name for a category of things that isn’t “risks of our existence ending.”
I mostly think the phrase “psychologically addictive” is much less clear than it needs to be to communicate anything to me.
I think I would write the paragraph as something vaguely like:
“The physiological withdrawal symptoms of benzodiazepines can be avoided, but people often have a bad time coming off benzodiazepines because they start relying on them over other coping mechanisms. So doctors try to avoid them.”
It seems possible to come up with something that is both succinct and actually communicates the gears.
The front page going down doesn’t actually make people who want to check the latest posts unable to do so; it’s so easy to circumvent that I think the front page going down is nearly costless.
That said, I do think the symbolic meaning is neat.
This is really just a “what is your utility function and what is your prior on the bonus” question, I guess? There is no clearly correct answer with just the information given.
General relativity seems like a little bit too strong a premise to me.
Can this be partially fixed by using uBlock Origin or whatever to hide certain elements of the page? I’d expect it to help at least imperfectly, not sure if you’ve tried it.
I don’t think the point of the detailed stories is that they strongly expect that particular thing to happen? It’s just useful to have a concrete possibility in mind.
I bet this is mostly a training data limitation.
Someone at Google allegedly explicitly said that there wasn’t any possible evidence which would cause them to investigate the sentience of the AI.
I don’t think human-level AIs are safe, but I also think it’s pretty clear they’re not so dangerous that it’s impossible to use them without destroying the world. We can probably prevent them from being able to modify themselves, if we are sufficiently careful.
“A human-level AI will recursively self-improve to superintelligence if we let it” isn’t really that solid an argument here, I think.
I don’t think it is completely inconceivable that Google could make an AI which is surprisingly close to a human in a lot of ways, but it’s pretty unlikely.
But I don’t think an AI claiming to be sentient is very much evidence: it can easily do that even if it is not.
Even if it takes years, the “make another AGI to fight them” step would… require solving the alignment problem? So it would just give us some more time, and probably not nearly enough time.
We could shut off the internet/all our computers during those years. That would work fine.
So you think that, since morals are subjective, there is no reason to try to make an effort to control what happens after the singularity? I really don’t see how that follows.
I don’t understand precisely what question you’re asking. I think it’s unlikely we will happen to solve alignment by any method in the time frame between an AGI going substantially superhuman and the AGI causing doom.
Eliezer’s argument from the recent post:
The reason why nobody in this community has successfully named a ‘pivotal weak act’ where you do something weak enough with an AGI to be passively safe, but powerful enough to prevent any other AGI from destroying the world a year later—and yet also we can’t just go do that right now and need to wait on AI—is that nothing like that exists. There’s no reason why it should exist. There is not some elaborate clever reason why it exists but nobody can see it. It takes a lot of power to do something to the current world that prevents any other AGI from coming into existence; nothing which can do that is passively safe in virtue of its weakness. If you can’t solve the problem right now (which you can’t, because you’re opposed to other actors who don’t want to be solved and those actors are on roughly the same level as you) then you are resorting to some cognitive system that can do things you could not figure out how to do yourself, that you were not close to figuring out because you are not close to being able to, for example, burn all GPUs. Burning all GPUs would actually stop Facebook AI Research from destroying the world six months later; weaksauce Overton-abiding stuff about ‘improving public epistemology by setting GPT-4 loose on Twitter to provide scientifically literate arguments about everything’ will be cool but will not actually prevent Facebook AI Research from destroying the world six months later, or some eager open-source collaborative from destroying the world a year later if you manage to stop FAIR specifically. There are no pivotal weak acts.
So do you think that instead we should just be trying to not make an AGI at all?
I think it is very unlikely that they need so much time as to make it viable to solve AI Alignment by then.
Edit: Looking at the rest of the comments, it seems to me like you’re under the (false, I think) impression that people are confident a superintelligence wins instantly? Its plan will likely take time to execute. Just not any more time than necessary. Days or weeks, it’s pretty hard to say, but not years.