Is that purely because they think AI-driven extinction is almost certain, or is it a combination of that and “even if we survive we probably won’t need retirement money anyway”?
Let’s hope that continues
Are you at all worried that Claude Mythos being accidentally trained against CoT will corrupt future Claude models? Furthermore, I don’t understand how we can get reliable CoT monitoring if CoT is included in a model’s training data; won’t the issue just continue to manifest in different ways?
But wait, wouldn’t doing things like saving for retirement still make sense? Or is p(we all die) really that high?
Thanks for clarifying. I thought it might be something like that but wasn’t sure.
“or that you shouldn’t decide how much to invest in impact based on the overall survival probability (I’ve been playing a lot of video games)”
I don’t really understand what the video games comment has to do with what was preceding it.
“If you get my argument, can you steelman it?”
I get that your argument is essentially as follows:
1.) Solving the problem of what values to put into an AI, even granting that the other technical issues are solved, is impossibly difficult in real life.
2.) To demonstrate that impossible difficulty, here’s a much kinder version of reality where the problem remains impossible.
I don’t think you accomplished 2, and it requires me to already accept that 1 is true, which I think it probably isn’t; I also think most would agree with me on this point, at least in principle.
“Which of these four things do you disagree with?”
I don’t disagree with any of them. I doubt there’s a convincing argument that could get me to disagree with any of those as presented.
What I am not convinced of is that, given all those assumptions being true, certain doom necessarily follows, or that there is no humanly tractable scheme which avoids doom in whatever time we have left.
I’m not clever enough to figure out what the solution is, mind you, nor am I especially confident that someone else will. Please don’t confuse me for someone who doesn’t often worry about these things.
“I think everyone sane agrees that we’re doomed and soon.”
Even as a doomer among doomers, you, with respect, come off as a rambling madman.
The problem is that the claim you’re making, namely that alignment is so doomed that Eliezer Yudkowsky, one of the most pessimistic voices among alignment people if not the most pessimistic, is still somehow overoptimistic about humanity’s prospects, is unsubstantiated.
It’s a claim, I think, that deserves some substantiation. Maybe you believe you’ve already provided as much. I disagree.
I’m guessing you’re operating on strong intuition here; and you know what, great, share your model of the world! But you apparently made this post with the intention to persuade, and I’m telling you you’ve done a poor job.
EDIT: To be clear, even if I were somehow granted vivid knowledge of the future through precognition, you’d still seem crazy to me at this point.
“I’m just trying to destroy the last tiny shreds of hope.”
In what version of reality do you think anyone has hope for an AI alignment Groundhog Day?
I’m sorry if I’m misunderstanding, but is your claim that Yudkowsky’s model actually does tell us with certainty, or some extremely close approximation of certainty, what’s going to happen?
As I was reading this, I remembered that we had a conversation about your timelines about a year ago, I think. If I recall correctly they were already short (~50% before 2030?). Have they dropped further since then?
I accept that trying to figure out the overall tractability of the problem this far in advance isn’t a useful thing to dedicate resources to. Nevertheless, researchers seem to have expectations about alignment difficulty despite not having a “clearer picture”. For the researchers who think that alignment is probably tractable, I would love to hear why they think so.
To be clear, I’m talking about researchers who are worried about AI x-risk but aren’t doomers. I would like to gain more insight into what they are hoping for, and why their expectations are reasonable.
This comment got me to change the wording of the question slightly. “so many” was changed to “most”.
You answered the question in good faith, which I’m thankful for, but I don’t feel your answer engaged satisfactorily with the content of the post. I was asking about the set of researchers who think alignment, at least in principle, is probably not hopeless, whom I suspect to be the majority. If I failed to communicate that, I’d definitely appreciate advice on how to make my question clearer.
Nevertheless, I do agree with everything you’re saying, though we may be thinking of different things when we use the word “many”.
[Question] Can someone explain to me why most researchers think alignment is probably humanly tractable?
And then there’s me, who was so certain until now that any time people talk about x-risk they mean it to be synonymous with extinction. It does make me curious, though: what kind of scenarios are you imagining in which misalignment doesn’t kill everyone? Do more people place a higher credence on s-risk than I originally suspected?
Thank you! I think I understand this position a good deal more now.
“the presence of which I take the OP to describe as reassuring”
I get the sense from this, and from the rest of your comment, that you think we should in fact not find this even mildly reassuring. I’m not going to argue with such a claim, because I don’t think such an effort on my part would be very useful to anyone. However, if I’m not completely off base or overstating your position (which I totally could be), could you go into some more detail as to why you think we shouldn’t find their presence reassuring at all?
I never meant to claim that my position was “clever people don’t seem worried, so I shouldn’t be”. If that’s what you got from me, then that’s my mistake. I’m incredibly worried, as a matter of fact, and much more importantly, everyone I mentioned is also worried to some extent or another, as you already pointed out. What I meant to say, but failed to, was that there’s enough disagreement in these circles that near-absolute confidence in doom seems to be jumping the gun. That argument also very much holds against people who are so certain that everything will go just fine.
I guess most of my disagreement comes from 4. Or rather, the implication that having an exact formal specification of human values ready to be encoded is necessarily the only way that things could possibly go well. I already tried to verbalize as much earlier, but maybe I didn’t do a good job of that either.
I apologize for my ignorance, but are these things what people are actually trying in their own ways? Or are they really trying the thing that seems much, much crazier to me?
This is becoming less and less about the actual OP, but I really do still want to ask: do you think it is a near-certainty, though? (I mean a >99% chance of AI killing us all soon.)