Substack: https://substack.com/@simonlermen
X/Twitter: @SimonLermenAI
In the past, one person brought up the cleaner wrasse result with me as a kind of proof that sentience must be widespread among even the least intelligent animals. That paper seems to run experiments like showing the fish a photograph of itself with a mark on its belly and then observing the fish “scraping” its belly (touching the sand of the aquarium), despite not having a mark on its actual body. This seems obviously different from the regular mirror test, and it’s unclear whether these fish even have good enough visual perception of non-moving objects. And in their paper they seem to believe this proves an even higher ability of self-recognition (so while chimps can only recognize themselves when they move, the fish can just recognize their own body from a photograph, like a human could). “This is largely because explicit tests of the two potential mechanisms underlying MSR are still lacking: mental image of the self and kinesthetic visual matching. Here, we test the hypothesis that MSR ability in cleaner fish, Labroides dimidiatus, is associated with a mental image of the self, in particular the self-face, like in humans.”
I don’t even have to spell out how the priors look here, but this doesn’t even seem theoretically possible. How would the fish know what it looks like? The simplest explanation is that the paper uses a small sample size and the fish just randomly started doing that. It also doesn’t seem to me that the fish really spent time studying its photograph; it could just be a case of letting the camera roll for a while until the fish happens to swim by the photo and then touches the sand a few times.
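To make the small-sample worry concrete, here is a toy calculation (a minimal sketch; the scrape rate, recording length, and fish count are made-up numbers for illustration, not taken from the paper). If a fish scrapes on the sand at some baseline rate regardless of what it sees, a long enough recording makes a few apparently “mark-directed” scrapes quite likely by chance:

```python
from math import comb

def prob_at_least_k(n_intervals: int, p_scrape: float, k: int) -> float:
    """Probability of observing >= k scrapes in n_intervals, assuming
    scrapes occur independently at a baseline rate p_scrape per interval."""
    return sum(
        comb(n_intervals, i) * p_scrape**i * (1 - p_scrape) ** (n_intervals - i)
        for i in range(k, n_intervals + 1)
    )

# Hypothetical numbers: a 60-minute recording split into 1-minute
# intervals, a 2% baseline chance of a scrape per interval, and
# ">= 2 scrapes" scored as mark-directed behavior.
p_one_fish = prob_at_least_k(60, 0.02, 2)
print(f"P(>= 2 chance scrapes, one fish):    {p_one_fish:.2f}")  # ~0.34

# With, say, 8 fish tested, the chance that at least one of them
# shows the pattern by luck alone:
p_any_fish = 1 - (1 - p_one_fish) ** 8
print(f"P(at least one of 8 fish does this): {p_any_fish:.2f}")  # ~0.96
```

Under these made-up numbers, at least one apparently “positive” fish shows up in the sample about 96% of the time even with zero self-recognition, which is why the sample size and baseline scraping rate matter so much here.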
Not commenting on the whole conflict here. But I remember once being told by an EA member that I’d better not go to some event with policymakers in the UK because my views sound a little too crazy.
threw a Molotov cocktail at Sam Altman
It was thrown at his home
I guess you are right
So how is all of that going to hold up when you start your 100k AI researchers later this year, or do fully automated AI research soon after? If alignment still holds up, who is going to verify what that nation of geniuses produces at superhuman speed? I think you are already being optimistic in seeing any progress in alignment. There are clear discontinuities here: when do the models become smarter than the researchers and able to easily trick them, when can models do their research entirely on their own, when do they have a meaningful shot at takeover or catastrophic harm? Hitting these discontinuities does not work well with a wait-and-see strategy.
Edit: I think there is an argument to be made that 1) something like RSI/handing off AI research will totally break what exists of AI alignment, and 2) there are predictably big threshold effects in the future (https://www.lesswrong.com/posts/JqrZxQwmqmoCWXXxC/ai-can-suddenly-become-dangerous-despite-gradual-progress), such as when AI gets smarter than human researchers and can easily trick them, so incremental alignment strategies won’t survive. However, as it stands, I am not making this point very well here, and it just sounds too much like sneering for my comfort when there are valuable things here.
Do you have an example, here or privately, of him being rude that you could share? The more I look into his stuff, the more it seems he regularly mocks other people and blocks anyone challenging him. I do appreciate him, as a former Trump admin guy, saying the obvious on the Anthropic dow situation.
I see the difference and have updated my comment accordingly. He believes it is highly unlikely, not impossible, though it’s unclear what he means exactly (<1% perhaps?). I didn’t say he didn’t engage, just that from your tweets it is not so clear he updated meaningfully; he did update his timelines, probably based on recent advances. I still assume he talks about catastrophe in the “major disaster” sense, which is an unfortunate effect of using an unclear term here. Dean isn’t shy about using partisan/mocking language himself. I don’t like the idea of being talked about in mocking language while being unable to shoot back in a similar style, but opinions may vary.
Thanks for the quote
He is simply updating his timelines
, and since then said AI causing human extinction is only “highly unlikely”,
Basically still supporting my thesis: I don’t see any sign he updated here, because he says “highly unlikely” now.
then even more recently said that “ai present catastrophic risks” and “alignment may become a more central issue for me again depending on how well alignment seems to work for smarter-than-human widely deployed ai”.
I think you are misreading him here. From reading the rest of his stuff and his response, I would say he is merely referring to AI causing a “catastrophe” in the sense of a major disaster, similar to a tornado ripping through a town or AI hacking all the airports.
Thanks for the tweets again, but I don’t see clear evidence here that engaging with the community on Twitter has updated him much.
This is the kind of rhetoric Dean supports and praises: https://x.com/deanwball/status/2026325817291104728
“This instinct seems to infect the far left across lots of domains: immigration, crime fighting, and the national debt to name a few. You can tell they’re just sort of yearning to submit our society to outside forces: mobs, international councils, or communist China. … They don’t believe in order, except brutal order under their heels.” – blaming resistance to AI datacenters on far left lunatics.
This new post is also not exactly free of mocking language:
“One common assumption (though less prevalent with time) among many people in “the AI safety community” is that artificial superintelligence will be able to “do anything.” Now, most people in this world are much too smart to say literally these words, and so it might be fairer to put my criticism this way: “many people in ‘the AI safety community’ are way too willing to resort to extreme levels of hand-waviness when it comes to the supposed capabilities of superintelligent AI.” The tautological pattern of the AI safetyist mind is easy enough to recognize once you encounter it a few times: “Well of course superintelligence will be able to do that. After all, it’s superintelligence. And because superintelligence will obviously be able to do that, you must agree with me that banning superintelligence is an urgent necessity.””
So I feel like he should be able to handle my tone here, but will possibly adjust it a bit.
I don’t know, but say it spawns a new pathogen each week, each very contagious and deadly. Then it spreads pathogens that cause mass crop death. Then come AI drones picking off larger groups of survivors. Then ground robots and small airborne drones. Then climate change of +10°C. One after the other. What’s impossible here?
What exactly did he update? I saw that post where he apparently shortened his timelines?
Thanks for picking out these quotes from him. However, I think they pretty much support my model of his views.
To be clear: I understand that he believes AI could pose catastrophic risks, but he’s probably thinking about serious but limited catastrophic events here, like money being lost and some people dying.
I mean, there are paths where non-superintelligence kills us. It looks plausible that we will just hand AI control over the military and give it direct access to bio labs.
That’s fair, I’ll think about softening this piece. Though I don’t think he is engaging very well with other people here; the way he talks about the “doomers” is clearly mocking too:
“One common assumption (though less prevalent with time) among many people in “the AI safety community” is that artificial superintelligence will be able to “do anything.” Now, most people in this world are much too smart to say literally these words, and so it might be fairer to put my criticism this way: “many people in ‘the AI safety community’ are way too willing to resort to extreme levels of hand-waviness when it comes to the supposed capabilities of superintelligent AI.” The tautological pattern of the AI safetyist mind is easy enough to recognize once you encounter it a few times: “Well of course superintelligence will be able to do that. After all, it’s superintelligence. And because superintelligence will obviously be able to do that, you must agree with me that banning superintelligence is an urgent necessity.””
– from his text “2023” (that’s the name; he just posted it)
To answer briefly here: My understanding of Dean’s position is that he ~~totally rules out~~ believes it is highly unlikely that AI could wipe out humanity, mainly based on this “superintelligence is not omnipotent” argument. He specifically seems to believe that superintelligence won’t ever gain the capability to do so. This is the “superintelligence is going to be weak” view. But it is pretty apparent to me that much less than superintelligence is sufficient to kill us. I don’t believe that AI strictly needs to do something along the lines of “exfiltration, persuasion, and scheming”; there are many ways for it to win. Clearly such ways exist; it is not impossible purely because ASI isn’t omnipotent.
Edit: he believes it is “highly unlikely” not impossible
I think plenty of their researchers work on the weekends too – trying to end the world must be quite the motivation
“Thanks to Claude 4.5 Sonnet for help and feedback. No part of the text was written by AI models.”
Could you describe a bit more how you used Claude, how the ideation took place?
My fear is that I would start out with a fuzzy idea of a circuit lookup table, then talk to Claude, and it would eventually convince me that this has massive implications for alignment. But I remain highly skeptical of this; there is a high risk of drifting into vibe thinking. I think that your arguments at multiple points leave the realm of valid reasoning and draw wide, unsupported conclusions. This is an easy way AI-assisted alignment might fail.
Alignment: My guess is that most or almost all of these circuits are individually aligned through bog-standard RLHF/Constitutional AI. This works because the standard problems of edge instantiation and Goodhart’s law don’t show up as strongly, because the optimization mainly occurs by either:
For example, I don’t concretely see what you are actually saying here: these circuits supposedly each perform some aspect of some task, and they are each aligned? Aligned as in with the model spec, even when a circuit does something like addition or a fact lookup?
I think this model is mostly correct, and also has implications for capabilities progress/the need to switch to another paradigm/overhaul parts of the current paradigm to reach wildly superhuman capabilities.
Again I see an enormous conceptual leap here, going from this very vague model to asserting a very vague limitation of the current paradigm.
Another such leap (David Manheim already posted this one):
For example:
The token bottleneck is real.
Sure, and so are limits like short-term memory for humans. That doesn’t stop us.
I would be careful about this vibe-based thinking. Increasingly, one benefit of humans might be that they are less sycophantic than LLMs even when they are only as smart, so don’t take my critique too harshly here.
Palisade Research has an ongoing fundraiser with $900k of matching funds available from SFF; it seems possible to get counterfactual matching here.
I briefly worked for Palisade Research as a contractor and was previously a MATS student under Jeffrey. I believe Jeffrey gets AI alignment difficulty, and Palisade is doing important work on outreach to policymakers and communication with the public. In particular, he gets that we are possibly very close to RSI and that the time from there to existentially dangerous superhuman AI could be very short.
Read more about it here: https://x.com/JeffLadish/status/2033990617622319490
Yeah, I probably wouldn’t have included the birds and stones metaphor if it were up to me and would have just explained the idea.
I don’t know what Eliezer thinks about this, but the problem as it appears to me is that a lot of these things cancel out:
Advancing alignment research < Advancing capabilities research
Hardening society, making AI takeover more difficult <(?) making us reliant on AI / making AIs harder to shut down / AIs damaging society (like massive scams making it harder for real humans to trust each other)
Demonstrating risks < demonstrating gains, making AI labs rich and able to bribe the government
Improving epistemics < damaging epistemics through widespread deepfakes, bots, etc