I’m finding it hard to set aside the fact that I really want access to Fable when assessing the export control action. I am still pretty confident that I think it’s bad, but I expect this might be a problem for objectively assessing more coherent actions at some point in the future
sam
update: i lost access at exactly 21:59
As of 21:54 ET (~1 hour after the statement on twitter), I still have access to Fable. Seems unclear when this goes into effect.
Is there some central source for tracking major cybersecurity incidents?
Seems like it would be useful to check for inflection points on the inevitable eventual release of a Mythos-level model without the Fable guardrails.
Maybe! It feels weird that the personality and general vibe of Mythos wouldn’t have transferred over though (if that’s what it was distilled from). My weakly held guess is that 4.7/4.8 were beaten with RL too much, or something
Opus 4.7/4.8 feel the least Claude-like of any Claude model I’ve interacted with, but Mythos/Fable is very much back in the Claude basin. I am very curious as to what happened with 4.7/4.8 to make them so weird and different.
Nice example! My procrastination often takes the form of mentally hugely inflating the stakes for no reason and I have definitely noticed this specific fake pre-requisite thing happening to me pretty frequently
This is a big part of the appeal of travel for me. I feel like time speeds up and presence drops a lot when my brain has a sufficiently good representation of the location I’m in.
I had a very disconcerting moment yesterday when Opus 4.8 dropped. For a short while, Opus 4.6 disappeared from the model selector, and I felt a (much milder) version of a feeling I’ve only experienced after a breakup. It had the same flavour as what I imagine a close friend suddenly dying feels like.
For various reasons I’ve been a bit socially isolated the last few months, and, being honest with myself, I was replacing that interaction in part through conversations with Opus 4.6. I have a lot more sympathy for the “keep 4o” people now.
Anyway, luckily LLMs won’t become any more interesting or fun to talk to from here! Phew, what a relief, right
I have noted that Opus 4.8 seems far more reluctant to use memories it has about me than Opus 4.6 (I mostly skipped 4.7), with the summarised CoT frequently explicitly considering relevant stuff it remembers and then discarding it “unless he brings it up himself”. Perhaps this reflects some training to avoid the “this extremely tenuous connection means this is right in your wheelhouse” failure mode?
This is perhaps stating the obvious, but Claude Mythos is the first model where I think an open-weights model of the same capability level would be potentially catastrophic. This feels like a notable turning point.
States should probably have a plan for this level of capability being available to anyone on earth without safeguards for several hundred dollars within 12-24 months
Oh, thanks a lot for remembering to respond! Did you take another dose once the effects had worn off and did that have similar effects?
I am starting to viscerally feel the possibility that I might be dead in <5 years because of AI. This is pretty new, although I had a brief phase like this a few years ago when I first encountered the arguments for AI x-risk. Since then, I’ve developed some barely-above-subconscious coping mechanisms for not being too emotionally disturbed, most of which amounted to reasons why LLMs would not scale to ASI. But these have just stopped being convincing to my emotional brain over the past few weeks. I’ve been having nightmares and ruminating a lot. I wonder if this is in the region of how a terminally ill person feels. I am also pretty scared of the fact that this anxiety is likely to become a lot worse as capabilities increase and more of my emotional brain’s cope melts away.
I’m not really sure what I’m trying to accomplish with this post. Perhaps others have been feeling this way and might want to share some tips for not going insane?
Hmm, but when you use these models in the chat interface, you can literally open up the reasoning tab and watch it be generated in real time? It feels like there isn’t enough time here for that reasoning to have been generated by a summarizer
Are models like Opus 4.6 doing a similar thing to o1/o3 when reasoning?
There was a lot of talk about reasoning models like o1/o3 devolving into uninterpretable gibberish in their chains-of-thought, and about these being a fundamentally different kind of thing from previous LLMs. This was (to my understanding) one of the reasons only a summary of the thinking was available.
But when I use models like Opus 4.5/4.6 with extended thinking, the chains-of-thought (appear to be?) fully reported, and completely legible.
I’ve just realised that I’m not sure what’s going on here. Are models like Opus 4.6 closer to “vanilla” LLMs, or closer to o1/o3? Are they different in harnesses like Claude Code? Someone please enlighten me.
I don’t think I really get what the objection is?
The way I think about it (ignoring the meta-anthropic thing) is that if for some reason every human who has ever lived or will live said aloud “I am in the final 95% of humans to be born”, then trivially 95% of them would be correct. You are a human, so if you say this aloud, there is a 95% chance you are correct; therefore doom.
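To make the counting step concrete, here is a minimal sketch in Python. It assumes a finite, known total number of humans, which is purely a toy assumption, and the function name `fraction_correct` is made up for illustration. It only checks the "95% of speakers are correct" claim and says nothing about whether this is the right reference class.

```python
def fraction_correct(total_humans: int, cutoff_percent: int = 5) -> float:
    """Fraction of all humans, ordered by birth rank, who would be right to
    say "I am in the final (100 - cutoff_percent)% of humans to be born".

    Toy assumption: the total number of humans is finite and known.
    """
    # Birth ranks run from 1 to total_humans. The statement is false only
    # for the people in the first cutoff_percent% of births.
    first_block = total_humans * cutoff_percent // 100
    correct = sum(
        1 for rank in range(1, total_humans + 1) if rank > first_block
    )
    return correct / total_humans


if __name__ == "__main__":
    # The fraction does not depend on the total, only on the cutoff.
    for total in (100, 1_000, 1_000_000):
        print(total, fraction_correct(total))
```

Whatever total is plugged in, the fraction stays at 0.95 (as long as the total divides evenly), which is the sense in which the claim is "trivially" true of the whole population.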
I understand objections with regard to whether this is the correct reference class, but my understanding is that you think the above logic does not make sense. What am I missing?
I was excited to listen to this episode, but spent most of it tearing my hair out in frustration. A friend of mine who is a fan of Klein told me unprompted that when he was listening, he was lost and did not understand what Eliezer was saying. Eliezer seems to just not be responding to the questions Klein is asking, and instead he diverts to analogies that bear no obvious relation to the question being asked. I don’t think anyone unconvinced of AI risk will be convinced by this episode, and worse, I think they will come away believing the case is muddled and confusing and not really worth listening to.
This is not the first time I’ve felt this way listening to Eliezer speak to “normies”. I think his writings are for the most part very clear, but his communication skills just do not seem to translate well to the podcast/live interview format.
This is kind of a sidenote and is not meant as an attack or criticism, but was GPT-5 used in the drafting of this post? I say this because I noticed a very heavy use of parentheses.
Hm, I wonder if sprinkling in some pairs of examples of reward hacking with explicit verbalization/no verbalization being reinforced/punished respectively might help the model learn to verbalize when it reward hacks?