Does anyone have thoughts on Muse Spark that they’ve written up? Do we have any speculation on what it’s good at, its size, or whether it has good or bad post-training?
kaiwilliams
The default blogging advice I’ve heard is that “obvious” topics often make for good posts because they are often non-obvious to readers, so if the concern is that a topic is too obvious, one should strongly default towards posting.
But maybe you’re making this judgement even with that prior in mind? I’d be curious to see one of these “obvious” posts.
I realize that part of the goal is to make the LLM portions unobtrusive, but would it be possible to give LLM sections a collapse button at the top (or bottom)? They could be open by default.
When reading current LessWrong posts that have LLM sections, I find myself mostly skipping those sections, and I appreciate it when someone has placed them in a collapsible.
I was going to comment this about Vermont too, though perhaps it’s a bit less intense there than in upstate NY. One of the culture shocks of moving away from VT for college was realizing that not many other Americans had faced the feeling that their way of life was ending.
Cool work! I’d be curious to see what attractor states exist between different models. For example, what happens if you hook up a Gemini model with a Claude model?
I was just dragged through Demons for a book club, so I was amused to read this. At least it means the time I spent reading that wasn’t in vain.
There’s some stuff here that feels a little weird. The author says they left in early 2024 and then spent the “following months” reading Dostoevsky and writing this essay. Was the essay written a while ago and only recently put up? (It must have been edited relatively recently if it was run through Opus 4.5.) Who are the editors alluded to at the very end? Is it supposed to be Tim Hwang? A little more transparency would be much appreciated (the disclaimer about Opus 4.5 being used for anonymization was only added on the 24th, after some people had pointed out that it sounded rather AI-written).
Another weirdness: why did Hwang, at basically the same time, put up another microsite about Demons, written by an anonymous author “still working in industry,” that has clear LLM-writing patterns? https://shigalyovism.com/. This one is much less in-depth, though.
Can anyone with more experience in the frontier labs/the uniparty give a sanity check for whether this seems like it was written by someone who is who they say they are?
Ah, right, I remember reading that post. The subscribe form using dynomiiiiiiiiiight makes sense, especially given how I prompted Llama: I pasted the post in and then appended Author:
I am curious whether there’s a way to get an instruction-tuned model to role-play being a base model, and to see whether it does better at truesight than regular instruction-tuned models. Like, why do chat models get worse? Is it that the assistant character is bad at it? Plenty of interesting questions here.
Llama 3.1 405B base: dynomiiiiiiiiiight
I resampled it a couple of times and it consistently added a couple of extra i’s to your handle (despite getting your URL, dynomight.net, right, so it clearly knows who you are). I’m not quite sure why. It’s weird that base models are so much better at this.
Thanks for compiling! It feels apt that the name of the top caller is Will Mentor.
Do LessWrong quick takes count as social media? :)
On this dataset, I find that Gemini 3 Pro gets 60% of 2-hop questions right and 34% of 3-hop questions right.
I initially got tripped up by the wording here: I thought this was 60% accuracy on 2-hop questions in a forward pass, not with 300 filler tokens, which aren’t mentioned until later in the post.
It’s a good piece, but I wanted to comment in case someone else gets confused at the same spot.
I came here to comment the exact same thing. I wonder whether 2-hop latent reasoning is well correlated with SimpleQA scores.
Kudos to DeepMind for being the first to release output watermarking and a semi-public detector. You can nominally sign up for it here.
Afaict, some of this is now in the Gemini app. But if not, feel free to ping me (I have access).
The only public instance of this change being pointed out was a LessWrong comment by someone unaffiliated with Anthropic.
Nitpick: an outside reporter also noticed this on the day of the release and wrote up a story on it. It didn’t seem to get much traction though.
I thought the “past chats” feature was a tool for looking at previous chats that basically only gets used if the user asks for it (i.e., there wasn’t a change to the system prompt). So I’m a bit surprised that it seemed to make a difference to sycophancy for you? But maybe I’m misunderstanding something.
Labor costs are much higher in the US, which I think plays into this. So it’s easier in Europe to not be reliant on the credit card model.
Response to your thoughts after the yoda timer
Why are you so certain it’s dangerous to try once, even at the beginning? My guess is that it won’t be particularly compelling right away, but will become more so over time as they do RL on views or whatever else they are trying to do.
But I also have a large error bar. This might, in the near future, be less compelling than either of us expect. It’s genuinely difficult to make compelling products, and maybe Sora 2 isn’t good enough for this.
To be honest, in the long term I’m more concerned about YouTube Shorts.
How do ARC-AGI’s replication of the HRM result and their ablations update you? [Link].
Basically, they claim that the HRM architecture wasn’t important; instead, the training process behind it accounted for most of the effect.
I wonder if there’s a question-asking game, preferably one-on-one, that would encourage this. Something akin to the NYT’s 36 questions to make anyone fall in love, but instead 36 questions to stare into the abyss. Getting the right interlocutor and the right questions would be hard, though.
It’s not a game, but it is a structured activity.
I’m skeptical that you can really get the abyss in small doses. Maybe there could also be a progressive activity where the first exercises are small things to admit about oneself, building up to more and more difficult questions.