I’m a computational cognitive scientist studying decision-making and introspection in humans and LLMs. I earned my PhD at Harvard in 2022, and have been a postdoc at Princeton since then.
Adam Morris
Tests of LLM introspection need to rule out causal bypassing
Is there reason to think that Bores or Wiener are not trustworthy or lack integrity? Genuine question, asking because it could affect my donation choices. (I couldn’t tell from your post if there were, e.g., rumors floating around about them, or if you were just using this as an example of a key question that you thought was missed in Neyman’s analysis.)
Self-interpretability: LLMs can describe complex internal processes that drive their decisions
Got it. Okay thanks!
Earnest question: For both this & donating to Alex Bores, does it matter whether someone donates sooner rather than a couple months from now? For practical reasons, it will be easier for me to donate in 2026--but if it will have a substantially bigger impact now, then I want to do it sooner.
One small suggestion: When I read this, I genuinely couldn’t tell whether “Gray swans: None detected this week” was a joke (like you were pretending to look for literal gray/black swans), or if it meant something serious. After reading your website, my guess is that it’s meant to be serious—but I’m still not sure, and if it is serious then I don’t know what it means. (My understanding is that “black swan” means an unexpected, highly improbable / out of distribution event, so it wasn’t clear to me what it would mean in this context to be generally looking for global gray/black swans.) Might be worth clarifying or finding other terminology, if you want readers like me to quickly grok what you mean.
We haven’t had one yet! But we only did it ~3 times. Obviously people are more careful than they’d normally be while dancing on the slippery floor.
I’ll add to this list: If you have a kitchen with a tile floor, have everyone take their shoes off, pour soap and water on the floor, and turn it into a slippery sliding dance party. It’s so fun. (My friends and I used to call it “soap kitchen” and it was the highlight of our house parties.)
Printable book of some rationalist creative writing (from Scott A. & Eliezer)
I see, that makes sense. Thank you!
Can you help me see this point? Why not correct it in the dataset? (Assuming that the dataset hasn’t yet been used to train any models)
I’m long overdue here, but thank you so much for doing this!! I’ve been wanting this for a long time and just discovered this post :)
see my comment above—I (ironically) meant aphasia
hahaha I actually also meant aphasia :P
This is ~even more~ anecdotal, but several of my friends and I have noticed increased anosmia since the pandemic, critically starting before any of us got covid (and including friends who never got it). We conjectured that it could be from some combination of very high stress levels for a long time + social isolation? Just to add some data points to the mix.
[Question] How are you currently modeling COVID contagiousness?
Pretty much all the writing I’ve read by Holocaust survivors says that this was not true, that the experience was unambiguously worse than being dead, and that the only thing that kept them going was the hope of being freed. (E.g. according to Viktor Frankl in “Man’s Search for Meaning”, all the prisoners in his camp agreed that, not only was it worse than being dead, it was so bad that any good experiences after being freed could not make up for how bad it was. Why they didn’t kill themselves is an interesting question that he explores a bit in the book.) Are there any Holocaust survivors who claim otherwise?
Thanks for the thoughtful response, that perspective makes sense. I take your point that ACT-R is unique in the ways you’re describing, and that most cognitive scientists are not working on overarching models of the mind like that. I think maybe our disagreement is about how good/useful of an overarching model ACT-R is? It’s definitely not like in physics, where some overarching theories are widely accepted (e.g. the standard model) even by people working on much more narrow topics, and many of the ones that aren’t (e.g. string theory) are still widely known about and commonly taught. The situation in cog sci (in my view, and I think in many people’s views?) is much more that we don’t have an overarching model of the mind in anywhere close to the level of detail/mechanistic specificity that ACT-R posits, and that any such attempt would be premature/foolish/not useful right now. Like, I think if you polled cognitive scientists, the vast majority would disagree with the title of your post, not because they think there’s a salient alternative, but because they think that there is no theory that even comes close to meriting the title of “best-validated theory of cognition” (even if technically one theory is ahead of the others). Do you know what I mean? Of course, even if most cognitive scientists don’t believe in ACT-R in that way, that alone doesn’t mean that ACT-R is wrong. I’m curious about the evidence that Terry is talking about above. I just think the field would look really, really different if we actually had a halfway-decent paradigm/overarching model of the mind. And it’s not like ACT-R is some unknown idea that is poised to take over the field once people learn about it. Everyone knew about it in the 90s, and then it fell out of widespread use, and my prior on why that happened is that people weren’t finding it super useful. (Although like I said, I’m really curious to learn more about what Terry/other contemporary people are doing with it!)
No you’re right, it doesn’t say how they should be combined. My assumption—and I suspect the assumption of the authors—is that we have no good widely-accepted overarching model of the mind, and that the best we can agree on is a list of ingredients (and even that list was controversial, e.g. in the commentaries on the paper). I think that’s the reason I, implicitly, was viewing the paper as a contemporary alternative to ACT-R. But I take your point that it’s doing different things.
Fascinating point, I think you’re right. Just to repeat your point in my own words: The problem is that, if the activation steering makes the model want to talk about the injected concept, and if it knows that saying “yes, I received an injection” will give it a chance to talk about the concept later in the response, then it will say “yes” in order to talk about the concept later (even if it actually had no metacognitive awareness of the injection). Is that what you’re saying?