Because it’s obviously annoying and burning the commons. Imagine if I made a bot that posted the same comment on every LessWrong post. Surely that wouldn’t be acceptable behavior.
John Steidley
The finish was quite a jump for me. I guess I could go and try to stare at your parentheses and figure it out myself, but mostly I feel somewhat abandoned at that step. I was excited when I found 1, 2, 4, 8… = −1 to be making sense, but that excitement doesn’t quite feel sufficient for me to want to decode the relationships between the terms in those two(?) patterns and all the relevant values.
Zack, the second line of your quoted lyrics should be “I guess *we* already...”
I’m currently one of the four members of the core team at CFAR (though the newest addition by far). I also co-ran the Prague Workshop Series in the fall of 2022. I’ve been significantly involved with CFAR since its most recent instructor training program in 2019.
I second what Eli Tyre says here. The closest thing to “rationality verification” that CFAR did in my experience was the 2019 instructor training program, which was careful to point out it wasn’t verifying rationality broadly, just certifying the ability to teach one specific class.
I wasn’t replying to Quintin
I can’t tell what you mean. Can you elaborate?
I think this comment would be better placed as a reply to the post that I’m linking. Perhaps you should put it there?
My summary: Give gifts using the parts of your world-model that are strongest. Usually the answer isn’t going to end up being based on your understanding of their hobby.
Window AC units don’t actually pull air from outside.
https://homeairguides.com/how-does-a-window-air-conditioner-work/
Hey, I’ve been looking into air quality quite a bit recently. I have several questions.
What air quality sensor are you using? How are you getting outdoor data?
I suspect some of the confusion in the results may be due to circulation within the home and monitor placement. Have you thought much about circulation?
Additionally, it looks like indoor PM2.5 is tracking outdoor PM2.5. Have you thought much about other sources of ventilation?
It doesn’t sound hard at all. The things Gwern is describing are the same sort of thing that people do for interpretability, where they, e.g., find an image that maximizes the probability of the network predicting a target class.
Of course, you need access to the model, so only OpenAI could do it for GPT-3 right now.
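To make the interpretability analogy concrete, here is a minimal sketch of that kind of input optimization. Everything in it is an assumption for illustration: the "network" is a made-up linear softmax classifier with hand-picked weights (`WEIGHTS`, `BIASES`), and `maximize_class` is a hypothetical helper, not anything from Gwern's post or from a real model. It does plain gradient ascent on the input to raise the log-probability of a chosen class.

```python
import math

# Hypothetical toy "network": a fixed linear softmax classifier
# with 3 classes and 4 input features. The weights are made up.
WEIGHTS = [
    [0.5, -1.0, 0.3, 0.0],
    [-0.2, 0.8, -0.5, 0.1],
    [0.1, 0.1, 0.9, -0.7],
]
BIASES = [0.0, 0.1, -0.1]


def class_probs(x):
    """Return softmax class probabilities for input vector x."""
    logits = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def maximize_class(target, steps=200, lr=0.5, clip=1.0):
    """Gradient ascent on the input to raise log p(target | x)."""
    x = [0.0] * len(WEIGHTS[0])
    for _ in range(steps):
        p = class_probs(x)
        # For a linear softmax model:
        # d/dx log p_target = W[target] - sum_k p_k * W[k]
        grad = [
            WEIGHTS[target][j]
            - sum(p[k] * WEIGHTS[k][j] for k in range(len(WEIGHTS)))
            for j in range(len(x))
        ]
        # Clamp each coordinate, standing in for a valid pixel range.
        x = [max(-clip, min(clip, xi + lr * g)) for xi, g in zip(x, grad)]
    return x


if __name__ == "__main__":
    for t in range(len(WEIGHTS)):
        best = maximize_class(t)
        print(t, [round(v, 2) for v in best], round(class_probs(best)[t], 3))
```

With a real network the gradient would come from backpropagation through the model rather than a closed-form expression, which is why this needs access to the model's weights.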
I’ve been thinking along similar lines!
From my notes from 2019-11-24: “Deontology is like the learned policy of bounded rationality of consequentialism”
Welcome!
(I work at Palisade)
I claim that your summary of the situation between Neel’s work and Palisade’s work is badly oversimplified. For example, Neel’s explanation quoted here doesn’t fully explain why the models sometimes subvert shutdown even after lots of explicit instructions regarding the priority of the instructions. Nor does it explain the finding that moving instructions from the user prompt to the developer prompt actually /increases/ the behavior.
Further, the CoT that Neel quotes has a bit in it about “and these problems are so simple”, but Palisade also tested whether using harder problems (from AIME, iirc) had any effect on the propensity here, and we found almost no impact. So, it’s really not as simple as just reading the CoT and taking the model’s justifications for its actions at face value (as Neel, to his credit, notes!).
Here’s a twitter thread about this involving Jeffrey and Rohin: https://x.com/rohinmshah/status/1968089618387198406
Here’s our full paper that goes into a lot of these variations: https://arxiv.org/abs/2509.14260