AnnaSalamon comments on Help keep AI under human control: Palisade Research 2026 fundraiser

AnnaSalamon 19 Dec 2025 9:49 UTC
4 points
−4
I do think Palisade is operating in the realm of “trying to persuade people of stuff, and that is pretty fraught”
I haven’t had that much contact with Palisade, but I interpreted them as more like “trying to interview people, see how they think, and provide them info they’ll find useful, and let their curiosities/updates/etc be the judge of what they’ll find useful”, which is … not fraught.

Or rather, as somewhere in between this and “trying to persuade people of stuff”, but close enough to the former that I’m in favor, which I’m usually not for “persuasion” orgs.
Am I wrong?
- Raemon 19 Dec 2025 19:04 UTC
  8 points
  8
  Parent
  I haven’t had that much contact with Palisade, but I interpreted them as more like “trying to interview people, see how they think, and provide them info they’ll find useful, and let their curiosities/updates/etc be the judge of what they’ll find useful”,
  I don’t think I’d describe their work this way. I think they are aiming to be honorable / honest / have integrity but I don’t think they are aiming for this degree of light-touch.
- Eli Tyre 19 Dec 2025 21:32 UTC
  7 points
  0
  Parent
  I haven’t had that much contact with Palisade, but I interpreted them as more like “trying to interview people, see how they think, and provide them info they’ll find useful, and let their curiosities/updates/etc be the judge of what they’ll find useful”, which is … not fraught.
  We have done things that are kind of like that (though I wouldn’t describe it that way), but it isn’t the main thing that we’re doing.
  
  Specifically, in 2025, we did something like 65 test sessions in which met with small groups of participants (some of these were locals, mostly college students, who we met in our office, some were people recruited via survey sites that we met in zoom), and try to explain the AI situation, as we understand it, to them. We would pay these test participants.
  
  Through that process, we could see how these participants were misunderstanding us, and what things they were confused about, and what follow up questions they had. We would then iterate on the content that we were presenting and try again with new groups of participants.
  
  By default, these sessions were semi-structured conversations. Usually, we had some specific points that we wanted explain, or a frame or metaphor we want to try. Often we had prepared slides, and in the later sessions, we were often “giving a presentation” that was mostly solidified down to the sentence level.
  I would not describe this as “provide them info they’ll find useful, and let their curiosities/updates/etc be the judge of what they’ll find useful”.
  That said, the reason we were doing this in small groups is give the participants the affordance to interrupt and ask questions and flag if something seemed wrong or surprising. And we were totally willing to go on tangents from our “lesson plan”, if that seemed like where the participants were at. (Though by the time we had done 15 of these, we had already built up a sense of what the dependencies were, and so usually sticking to the “lesson plan” would answer their confusions faster than deviating, but it was context-dependent, just like any teaching environment.)
  We did also have some groups that seemed particularly engaged/ interested / invested in understanding. We invited those groups back for followup sessions that were explicitly steered by their curiosity: they would ask about anything they were confused about, and we would try to do our best to answer. But these kinds of sessions were the minority, maybe 3 out of 65ish.
  Notably, the point of doing all this is to produce scalable communication products that do a good job of addressing people’s actual tacit beliefs, assumptions, and cruxes about AI. The goal was to learn what people’s background views are, and what kinds of evidence they’re surprised by, so that we can make videos or similar that can address specific common misapprehensions effectively.
- Jan_Kulveit 19 Dec 2025 16:20 UTC
  4 points
  2
  Parent
  Also not much contact, but my impression is you can rougly guess what their research results would be by looking at their overall views and thinking about what evidence you can find to show it. Which seems fair to characterize as advocacy work? (Motivated research?)
  The diff to your description is the info provided is not only conditional on “the info they’ll find useful” but also somewhat on “will likely move their beliefs toward conclusions Palise hopes them to reach”.
  - Jeffrey Ladish 19 Dec 2025 20:05 UTC
    12 points
    8
    Parent
    you can rougly guess what their research results would be by looking at their overall views and thinking about what evidence you can find to show it
    
    I don’t think this frame make a lot of sense. There’s not some clean distinction between “motivated research” and … “unmotivated research”. It’s totally fair to ask whether the research is good, and whether the people doing the research are actually trying to figure out what is true or whether they have written their conclusion at the bottom of the page. But the fact that we have priors doesn’t mean that we have written our conclusion at the bottom of the page!
    
    E.g. Imagine a research group who thinks cigarettes likely cause cancer. They are motivated to show that cigarettes cause cancer, because they think this is true and important. And you could probably guess the results of their studies knowing this motivation. But if they’re good researchers, they’ll also report negative results. They’ll also be careful not to overstate claims. Because while it’s true that cigarettes cause cancer, it would be bad to do publish things that have correct conclusions but bad methodologies! It would hurt the very thing that the researchers care about—accurate understanding of the harms that cigarettes cause!
    
    My colleagues and I do not think current models are dangerous (for the risks we are most concerned about—loss of control risks). We’ve been pretty clear about this. But we think we can learn things about current models that will help us understand risks from future models. I think our chess work and our shutdown resistant work demonstrate some useful existence proofs about reasoning models. They definitely updated my thinking about how RL training shapes AI motivations. And I was often not able to predict in advance what models would do! I did expect models to rewrite the board file given the opportunity and no other way to win. I wasn’t able to predict exactly which models would do this and which would not. I think it’s quite interesting that the rates of this behavior were quite different for different models! I also think it’s interesting how models found different strategies than I expected, like trying to replace their opponent and trying to use their own version of stockfish to get moves. It’s not surprising in retrospect, but I didn’t predict it.
    
    Our general approach is to try to understand the models to the best of our ability and accurately convey the results of our work. We publish all our transcripts and code for these experiments so others can check the work and run their own experiments. We think these existence proofs have important implications for the world, and we speak about these very publicly because of this.