So when I look at palisaderesearch.org from bottom to top, it looks like a bunch of stuff you published doesn’t have much at all to do with Agentic AI but does have to do with scary stories, including some of the stuff you exclude in this paragraph:
...
That’s actually everything on your page from before 2025. And maybe… one of those is kinda plausibly about Agentic AI, and the rest aren’t.
Yep, I think that’s about right.
I joined Palisade in November of 2024, so I don’t have that much context on stuff before then. Jeffrey can give a more informed picture here.
But my impression is that Palisade was started as a “scary demos” and “cyber evals” org but over time its focus has shifted and become more specific as we’ve engaged with the problem.
We’re still doing some things that are much more in the category of “scary demos” (we spent a bunch of staff time this year on a robotics project for FLI that will most likely not really demonstrate any of our core argument steps). But we’re moving away from that kind of work. I don’t think that just scaring people is very helpful, but showing people things that update their implicit model of what AI is and what kinds of things it can do can be helpful.
Does “emotionally impactful” here mean you’re seeking a subset of scary stories?
Not necessarily, but often.
Here’s an example of a demo (not research) that we’ve done in test briefings: We give the group a simple cipher problem, and have them think for a minute or two about how they would approach solving the problem (or in the longer version, give them ten minutes to actually try to solve it). Then we give the problem to DeepSeek r1[1], and have the participants watch the chain of thought as r1 iterates through hypotheses and solves the problem (usually much faster than the participants did).
Is this scary? Sometimes participants might be freaked out by it. But more often, their experience is something like astonishment or surprise. (Though to be clear, they’re also sometimes unimpressed. Not everyone finds this interesting or compelling.) This demonstration violates their assumptions about what AI is—often they insist that the models can’t be creative or can’t really think, but we stop getting that objection after we do this demo.[2] It hits a crux for them.
An earlier version of this demo involved giving the AI a graduate-level math problem, and watching it find a solution. This was much less emotionally impactful, because people couldn’t understand the problem, or understand what the AI was thinking in the chain of thought. It was just a math thing that was over their heads. It just felt like “the computer is good at doing computer stuff, whatever.” It didn’t hit an implicit crux.
We want to find examples that are more like the cipher-problem-based demo and less like the math-problem-based demo.
Notably, the example above demonstrates capabilities that are already well understood by people who are following the AI field. But we’re often aiming for a similar target when doing research.
Redwood’s alignment faking work, Apollo’s chain of thought results, and Palisade’s own shutdown resistance work are all solid examples of research that hit people’s cruxes in this way.
When people hear about those datapoints, they have reactions like “the AI did WHAT?” or “wow, that’s creepy.” That is, when non-experts are exposed to these examples, they revise their notion of what kind of thing AI is and what it can do.
These results vary in how surprising they were to experts who are closely following the field. Palisade is interested in doing research that improves the understanding of the most informed people trying to understand AI. But we’re also interested in producing results that make important points accessible and legible to non-experts, even if they are broadly predictable to the most informed experts.
I totally agree that the AI_experiments.meets_some_other_criteria() is probably a feature of your loop. But I don’t know if you meant to be saying that it’s an and or an or here.
If I’m understanding your question right, it’s an “and”.
Though having written the above, I think “emotionally impactful” is more of a proxy. The actual thing that I care about is “this is evidence that will update some audience’s implicit model about something important.” That does usually come along with an emotional reaction (e.g. surprise, or sometimes fear), but it’s a proxy.
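As a hypothetical sketch (the function names, criteria, and data here are illustrative stand-ins, not Palisade’s actual selection process), the “and” could look like requiring both criteria to hold before an experiment is worth running:

```python
def updates_implicit_model(experiment):
    # Illustrative criterion: would the result update some audience's
    # implicit model about something important?
    return experiment.get("updates_implicit_model", False)

def emotionally_impactful(experiment):
    # Proxy criterion: does the result tend to produce surprise or fear?
    return experiment.get("emotional_reaction", False)

def worth_running(experiment):
    # "And", not "or": the experiment must satisfy both criteria.
    return updates_implicit_model(experiment) and emotionally_impactful(experiment)

# Hypothetical candidates, loosely modeled on the demos discussed above.
candidates = [
    {"name": "cipher demo", "updates_implicit_model": True, "emotional_reaction": True},
    {"name": "grad math demo", "updates_implicit_model": True, "emotional_reaction": False},
]
selected = [c["name"] for c in candidates if worth_running(c)]
```

With an “or”, the math demo would also have made the cut; the “and” is what filters it out.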
[1] We could use any reasoning model that has a public chain of thought at this point, but at the time we started doing this we were using r1.

[2] Interestingly, I think many of the participants would still verbally endorse “AI can’t be creative” after we do this exercise. But they stop offering that as an objection, because it no longer feels relevant to them.