Hi everyone!
My name is Robert Adragna, and I’ve been working on Agent Foundations as part of Dovetail’s winter fellowship cohort. Specifically, I’ve been trying to better understand what background assumptions the Natural Abstractions Hypothesis (NAH) makes about the world, and whether natural abstractions might be learned by existing LLM systems. Specific questions that I’m exploring include:
Is the Platonic Representation Hypothesis from deep learning evidence for the Natural Abstractions Hypothesis?
Is it possible to construct a dataset that represents the world in a completely unbiased way?
How can Natural Abstractions be both universal & observer/goal-dependent?
What would it take to empirically test the NAH?
I’ve been lurking on LessWrong since 2024, when I got interested in AI Safety, and am very excited to spend more time engaging with the community.