I am a researcher in AI interpretability at the Zuse Institute in Berlin.
I am mostly interested in building theoretical foundations for interpretability that work even when agents have an incentive to hide their interpretations.
Tiberius