j_we
Karma: 72
I’m a PhD student working on AI safety. I’m thinking about how we can use interpretability techniques to make LLMs safer.