Beth Barnes

Safety researcher at OpenAI.

Another list of theories of impact for interpretability

Beth Barnes, 13 Apr 2022
Reverse-engineering using interpretability

Beth Barnes, 29 Dec 2021
Risks from AI persuasion

Beth Barnes, 24 Dec 2021
Some thoughts on why adversarial training might be useful

Beth Barnes, 8 Dec 2021
