Beth Barnes

Karma: 1,295

Safety researcher at OpenAI. Views are my own and not those of my employer.

Another list of theories of impact for interpretability

Beth Barnes · 13 Apr 2022 13:29 UTC
27 points
1 comment · 5 min read · LW link

Reverse-engineering using interpretability

Beth Barnes · 29 Dec 2021 23:21 UTC
20 points
0 comments · 5 min read · LW link

Risks from AI persuasion

Beth Barnes · 24 Dec 2021 1:48 UTC
67 points
15 comments · 31 min read · LW link

Some thoughts on why adversarial training might be useful

Beth Barnes · 8 Dec 2021 1:28 UTC
9 points
5 comments · 3 min read · LW link