Some people here (@ryan_greenblatt?) have claimed to have access to uncensored Anthropic models at Anthropic; maybe they can chime in about what exactly that means. If safeguards are baked into the model early enough in the training process that removing them requires substantial reengineering and maybe additional training runs, then maybe Anthropic simply isn’t capable of complying.
I don’t believe I have publicly made any statements about currently having access to helpful-only models. (In general, this is the sort of thing I won’t comment on / will glomarize about.) At various points in the past, Anthropic has said they have helpful-only models internally that they use for various purposes. I’m not sure what public statements they’ve made recently about helpful-only models. I previously was an Anthropic contractor/temporary employee with employee-level access. This is no longer true.
Thanks for replying. I think I was thinking of this post (https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=B6oDGoyphuNuzdDAT), where I didn’t understand the distinction between employee access and helpful-only access. Can you elaborate on the difference between employee access, public access, and helpful-only access?