Redwood Research and Constellation
Nate Thomas
Karma: 501
We’re Redwood Research, we do applied alignment research, AMA
Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter
Thanks, Neel! It should be fixed now.
To anyone reading this who wants to work on or discuss FHI-flavored work: Consider applying to Constellation’s programs (the deadline for some of them is today!), which include salaried positions for researchers.
Note that it’s unsurprising that a different model categorizes this correctly: the failure was generated by attacking the particular model we were working with. The relevant question is “given a model, how easy is it to find a failure by attacking that model using our rewriting tools?”