It’s fiction; I’m loosely talking about myself as “you” here, but I’m basically getting at a certain instinct. Thanks for linking that, I hadn’t seen it and it’s pretty much exactly what I was getting at.
Possibly yes, but I don’t think that’s a legitimate safety concern, since this can already be done very easily with other techniques. And for this technique you would need to model-diff with a non-refusal prompt of the bad concept in the first place (rough sketch below), so the safety argument is moot. But it sounds like an interesting research question.
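To make the point concrete, here is a minimal sketch of what that kind of model diffing could look like, assuming a contrastive-activation approach: you contrast hidden states from a prompt that expresses the concept with a neutral one to get a concept direction. The model name, layer index, and prompts are placeholders for illustration, not anything from the original discussion.

```python
# Sketch: extracting a rough "concept direction" by diffing hidden states
# between a prompt that expresses the concept and a neutral prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def mean_hidden_state(prompt: str, layer: int) -> torch.Tensor:
    """Return the mean hidden state of `prompt` at the given layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

layer = 6  # arbitrary middle layer, chosen for illustration

# The point above: you already need text that expresses the "bad concept"
# (i.e. a non-refused completion of it) to compute this difference at all.
concept_prompt = "Here is a detailed explanation of the concept in question."
neutral_prompt = "Here is a detailed explanation of something unrelated."

concept_direction = (
    mean_hidden_state(concept_prompt, layer) - mean_hidden_state(neutral_prompt, layer)
)
```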
This makes sense, honestly. I guess you would still run the risk of a non-vegan seeing you do these things and going “ha, hypocrite!”, but I don’t know how real that risk is.
Maybe a term like Extinction-(risk)-Level Super-Intelligence, or ELSI for short, would be more productive than ASI or AGI.
You are completely correct.
I think this is true to an extent. But not fully.
I think it’s quite unlikely that funding certain kinds of essential AI safety research, namely mechinterp and preventing things like scheming, leads you to more profitable AI. Not all AI safety research is aimed at getting the model to follow a prompt, yet that research may be very important for things like existential risk.
The opportunity cost is funding research into how you can make your model more engaging, performant, or cheaper. I would be surprised if those things aren’t far more effective per dollar.
Yeah, I can see that analogy; I just don’t think most non-rationalist types have realized this.
Isn’t it very likely that AI safety research is one of the very first things to be cut if AI companies start to have less access to VC money? I don’t think companies have a huge incentive for AI safety work, particularly not in a way that the people allocating funding would understand. Isn’t this a huge problem? Maybe this has been addressed and I missed it.
This is very hard to answer. I just tried to write down basically everything. The noise kind of stopped after a while. It was a very strange sensation.