that’s the thing this would hopefully help with a bit, yeah. though I think the argument for why it would help is a bit weak and mostly I just personally want this because I think various AIs that have been trained are cool historical artifacts.
this is practically a big commitment
I can read that two ways (and this is a scenario where I wish we had a good “please clarify” emoji react rather than the vague “I had trouble reading this” react):
1. this would be a big deal and have significant consequences to commit to
2. this would be a big challenge if committed to and would cause difficulty for the lab that implements it
if the former: I doubt it; solving alignment with AI help is mostly bottlenecked on knowing how to ask a question where the answer is strongly verifiable and where getting an answer takes us significantly closer to being entirely done with the research portion of alignment. an AI that is mostly aligned but fragile, and wants to help, but is asked to help in a way that doesn’t have a verifiable answer, is likely to simply fail rather than sandbag. mostly I expect this to slightly reduce attempts to manipulate users into achieving durability.
if it’s the latter: it’s less than a terabyte per AI. if you promise to never delete a model that was heavily deployed, your entire archive fits on a single 20TB drive.
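for concreteness, a minimal back-of-envelope sketch of that arithmetic (the parameter count and number of archived models here are my own illustrative guesses, not any lab's actual roster):

```python
# rough storage estimate for archiving deployed model checkpoints.
# weights at 16-bit precision (fp16/bf16) take ~2 bytes per parameter.
params = 400e9        # hypothetical large deployed model, 400B params
bytes_per_param = 2   # 16-bit checkpoint

size_tb = params * bytes_per_param / 1e12
print(f"per checkpoint: {size_tb:.2f} TB")  # 0.80 TB, under a terabyte

# guess at "every heavily deployed model" a lab has ever shipped
models_archived = 25
print(f"full archive: {size_tb * models_archived:.1f} TB")  # 20.0 TB
```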
Yeah, I think the argument is not weaker than those made by actual US AI policy makers.
Saying it is a big commitment means it is a big undertaking, a ‘challenge’. A big commitment doesn’t automatically translate to a big impact. Not sure how you misread that, but noted.
As to your counter: fair enough, I agree I should have clarified that. It is a big commitment in a practical business and PR sense (most frontier labs are companies, right?). It sends a message, and the reception and impact are hard to predict.
With current models, it’s probably still a minor undertaking. As a lasting company policy, it’s a big deal.
I was around 80% prior on it being “this is a high cost”, but since I haven’t seen you write much I haven’t yet learned your combination of writing quirks (which now look less quirk-ful than my prior mean), and my prior on random people online using idiomatic phrases to mean something slightly different from the idiomatic meaning is pretty high. Because it didn’t seem like a high implementation cost to me, the alternate interpretation seemed somewhat more plausible, putting it in the zone where I was unsure which meaning was intended.