As someone currently engaged in creating high-stakes sensitive evals… I think the biggest issue is that the questions themselves necessarily contain too much dangerous information, even without any answers attached.
To get around this, you’d need to dilute the real questions among lots of less relevant ones, such that there’s no clear way to tell how relevant any given question is.
And even then, there are many questions it still wouldn’t be safe to ask, amidst any plausible number of distractors.
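To make the dilution idea concrete, here is a minimal sketch of what it would mean mechanically (all names and the mixing ratio are hypothetical, not an actual eval harness):

```python
import random

def build_diluted_eval(real_questions, distractors, ratio=50):
    """Hide each real question among `ratio` distractors, shuffled together."""
    pool = list(real_questions) + random.sample(
        distractors, ratio * len(real_questions)
    )
    random.shuffle(pool)
    return pool

def score(pool, model_answers, real_questions, grader):
    """Grade only the real questions; distractor answers are discarded.

    Answers are assumed to be aligned with the pool order. Only the
    aggregate score is reported, so the grading step itself doesn't
    leak which questions were the real ones.
    """
    real = set(real_questions)
    graded = [grader(q, a) for q, a in zip(pool, model_answers) if q in real]
    return sum(graded) / len(graded)
```

Even so, the point above stands: the dangerous questions are still sitting in the pool, and a sufficiently knowledgeable reader can often spot which ones matter.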
My best thought for how to handle sensitive evals in a way that doesn’t require the eval-author and model-owner to trust each other or reveal anything private to each other is to have a third-party org that both parties trust, filling a role similar to that of an escrow company. This AI-eval-escrow org would have a highly secure compute cluster, and some way of proving to the interacting parties that all their private info was deleted after the transaction. The evals group sends the escrow company their private evals, the AI group sends a copy of their model, the escrow company runs the eval and reports the result, then provably deletes all the private eval and model data.
Kaggle does this in an insecure way for hosted ML competitions: competitors submit a model and receive back a score from the private eval.
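A rough sketch of that escrow flow, to make the moving parts explicit (everything here is hypothetical; a real escrow would need audited, attested infrastructure for the “provably deletes” step, not a script):

```python
class EvalEscrow:
    """Trusted third party: holds both secrets, reports only a score."""

    def __init__(self):
        self._evals = None   # private eval set, sent by the evals group
        self._model = None   # model copy, sent by the AI group

    def receive_evals(self, evals):
        self._evals = evals

    def receive_model(self, model):
        self._model = model

    def run_and_report(self, grader):
        """Run the private eval; only the aggregate score leaves the escrow."""
        scores = [grader(q, self._model(q)) for q in self._evals]
        result = sum(scores) / len(scores)
        self._delete_everything()
        return result

    def _delete_everything(self):
        # The hard part in practice: *proving* deletion needs hardware
        # attestation or third-party audit; here we merely drop references.
        self._evals = None
        self._model = None
```

Neither party ever sees the other’s artifact; both see only the returned score.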
That’s an interesting idea. In its simplest form, the escrow could have draconian NDAs with both parties, even if it doesn’t have the technology to prove deletion. In general, I’m excited about techniques that shape the kinds of relationships the players can have with each other.
However, one logistical difficulty is getting a huge model from the developer’s (custom) infra onto the hypothetical escrow infra… It’d be very attractive if the model could just stay with the dev somehow…
Yes, I’ve been imagining some sort of international inspection official who arranges for this to be set up within the developer’s own datacenter infra, and takes responsibility for ensuring the logs and evals get deleted afterwards.
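One concrete piece of how that on-site variant could work, sketched under loose assumptions (a hash commitment so the eval set can’t be swapped after the fact; the function names are illustrative, not a real inspection protocol):

```python
import hashlib
import json

def commit(eval_set):
    """The eval author publishes this hash in advance; it binds them to one eval set."""
    blob = json.dumps(eval_set, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def on_site_run(eval_set, published_commitment, model, grader):
    """Run inside the developer's datacenter, under inspector supervision."""
    assert commit(eval_set) == published_commitment, "eval set was swapped"
    scores = [grader(q, model(q)) for q in eval_set]
    return sum(scores) / len(scores)   # only this number leaves the site
```

The model never leaves the developer’s infra, and the inspector’s remaining job is the deletion guarantee for the eval questions and logs.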