Nit: don’t you also need to require that the predicted (and actual) outputs are (apparently, at least) safe? Interpreted literally as written, developers would be allowed to deploy a model if they can reliably predict that it will cause harm.
That’s right. This test isn’t meant to cover “and it’s safe”; it’s meant to cover “and it’s predictable.” I meant to cover this in:
Ah, I see. I still think it’s worth thinking about how developers might try to pass this test (and others) adversarially, in order to deploy a model that they themselves are confident is harmful in at least some ways.
I think such intentional deployments are much less likely to be totally catastrophic compared to accident risk, but still risky, and not totally implausible (they don’t necessarily require extreme malice or misanthropy on the part of developers, just a misguided or negligent view of safety concerns, perhaps driven by motivated reasoning about career or profit potential).