The program could identify where it has the lowest certainty of what the person would say or do, and directly ask the person to fill in those gaps.
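What's described here is essentially uncertainty sampling from active learning: score candidate questions by the model's predictive uncertainty and ask the person about the ones it's least sure of. A minimal sketch of that selection step (the toy model, prompts, and distributions below are all made up for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pick_questions(candidate_prompts, predict, k=3):
    """Rank prompts by predictive entropy and return the k prompts
    where the model is least certain -- the ones worth asking the
    person about directly."""
    scored = [(entropy(predict(p)), p) for p in candidate_prompts]
    scored.sort(reverse=True)  # highest entropy (lowest certainty) first
    return [p for _, p in scored[:k]]

# Toy stand-in for "the model's certainty about what the person would say":
# a fixed distribution over three possible answers per prompt.
toy_model = {
    "favourite colour": [0.34, 0.33, 0.33],  # near-uniform: very uncertain
    "name of employer": [0.98, 0.01, 0.01],  # confident
    "weekend plans":    [0.60, 0.30, 0.10],  # somewhat uncertain
}

print(pick_questions(toy_model, toy_model.get, k=2))
# -> ['favourite colour', 'weekend plans']
```

Note this only does what the parent suggests if the entropy estimates are calibrated, which is exactly the caveat raised below.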
...assuming the model’s certainty model is itself accurate[1]. And that the resulting information is actually useful to the model.
(As an obvious example of the latter: me rolling a d20[2] and reporting the result is something the model will likely have low confidence about, but the answer isn’t particularly useful to the model...)
See also e.g. the many adversarial attacks against computer vision systems, where the predictor becomes extremely confident[3] that the perturbed apple is actually an ostrich.
or e.g. loading up random.org, if you feel a d20 isn’t sufficiently random.
e.g. this classic attack https://openai.com/blog/multimodal-neurons/ where an 85.6% confidence that an apple is an apple turns into a 99.7% confidence that the apple with a handwritten label of ‘iPod’ is an iPod.
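The footnote's point, that a bounded perturbation can swing a model from confidently right to confidently wrong, shows up even on a toy linear classifier. A sketch of an FGSM-style step (the weights, input, and epsilon are invented; this is not the attack from the linked post, where a large epsilon stands in for the tiny per-pixel budgets that suffice in high-dimensional image space):

```python
import math

# Toy binary classifier: p("apple" | x) = sigmoid(w . x).
w = [1.5, -1.5, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_apple(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def fgsm_step(x, eps):
    """FGSM-style perturbation: move each feature by eps in the
    direction that decreases p("apple"), i.e. against sign(w_i)."""
    return [xi - eps * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

x = [0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.9, 0.1]  # classified "apple", ~0.89
x_adv = fgsm_step(x, eps=0.5)                  # now ~0.05, i.e. ~95% "not apple"

print(f"clean:     p(apple) = {p_apple(x):.3f}")
print(f"perturbed: p(apple) = {p_apple(x_adv):.3f}")
```

For a linear model the logit shifts by eps times the L1 norm of w, so with many input dimensions (e.g. pixels) an imperceptibly small eps produces the same confident flip.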