Mark Xu comments on Ban development of unpredictable powerful models?

Mark Xu 11 Jul 2023 5:22 UTC
LW: 7 AF: 4
0
AF
Here are some things I think you can do:
- Train a model to be really dumb unless I prepend a random secret string. The goverment doesn’t have this string, so I’ll be able to predict my model and pass their eval. Some precedent in: https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
- I can predict a single matrix multiply just by memorizing the weights, and I can predict ReLU, and I’m allowed to use helper AIs.
- I just train really really hard on imitating 1 particular individual, then have them just say whatever first comes to mind.
- TurnTrout 18 Jul 2023 23:48 UTC
  LW: 2 AF: 2
  0
  AF Parent
  Thanks for the comment! Quick reacts: I’m concerned about the first bullet, not about 2, and bullet 3 seems to ignore top- $k$ probability prediction requirements (the requirement isn’t to just ID the most probable next token). Maybe there’s a recovery of bullet 3 somehow, though?