Charlie Steiner comments on Thoughts on the Alignment Implications of Scaling Language Models

Charlie Steiner 3 Jun 2021 13:43 UTC
LW: 8 AF: 4
Great post! I very much hope we can do some clever things with value learning that let us get around needing AbD to do the things that currently seem to need it.

The fundamental example of this is probably optimizability: is your language model so safe that you can query it as part of an optimization process (e.g. to decide which actions are good) without just ending up in the equivalent of DeepDream's pictures of Maximum Dog?
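
To make the worry concrete, here is a toy sketch of what optimizing against an imperfect scorer can look like. Everything in it is an assumption made for illustration: `true_value`, `proxy_score`, and `optimize` are hypothetical names, the "model" is just a one-dimensional function with a small error term, and the optimizer is a brute-force grid search. It is not meant to describe any real language model, only the shape of the failure.

```python
# Toy illustration (all names and numbers are hypothetical).
# The proxy agrees closely with the true objective near x = 1,
# but its small error term dominates far from that region.

def true_value(x: float) -> float:
    """What we actually care about; best at x = 1."""
    return -(x - 1.0) ** 2


def proxy_score(x: float) -> float:
    """Stand-in for a learned scorer: true value plus a small error."""
    return true_value(x) + 0.01 * x ** 4


def optimize(score, lo: float, hi: float, steps: int = 2001) -> float:
    """Brute-force search: return the grid point in [lo, hi] with the highest score."""
    candidates = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(candidates, key=score)


# Mild optimization: only search where the proxy is accurate.
mild = optimize(proxy_score, 0.0, 2.0)

# Strong optimization: search a much wider space of candidates.
hard = optimize(proxy_score, -10.0, 10.0)

print(f"mild: x={mild:.3f}, proxy={proxy_score(mild):.3f}, true={true_value(mild):.3f}")
print(f"hard: x={hard:.3f}, proxy={proxy_score(hard):.3f}, true={true_value(hard):.3f}")
```

Under these assumptions the narrow search lands close to the true optimum, while the wide search picks a point the proxy rates much higher but the true objective rates much worse. That is the one-dimensional analogue of Maximum Dog: the harder you optimize, the more the result reflects the scorer's errors rather than what you wanted.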