By “code generating being automated,” I mean that humans will program using natural human language, without having to think about the particulars of data structures and algorithms (or syntax). A good enough LLM can handle all of that stuff itself, although it might ask the human to verify if the resulting program functions as expected.
Maybe the models will be trained to look for edge cases that technically do what the humans asked for but seem to violate the overall intent of the program. In other words, situations where the program follows the letter of the law (i.e., the program specifications) but not the spirit of the law.
Come to think of it, if you could get a LLM to look for such edge cases robustly, it might be able to help RL systems avoid Goodharting, steering the agent to follow the intuitive intent behind a given utility function.
By “code generating being automated,” I mean that humans will program using natural human language, without having to think about the particulars of data structures and algorithms (or syntax). A good enough LLM can handle all of that stuff itself, although it might ask the human to verify if the resulting program functions as expected.
Maybe the models will be trained to look for edge cases that technically do what the humans asked for but seem to violate the overall intent of the program. In other words, situations where the program follows the letter of the law (i.e., the program specifications) but not the spirit of the law.
Come to think of it, if you could get a LLM to look for such edge cases robustly, it might be able to help RL systems avoid Goodharting, steering the agent to follow the intuitive intent behind a given utility function.