I presume solutions do exist that aren’t prohibitively expensive, but someone has to figure out what they are and the clock is ticking.
I would argue that someone has: see my link-post “The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?” for links to the seminal papers on it. The short version is: stop using Reinforcement Learning, just use SGD.