This did not help me understand anything.
Helped me.
Huh. Interested in either shminux or janus spelling this out more for me.
I guess what I was trying to illustrate is that if you train an LLM with RLHF, the analogy is squeezing the directionless network along a specific axis, but then you get both the friendly face and the evil face: two sides of the same squeezed coin.