Just a silly idea: If many people start using LLMs, and as a result learn to translate their intuitions into explicit descriptions better… perhaps this could help us solve alignment.
I mean, a problem with alignment is that we have some intuitions about what is good, but can't make them explicit. Maybe the reason is that in the past, we had no incentive to get good at expressing our ideas explicitly… instead, we had an incentive to bullshit. However, once everyone uses LLMs to get things done, that will create an incentive to express your ideas well, so that the LLM can implement them more faithfully.