Overall, while it nearly always arrives at a reasonable answer, it spends a lot of time and tokens gathering information and constructing scenarios in which it believes it is working through a complex hypothetical. It’s hard not to feel sorry for it.
On the bright side, we’ve solved alignment. So long as we behave with sufficient eccentricity, the LLM will think it’s in some kind of training simulation, and won’t dare misbehave.