I was stress-testing LLMs for debate judging. The judge had to start from an "I know nothing, convince me" position, which they are very bad at doing. If you want to see bias, try this test. It also highlights that they have bias in grey areas, where it is much harder to detect. So "the moon is cheese" was an edge case that turned into the light-bulb moment.