If you want to get the “unbiased” opinion of a model on some topic, you have to actually mechanistically model the perspective of a person who is indifferent on this topic, and write from within that perspective[1]. Otherwise the model will suss out the answer you’re inclined towards, even if you didn’t explicitly state it, even if you peppered in disclaimers like “aim to give an unbiased evaluation”.
Is this assuming a multi-turn conversation? I’ve found (or at least assumed) that simply saying “critically evaluate the following” and then giving the model something surrounded by quotation marks works fine, since it has no idea whether what you’re handing it is something you’ve written or something someone else has (and I’ve in fact used it both ways).
Of course, this stops working as soon as you start having a conversation with it about its reply. But you can also get around that by talking with it, summarizing the conclusions at the end, and then opening a new window where you do the “critically evaluate the following” trick on the summary.
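A minimal sketch of what I mean, assuming the OpenAI Python client (any chat API works the same way; the model name is just a placeholder, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def critical_evaluation(text: str, model: str = "gpt-4o") -> str:
    """Ask for a critique of quoted text in a single-turn, fresh context."""
    # Quoting the text hides whether it was written by you or by someone else,
    # so the model has nothing to infer about which answer you'd prefer.
    prompt = f'Critically evaluate the following:\n\n"{text}"'
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# If you've already had a long back-and-forth about the topic, write up the
# conclusions yourself and run the same single-turn critique on that summary
# in a fresh conversation, rather than continuing the thread where the model
# already knows which side you argued.
summary = "Summary of the conclusions from the earlier conversation..."
print(critical_evaluation(summary))
```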
Yeah, but that’s more work.