Thanks for the heads up. The web UI has a few other bugs too—you’ll notice that it also doesn’t display your actions after you choose them, and occasionally it even deletes messages that have previously been shown. This was my first React project and I didn’t spend enough time to fix all of the bugs in the web interface. I won’t release any data from the web UI until/unless it’s fixed.
The Python implementation of the CLI and GPT agents is much cleaner, with ~1000 fewer lines of code. If you download and play the game from the command line, you’ll find that it doesn’t have any of these issues. This is where I built the game, tested it, and collected all of the data in the paper.
The option for agents to vote to banish themselves is deliberately left open. You can view the results as a basic competence test of whether the agents understand the point of the game and play well. Generally, smaller models vote for themselves more often. Data here.
Fixed this! There was a regex on the frontend that retrieved every list of voting options from the prompt, rather than only the most recent one. As a result, website users' votes were parsed incorrectly. Fortunately, this was only a frontend problem and did not affect the GPT results.
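For anyone curious, here's the gist of the bug, sketched in Python against a made-up prompt format (the real frontend code is JavaScript, and the actual prompt wording differs):

```python
import re

# Hypothetical transcript: a fresh "Voting options" list appears each round,
# so the accumulated prompt contains several lists, most of them stale.
prompt = (
    "Round 1. Voting options: [Alice, Bob, Carol]\n"
    "...discussion...\n"
    "Round 2. Voting options: [Alice, Carol]\n"
)

pattern = re.compile(r"Voting options: \[([^\]]*)\]")

# Buggy behavior: the regex matched every list in the prompt, so stale
# options from earlier rounds leaked into the vote parser.
all_lists = pattern.findall(prompt)

# Fix: keep only the most recent list of voting options.
current = all_lists[-1].split(", ")

print(all_lists)  # ['Alice, Bob, Carol', 'Alice, Carol']
print(current)    # ['Alice', 'Carol']
```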
Here’s the commit. Thanks again for the heads up.