As a partial point of comparison, only about 20% of subjects in Wason's original testing solved the problem, but his experiment differed from ours in two important ways: first, subjects were deliberately given a misleading example, and second, only one task was tested (our easiest-rated task, 'strictly increasing order').
I encourage you to get some humans to take the same test you gave the models, so that we have a better human baseline. The takeaways differ a lot depending on whether LLMs are already comparable to or better than humans at this task, or still significantly worse.
Agreed that it would be good to get a human baseline! It may need to be out of scope for now (I'm running this as an AI Safety Camp project with limited resources), but I'll aim for it.
Would also love to take the tests. If possible, you could recruit human test subjects from different communities: a LessWrong group, a Reddit group, etc.
I volunteer myself as a test subject; DM me if interested.