The ideal I’m looking for is problems that will take a smart researcher (like a 95th-percentile alignment researcher, i.e. there are somewhere between 10 and 30 people who might count) at least 30 minutes to solve, and that most alignment researchers would have maybe a 50% chance of figuring out in 1-3 hours.
The ideal is that people have to:
a) go through a period of planning, and replanning
b) spend at least some time feeling like the problem is totally opaque and they don’t have traction.
c) have to reach for tools that they don’t normally reach for.
It may be that we just don’t have evals at this level yet, and I might take what I can get, but it’s what I’m aiming for.
I’m not trying to make an IQ test – my sense from the literature is that you basically can’t raise IQ through training. So many people have tried. This is very weird to me – subjectively it is just really obvious to me that I’m flexibly smarter in many ways than I was in 2011 when I started the rationality project, and that this is due to a lot of habits I didn’t use to have. The hypotheses I currently have are:
1) You just have to be really motivated to do transfer learning, and have a genuinely inspiring / good teacher, and it’s just really hard to replicate this sort of training scientifically.
2) IQ mostly measures “fast intelligence”, because that’s what’s cost-effective to measure in large enough quantities to get a robust sample. I.e. it measures whether you can solve questions in a few minutes, which mostly depends on whether you intuitively get them. It doesn’t measure your ability to figure out how to figure something out when that requires long-term planning, which would let a lot of planning skills actually come into play.
Both seem probably at least somewhat true, but the latter feels like a clearer story for why there would be potential (at least theoretically) in the space I’m exploring – an IQ test takes a few hours to administer, whereas each problem of the kind I’m describing takes hours on its own, so it would be extremely expensive to do a theoretically and statistically valid version of the thing I’m aiming at.
My explicit goal here is to train researchers who are capable of doing the kind of work necessary in worlds where Yudkowsky is right about the depth/breadth of alignment difficulty.