Is one takeaway of your post that we should consider current safety research as more about training human researchers than about the actual knowledge obtained from the research?
I didn’t intend it that way, though admittedly that is a valid reading. From my own point of view, both functions seem significant.
I think there’s a lot of truth to this. Modern LLMs are a kind of competence multiplier, where some competence values are negative (perhaps a competence exponentiator?).
I find that I can extract value from LLMs only when I’m asking about something I almost already know. That way I can judge whether an answer is getting at the wrong thing, assess the relevance of citations, and quickly and robustly verify a correct answer if one is offered (which matters because a series of convincing non-answers or wrong answers typically comes first).
Though LLMs seem to be getting more useful in the best case, they also seem to be getting more dangerous in the worst case, so I am not sure whether this dynamic will soften or sharpen over time.