but we would love to work with someone with an ML background as part of future work involving a human prompt baseline.
My impression, as someone with such a background, is that prompt engineering isn’t as important as it used to be. In the days of GPT-3, we were searching a largely unrefined space of model outputs, and text that correlated with the kinds of writers that produced good work had a stellar impact on output quality. Nowadays, though, a tightly-optimized RLHF/RLVR training pipeline doesn’t leave nearly as much alpha on the table.
I’ll admit that this is primarily intuition and vibes, but if it’s sufficiently easy/cheap to do, tossing out a barebones human-written prompt and seeing how the model does could be a fair test. I recall that I didn’t need anything particularly detailed a few months back when I had Claude implement a simple multithreaded PPO repo just to see whether it could.
My objection to this is that anything explicitly optimized to behave like a human will sound much more like a human than like a rock. It’s a metric that any language model inherently Goodharts—relying on it essentially amounts to willingly going along with the ELIZA effect.
I still object to this term. Any self-regulating process is “introspectively aware” to the extent that the studies on LLMs show them to be. PID controllers are, broadly, “introspectively aware”: they determine their next action as a function of what they’ve been doing.
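To make the analogy concrete, here’s a minimal PID controller sketch (my own illustrative code, not from any of the studies under discussion). The point is that its next output is computed from a running summary of its own past corrections (the integral term) and its recent trajectory (the derivative term)—the loose sense in which it “tracks what it’s doing” without anything we’d call introspection.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0    # running summary of its own past corrections
        self.prev_error = 0.0  # memory of its most recent behavior

    def step(self, setpoint, measurement, dt=1.0):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        # The next action depends on internal state accumulated from the
        # controller's own history, not just the current input.
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PID(kp=0.5, ki=0.1, kd=0.05)
```

Nothing here warrants the word “aware,” yet the controller’s behavior is a function of a self-maintained model of its own past actions—which is about the evidential bar those studies seem to clear.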