Is this Claude-specific? Did you fingerprint the model, or just the provider? Were any noticeably better or worse (in terms of clarity and utility to what the human wanted you to understand)?
If you’re allowing/recommending LLMs for submissions, you would do very well to publish some markdown instructions to the LLM in terms of how you want information presented and prioritized. Unless you’re grading on that as well, in which case, tell the humans that it’s important.
The other advice for consuming LLM content is to use an LLM to reformat/reframe for your needs. You should be able to, with a small amount of investment in thinking about the purpose and developing a few markdown instructions, just normalize all applications to your preferred style.
For a lot of written communication, there’s never been that much connection between style and content, though when it was all wetware this was unclear, as IQ shone through both, causing some correlation. Now that correlation is thoroughly broken, and you need to focus on content.
I am not sure if it’s Claude-specific; probably not, but I rarely use other models, so I wouldn’t know yet. Also, the vast majority of candidates used Claude to polish their writing.
I think it was a mistake to allow LLMs for writing. The thinking was that it is hard to prevent, so we should instead ask people to disclose how they used LLMs. If I were writing the instructions for that stage again, I would include something like “don’t use LLMs for writing — it dilutes your original thoughts with slop language, and it doesn’t give you any benefit because your responses will look like the responses of 50 other candidates.”
Regarding using an LLM to rephrase submissions into my preferred style: that is an interesting idea that I haven’t tried. I would be somewhat concerned that I would then be reading the candidates’ prompts filtered through two LLMs instead of one, which removes a little more signal. But potentially this would not meaningfully change the final scores and would make grading quicker.