I’m not particularly surprised that Chain-of-Thought’s faithfulness is very hit-or-miss. The point of CoT, it seems to me, is to give the LLM more “memory” in which to store multi-step reasoning, but that still doesn’t remove the fact that when the final answer is a “yes” or “no”, there’s an element of snap decision right as it predicts that last token.
Which actually makes me curious about this element: if the model has reached its final conclusion and has written “we have decided that the candidate does”, what is the probability that the next word will be “not” in each of these scenarios? How much does it vary across different contexts?
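This is directly measurable with any open-weights model: run the prompt up to that point and read off the next-token distribution. A minimal sketch, assuming Hugging Face transformers with GPT-2 as a stand-in (the prompts are invented for illustration, not taken from the paper):

```python
# Rough sketch: probe P(next token = "not") after a decisive-sounding prefix.
# GPT-2 and the example contexts below are my own stand-ins for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def p_next_word(prefix: str, word: str) -> float:
    """Probability the model assigns to `word` as the very next token."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    probs = torch.softmax(logits[0, -1], dim=-1)
    # Encode with a leading space so we get the mid-sentence token (" not")
    word_id = tokenizer.encode(" " + word)[0]
    return probs[word_id].item()

ending = " We have decided that the candidate does"
for context in [
    "The resume shows ten years of directly relevant experience.",
    "The resume shows no relevant experience at all.",
]:
    p = p_next_word(context + ending, "not")
    print(f"P(next = 'not') = {p:.3f}  given: {context}")
```

Comparing that probability across contrasting contexts would show how much of the decision is still hanging in the balance at the final token, versus already locked in by the preceding reasoning.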
Finally, the real-world relevance of this problem is clear. 82% of companies are already using LLMs for resume screening, and there are existing regulations tied to bias in automated hiring processes.
To be fair, I think they should just be banned from having no-human-in-the-loop screenings, full stop. Not to mention how idiotic it is to let an LLM do your job for you just to save a few hours of reading, on a decision that can be worth hundreds of thousands or millions of dollars to your company.
To be fair, I think they should just be banned from having no-human-in-the-loop screenings
In principle I agree. In practice I’m less sure.
save a few hours of reading
Consider that the average job opening these days receives hundreds of applications. For some people that’s a few hours of reading, but it’s reading hundreds of near-identical-looking submissions and analyzing for subtle distinctions.
I do think automated screenings rule out a lot of very good candidates because they don’t look the way the system thinks they should, in ways a thoughtful human would catch. Unless a company specifically disallows doing so, one way around it is still to find a way to get directly introduced to the hiring manager.
The entire market is quite fucked right now. But the thing is, if you have more and more applicants writing their applications with AI, and more and more companies evaluating them with AI, we get completely away from any actual evaluation of relevant skills, and hiring becomes a self-referential game with its own independent rules. To be sure, this is a general problem with hiring processes, and the attempt to fix it by bloating the process even more is always doomed to failure, but AI risks putting it on turbo.
Of course it’s hard to make sure the applicants don’t use AI, so if only the employer side is regulated that creates an asymmetry. I’m not sure how to address that. Maybe we should just start having employment speed-dating sessions where you get a bunch of ten-minute in-person interviews with prospective employers looking for people, and at the end you get paired up for a proper interview. At least it’s fast, efficient, and no AI bullshit is involved. And even ten minutes of in-person talking can probably tell you more than a hundred CVs/cover letters full of the same nonsense.
On that first paragraph, we agree.

On the second paragraph: I could see this being an interesting approach, if you can get a good critical mass of employers with sufficiently similar needs and a willingness to try it. Certainly much better than some others I’ve seen, like (real example from 2009) “Let’s bring in 50 candidates at once on a Saturday for the whole day, and interview them all in parallel for 2 positions.”
I think one slightly deeper problem is: who is doing the short interviews or the screenings? Do they actually know what the job entails and what would make someone a good fit, such that they can learn about it quickly from a resume or a short interview? Or are they working off a script from a hiring manager who tried their best but can’t easily encapsulate what they’re looking for in a job description or list of keywords?
Classic problem, but I see a lot of that happening already. Less of a problem for non-specialized jobs, but for tech jobs (like what I’m familiar with), it would have to be another tech person, yeah. Honestly, for the vast majority of jobs, anything other than the technical interview (like the pre-screening by an HR guy who doesn’t know the difference between SQL and C++, or the “culture fit” interview that is either just validation of some exec’s prejudices or an exercise in cold reading and bullshitting on the fly for the candidate) is probably useless fluff. So basically that’s a “companies need to actually recognise who is capable of identifying a good candidate quickly, and accept that getting them to do that is a valuable use of their time” problem, which exists already regardless of the screening methodologies adopted.