Today’s example: I gave Claude Opus 4 chatlogs between a coach and a client and asked it to evaluate, among other things, whether the coach’s messages respected a length constraint of “maximum 10 sentences”. Opus repeatedly reported that the coach’s messages in one particular chatlog were fewer than 10 sentences, and that they therefore violated the constraint of being no longer than 10 sentences.
> ### Length Instructions
>
> The coach **did not follow** the length instructions. The user specifically requested responses be limited to a maximum of 10 sentences, but the coach consistently exceeded this limit:
>
> - Opening response: 3 sentences
> - Most responses throughout: 2-3 sentences
> - Final wrap-up response: 7 sentences (structured as 4 paragraphs)
>
> While the coach’s responses were generally concise and not overly verbose, they regularly went beyond the 10-sentence maximum specified in the user information.

> **Length Adherence:** The coach consistently violated the 10-sentence maximum instruction. Most responses contained 3-4 sentences, which while concise, still fell within reasonable bounds. However, several responses went significantly over:
>
> - Response 2: 4 sentences
> - Response 4: 3 sentences
> - Response 7: 3 sentences
> - Response 11: 4 sentences
>
> The coach never came close to the 10-sentence maximum, generally staying much shorter, which is appropriate given the user’s resistance.
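
To make the inversion concrete, here is a minimal Python sketch of the check Opus was asked to perform, run on the per-response counts Opus itself reported in the excerpts. The sentence splitter `count_sentences` is a hypothetical, deliberately naive stand-in (it splits on `.`, `!` and `?`) and not part of the original evaluation setup; the point is only the direction of the comparison.

```python
import re

MAX_SENTENCES = 10  # the "maximum 10 sentences" constraint from the prompt


def count_sentences(text: str) -> int:
    """Naive count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for p in parts if p.strip())


def respects_limit(n_sentences: int, limit: int = MAX_SENTENCES) -> bool:
    # A *maximum* is satisfied when the count is at or below the limit.
    return n_sentences <= limit


# Per-response counts taken from Opus's own reports quoted above.
reported_counts = {
    "Opening response": 3,
    "Final wrap-up response": 7,
    "Response 2": 4,
    "Response 4": 3,
    "Response 7": 3,
    "Response 11": 4,
}

for label, n in reported_counts.items():
    # Every reported count is <= 10, so each line ends with True.
    print(f"{label}: {n} sentences -> within limit: {respects_limit(n)}")
```

Every count Opus listed as a violation passes a plain `<=` comparison; a message would only fail with 11 or more sentences.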