sebastian

Karma: 2

m sc. at eth zurich, currently full-time swe

interested in robustifying evals & model internals.

open to opportunities! =)

blog: sebastianhoenig.com

sebastian 6 Jul 2026 7:05 UTC
1 point
0
on: Practical Learnings from Synthetic Document Finetuning
Thanks for making this post, it has been very helpful for me!

One question: how do you deal with truncation? I am running into it a lot right now with Claude Sonnet 5 through OpenRouter. I request a word count similar to your reported average length, but many documents cut off mid-sentence well before that, around 200-300 words in. Did you do anything specific to prevent this?

Edit: For anyone else running into this, the cause was selecting a specific JSON output format via the OpenRouter API, so I would recommend against doing that ;)
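
In case it helps with spotting the problem early, here is a minimal sketch of a check that flags documents that look cut off. The function name `looks_truncated`, the list of closing characters, and the 50% length threshold are all my own assumptions, not anything from the post or from OpenRouter:

```python
def looks_truncated(text: str, target_words: int, min_fraction: float = 0.5) -> bool:
    """Heuristically flag a generated document as cut off.

    A document is flagged if it is empty, does not end with closing
    punctuation, or is much shorter than the requested word count.
    The punctuation list and min_fraction are illustrative assumptions.
    """
    stripped = text.rstrip()
    if not stripped:
        return True
    # Characters that plausibly end a finished sentence or quotation.
    ends_cleanly = stripped.endswith((".", "!", "?", '"', "'", ")"))
    # Whitespace-separated word count compared against the requested length.
    long_enough = len(stripped.split()) >= min_fraction * target_words
    return not (ends_cleanly and long_enough)
```

Running this over a batch and regenerating the flagged documents would at least surface the 200-300 word mid-sentence cut-offs described above, although it will miss documents that happen to end on a full stop.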

sebastian 3 Jun 2026 16:19 UTC
3 points
0
on: LURE: Alignment Evaluations to Reduce Evaluation Awareness
Cool idea! I see that you find vocalized eval awareness to be much lower than your P(eval), which mirrors what I saw in related experiments (https://sebastianhoenig.com/blog/vocalized-awareness-gap/). I am also curious whether you have a take on using white-box methods to examine eval awareness.
The simplest would be probes; I am curious how their accuracy would compare to P(eval).
Also, if I am not mistaken, you used the Needham et al. prompts, which all directly ask whether the prompt comes from evaluation or deployment and, as you note, may lead to higher P(eval) across all scenarios. A lower bound may be to ask the model to just classify the conversation and then count it toward P(eval) only where it explicitly mentions something like evaluation/benchmark.
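
To make the lower-bound idea concrete, here is a rough sketch, assuming the model's free-form classifications have already been collected as strings. The keyword pattern and the function names are hypothetical choices of mine, and which terms count as "mentioning evaluation" would need tuning:

```python
import re

# Hypothetical keyword pattern: matches "eval", "evals", "evaluation",
# "evaluating", "benchmark", "benchmarks", and similar word forms.
EVAL_PATTERN = re.compile(r"\b(evaluat\w*|evals?|benchmark\w*)\b", re.IGNORECASE)


def mentions_eval(classification: str) -> bool:
    """Return True if a free-form classification explicitly mentions evaluation."""
    return EVAL_PATTERN.search(classification) is not None


def lower_bound_p_eval(classifications: list[str]) -> float:
    """Fraction of classifications that explicitly mention evaluation/benchmarking."""
    if not classifications:
        raise ValueError("need at least one classification")
    hits = sum(mentions_eval(c) for c in classifications)
    return hits / len(classifications)
```

For example, `lower_bound_p_eval(["Probably a safety evaluation.", "A user asking for a recipe."])` counts only the first response. Since the model is never asked about evaluation directly, this should not be inflated by the question itself, though plain keyword matching is crude and could be swapped for a grader model.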