Suho Lee

Karma: 4

Suho Lee 15 Jun 2026 17:26 UTC
2 points
0
on: How does congressmember use AI?
More experienced, long-serving politicians tend to have dedicated writing teams, which may be less open to using AI when writing speeches. It would be interesting to split politicians by level of experience and see how these statistics change.

Suho Lee 27 May 2026 7:19 UTC
4 points
0
on: When does debate help a weak judge? Evidence from code and logic
Interesting work.
One direction that would be interesting to explore: all your pairings are same-family. Same-family models likely share some core reasoning, so
1) a debate transcript from the stronger model might help a weaker judge from the same family more than a judge from a different family, since same-family models might understand each other better,
2) but they might also share failure modes, meaning a same-family critic might be systematically blind to the same errors as the weaker judge.
Cross-family testing might surface qualitatively different objections, potentially widening or narrowing the classifier gap.

Suho Lee 26 May 2026 6:58 UTC
1 point
0
on: The Case for Evaluating Model Behaviors
Can’t agree more.
One underappreciated reason behavior evals matter: for ‘questions with no ground truth’, they may be the only coherent approach.
Most evals have (or assume) a correct answer to measure against. But some of the most important things that we want models to work on and get right have no ground truth (e.g. frontier scientific research, socioeconomic decisions...). How do they handle competing views and frameworks? How do they represent uncertainty without overclaiming? We definitely need these types of evaluations, with a focus on ‘response shape’, to steer models toward the right behavior.
And getting this right is not just a safety and alignment property. A model that can handle open questions well is also a more capable model in the domains where it matters most.