FYI: this post was included in the Last Week in AI Podcast—nice work!
My team at Arcadia Impact is currently working on standardising evaluations.
I notice that you cite instance-level results (Burnell et al., 2023) as a recommendation. Is there anything else you think the open-source ecosystem should be doing?
For example, we’re thinking about:
Standardised implementations of evaluations (e.g. Inspect Evals)
Aggregating and reporting results for evaluations (see beta release of the Inspect Evals Dashboard)
Tools for automated prompt variation and elicitation*
*This statement especially had me thinking about this: “These include using correct formats and better ways to parse answers from responses, using recommended sampling temperatures, using the same max_output_tokens, and using few-shot prompting to improve format-following.”
Here’s the feedback I gave to Jay, which he encouraged me to add as a comment:
(this is mostly just me playing devil's advocate for the "wall" perspective)