yix
Ah, this makes sense, thank you. Then I guess the crux is figuring out how to isolate the beliefs-and-desires box in AI systems so we can have this open loop. Gradient routing has potential here, as cloud commented.
Another possible method that just occurred to me (no idea whether this is any good, inviting feedback):
- Use interpretability technique to flag bad behaviors during RL for some task X.
- When bad behavior is flagged, train the model on a corpus that effectively represents the ‘opposite’ of this bad behavior (for example, if caught lying, train on corpus that induces honesty).
The intuition is that we’d want this corpus to activate whatever parts of the network represent a specific desire (accepting that we don’t know where this is), and that it is possible to come up with training documents that effectively update ‘desires’ via SFT or other algorithms. I think methods/ideas from influence functions/token-level attribution may help with constructing such corpora, or with more direct ways of updating the desire parts of the network.
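To make the proposed loop concrete, here is a minimal toy sketch in Python. Everything here is hypothetical: `probe` stands in for whatever interpretability technique flags bad behavior, `rl_update`/`sft_update` stand in for the actual training calls, and the corpus mapping is illustrative only.

```python
# Hypothetical mapping from a flagged bad behavior to a corrective corpus
# representing its 'opposite' trait (e.g. caught lying -> honesty corpus).
BAD_TO_CORRECTIVE = {
    "lying": "honesty_corpus",
    "sycophancy": "candor_corpus",
}

def training_step(rl_batch, probe, rl_update, sft_update):
    """One step of the flag-and-correct loop: do the usual RL update,
    then run a corrective SFT update whenever the probe flags a behavior.
    Returns the corrective corpus used, or None if nothing was flagged."""
    rl_update(rl_batch)                  # ordinary RL step on task X
    flagged = probe(rl_batch)            # e.g. "lying", or None if clean
    if flagged is not None:
        corpus = BAD_TO_CORRECTIVE[flagged]
        sft_update(corpus)               # try to update 'desires' via SFT
        return corpus
    return None
```

The interesting open questions are all hidden inside `probe` (how robust is the flag?) and `sft_update` (does training on the opposite corpus actually move the desire representation, or just the surface behavior?).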
Steven, thanks for writing this!
Isn’t it just the case that the human brain’s ‘interpretability technique’ is really robust? The technique (in this case, having an accurate model of what others feel) is USEFUL for many other aspects of life. Maybe this is a crux? To my knowledge, we haven’t tried that hard to make interpretability techniques and probes robust, not in a way where ‘activations being easily monitorable’ is closely correlated with ‘playing training games well’, for lack of a better wording.
A result that comes to mind, but isn’t directly relevant, is this tweet from the real-time hallucination detection paper. Co-training a LoRA adapter and a downstream probe for hallucination detection made the LLM more epistemically cautious with no other supervised training signals. Maybe we can use this probe to RL against hallucinations while training the probe at each step too?
I have yet to read your previous posts linked here. I imagine some of my questions will be answered once I find time to look through them haha.
I did include it in the link dump! Agree that they have stronger results on the delta in misalignment after SFT, though I expect types 1 and 2 to still be counted as misaligned in their method (LLM judge with model spec + question + answer), where the model spec is usually pretty strict. They don’t release misaligned responses, which makes it hard to know!
Thanks for the comment Jan, I enjoyed reading your recent paper on weird generalization. Adding takes and questions in the same order as your bullet points:
Yes we saw! I came up with the framework after seeing the model organisms of EM paper count responses within the financial domain as EM on a finance model.
Perhaps this is less disciplined, but I’m a fan of ‘asking for signal’ from LLMs because it feels more bitter lesson pilled. LLMs can handle most routine cognitive tasks given good conditioning and a feedback loop. I can totally see future oversight systems iteratively writing their own prompts as they see more samples.
Agreed. Though we can say we found EM on the finance dataset, it doesn’t seem like an interesting/important result given how infrequently the misaligned responses occur. I see EM as an instance of this broader ‘weird generalization’ phenomenon because misalignment is simply a subjective label. SFT summons a persona based on the model’s existing latent knowledge and connections by settling in a loss basin on the SFT dataset that is most easily reachable from its initial weights. For now, I have confused thoughts about how to study this and what it can do for alignment. Some of these include:
Stronger models (i.e., with more robust representations) seem to be more prone to exhibiting EM, as you find; this would make sense to me, though I haven’t seen strong quantitative evidence.
I think we’re underleveraging the power of conditioning during all kinds of training. The ‘for education purposes’ modification to the insecure code experiment, backdoor by year, and inoculation prompting are peripheral evidence. Humans have a good ‘value function’ because we always update our gradients with the full context of our lives; current LLM training does not reflect this at all. I think some form of ‘context-rich training’ can help with alignment, though I need to gather more evidence and think about exactly what problem this may fix.
Unseen behavior from unseen trigger is a very interesting result! I wonder if we can ‘ask for generalization’ in future models, though there is probably a gap between ‘showing’ and ‘telling’ in generalization.
Thanks for the kind words Zephy!
Yeah, interesting point about batch size. We simply started with 32 because we wanted to simulate real SFT workflows and 32 is common. Similar story with lr and the rest of the training parameters, aside from those needed to save training costs. A quick Claude query says that smaller batch sizes often generalize better thanks to gradient noise, but this isn’t super robust across tasks since the gaps can be made up with lr and epoch adjustments!
We need a better way to evaluate emergent misalignment
Thanks Boaz and Parv for writing these. I think there are a few important details that didn’t get past the information bottleneck that is natural language.
Note: Parv (author of this post) and I are close friends in real life. We work on AIS field building and research together, so my context with him may skew my interpretation of his post and this discussion.
What does being ok mean? I can infer maybe 2 definitions from the discussion.
(1) Being ok means “doing well for yourself”, which includes financial security, not being in the hypothesized permanent underclass, and living a fulfilling life in general.
(2) Being ok means (1) AND not seeing catastrophic risk materialize (even if it doesn’t impact you as much), which some of us assign intrinsic value to. I think this is what Parv meant by “I did not want the world with these things to end”.

Boaz, I think you’re referring to definition (1) when you say the below, right? We likely won’t be okay under definition (2), which is why the emotions imparted by Parv’s piece resonated with so many readers? (Unsure, inviting Parv to comment himself)
“I believe that you will most likely be OK, and in any case should spend most of your time acting under this assumption.”
However, under either definition, I agree that it is productive to act under the belief “I will be okay if I try my hardest to improve the outcome of AI”.
In general, selection for reward produces equally strong selection for reward’s necessary and sufficient conditions. In general, it seems like there should be a lot of those. Therefore, since selection is not only for reward but for anything which goes along with reward (e.g. reaching the goal), then selection won’t advantage reward optimizers over agents which reach goals quickly / pick up lots of trash / [do the objective].
Low-confidence disagree here. If the AI has a very good model of how to achieve goal/reward X (which LLMs generally do), then the ‘reward optimizer’ policy elicits the set of necessary actions (like picking up lots of trash) that leads to this reward. In this sense, I think the ‘think about what actions achieve the goal and do them’ behavior will achieve better rewards and therefore be more heavily selected for. I think the above also fits in the framing of the recent behavioral selection model proposed by Alex Mallen (https://www.lesswrong.com/posts/FeaJcWkC6fuRAMsfp/the-behavioral-selection-model-for-predicting-ai-motivations-1), similar to the ‘motivation’ cognitive pattern.
Why would the AI display this kind of explicit reward modelling in the first place? 1. We kind of tell the LLM what the goal is in certain RL tasks. 2. The most coherent persona/solution is one that explicitly models rewards/thinks about goals, whether from assistant persona training or writing about AI.
Therefore, I think we should reconsider implication #1? If the above is correct, AI can and will optimize for goals/rewards, just not in the intrinsic sense. This can be seen as a ‘cognitive groove’ that gets chiseled into the AI, but it is problematic in the same ways as the reward optimization premise.
The “hallucination/reliability” vs “misaligned lies” distinction probably matters here. The former should in principle go away as capability/intelligence scales, while the latter probably gets worse?
I don’t know of a good way to find evidence of model ‘intent’ for this type of incrimination, but if we explain this behavior with the training process it’d probably look something like:
Tiny bits of poorly labelled/bad preference data make their way into the training dataset due to human error. Maybe specific cases where the LLM made up a good-looking answer and the human judge didn’t notice.
The model knows that the above behavior is bad, but gets rewarded anyway; this leads to some amount of misalignment/emergent misalignment, even though in theory the fraction of bad training data should be nowhere near sufficient for EM.
Generalization seems to scale with capabilities.
Maybe the scaling law to look at here is model size vs. the % of misaligned data needed for the LLMs to learn this kind of misalignment? Or maybe inoculation prompting fixes all of this, but you’d have to craft custom data for each undesired trait...
yix’s Shortform
Does anyone know of a convincing definition of ‘intent’ in LLMs (or a way to identify it)? In model organisms type work, I find it hard to ‘incriminate’ LLMs. Even though the output of the LLM will remain what it is regardless of ‘intent’, I think this distinction may be important because ‘intentionally lying’ and ‘stochastic parroting’ should scale differently with overall capabilities.
I find this hard for several reasons, but I’m highly uncertain whether these are fundamental limitations:
- LLMs behave very differently depending on context. Asking a model about something it did post hoc elicits a different ‘mode’ and doesn’t necessarily allow us to make statements about its original behavior.
- Mechanistic techniques seem to be good at generating hypotheses, not validating them. Pointing at an SAE feature activation that says ‘deception’ does not seem conclusive, because auto-interp pipelines often do not include enough context to give robust explanations of complex high-level behaviors like deception.
TastyBench: Toward Measuring Research Taste in LLM
More on giving undergrads their first research experience. Yes, giving a first research experience is high impact, but we want to reserve these opportunities for the best people. Often, this first research experience is most fruitful when they work with a highly competent team. We are turning our focus to assembling such teams and finding fits for the most value-aligned undergrads.
We always find it hard to form pipelines because individuals are just so different! I don’t even feel comfortable using ‘undergrad’ as a label if I’m honest…
Lessons from a year of university AI safety field building
Thanks again Esben for collaborating with us! I can confidently say that the above is super valuable advice for any AI safety hackathon organizer; it’s consistent with our experiences.
In the context of a college campus hackathon, I’d especially stress focusing on preparing starter materials and making submission requirements clear early on!
Atharva! Thanks for writing this, I have a few questions.
1. I wonder if asking the model whether it thinks it’s being evaluated is a good approach. I guess that you compare against a control group, which makes comparisons okay. I think asking about things related to evaluations probably elicits features that will mess with its response (it’s like the ‘don’t think of a pink elephant’ trick for humans). I also wonder if people have tried asking the ‘do you think it was an eval’ question on the second turn to see if it makes a difference. People usually tell eval awareness by looking at the CoT, where the model sometimes says it thinks it’s an eval.
2. Let’s try this out in safety testing/alignment audit settings! Maybe run Petri and see how much evaluation awareness its scenarios elicit. The main worry is that most of the questions we use to test for safety/alignment/robustness contain tensions that would not exist in a ‘normal’ setting. I also wonder whether there are train-time mitigations for eval awareness, maybe some kind of finetuning.
I’m not caught up with SOTA on this problem, let me know your thoughts!