Honest take on AI safety programs

Many people think they need to get into an AI safety program to do safety, alignment, or interpretability research. That is false: anyone with enough interest can do independent research, and that research paves the way into those programs. That is what this post is about. It walks through the process these programs generally follow.

The Application Section

This is the most exhaustive and grueling part of the process, but in my opinion the most rewarding. It usually contains self-reported information about the applicant and, my favorite section, the mentor-specific questions and what the mentors expect from each mentee. The mentor-specific section lists the background knowledge you need to apply. Each mentor has different ideas and research areas, which you must read up on before applying to their stream.

This reading is extremely rewarding if you pick something that genuinely interests or inspires you enough to work on it in the long run. Picking a stream is the first step, but the reading is the time-consuming part. There are multiple research papers tagged in each description, and multiple papers related to those. You can use LLMs to speed through the papers, but make sure you also skim them yourself to catch the nuances the LLMs miss. Completing all of this with LLMs may seem like the most appealing choice, but it is something you mustn't do. If you do, the whole section becomes a waste of time and tokens.

The Work Test

After the initial application has been completed and assessed, the mentors will invite you to a specific work test. Some of these are general CodeSignal coding assessments, which I won't cover here because plenty of people have written about them extensively and they apply across CS in general.

The other assessments are based on or related to the specific mentor's research. This is the part I want to talk about: here you get to work closely on the mentor's research proposal or on its prerequisites.

Some mentors have specific tests, such as implementing a toy version of a specific project, while programs like MATS have empirical tests based entirely on reasoning about safety scenarios.

The Interview

I think the title speaks for itself, so I'm going to keep it straight and simple. The interview happens when the mentors are satisfied with your work test. In this part you generally walk through the work test with the mentor and answer questions about your specific decisions. The mentors expect very good reasoning skills, and they are quite welcoming and helpful if you get stuck like me (I have anxiety-induced stuttering). After this you just wait for your decision.

WHY?

Readers who have made it this far might wonder why I am explaining this whole process when it is already well known. The reason is that applicants fall into different scenarios:

1. People who slop their way through the process with LLM-generated answers and code, without really struggling with any of it. Please stop that, my dear fellows: it's appealing, not rewarding.
2. People who genuinely did well and got rejected over minor issues, such as getting no mentor match from the matching algorithm because of preference rankings (e.g. SPAR), or because another, better candidate moved forward.

So, for the people in point 2: you are the ones I really want to talk to in this post. You really understood the process and the concepts, and you did well on the work test. If you think compute is a constraint, you can apply for Rapid grants, the Lambda Labs research grant, etc.

The tests and readings you did are the foundation for the ideas that come to you if you stay in the loop for long enough, and I'm a small piece of proof of this. I was rejected from almost every safety program I applied to, but I still didn't give up. I kept working on the ideas and the work tests, which led to:

A NeurIPS workshop paper based on CoT logit difference amplification, inspired by Santiago Aranguri's work task for SPAR. He gave us the idea of exploring his work on logit diff amplification, and even nudged us to try applying it to CoT. I was rejected by him, but I still produced a NeurIPS workshop paper.
An ICML workshop paper on memory-based failure modes in agents. This was an example of a genuine idea popping into my head, but all credit goes to the Ctrl-Z paper (by Tyler Tracy and team, read for MARS 4.0), which led me to explore confabulation in models.
An upcoming novel blog post on activation steering. For this one I credit the constant reading that the application processes pushed me into.

So the takeaway is clear: don't give up. Continue your research, readings, and work tasks, try to be independent, and keep knocking on the door. The work you did without giving up will really help you open that door.

The programs rejected me, but they were the ones that gave me ideas and research skills that no one could have taught me. I'm still knocking on their doors. There's no way I'm going to stop, and none of you should either.