What’s going on with MATS recruitment?
MATS scholars have gotten much better over time according to metrics such as mentor feedback, CodeSignal scores, and acceptance rate. However, some people dispute this and believe MATS scholars have actually gotten worse.
So where are these critics coming from? I might have a useful vantage point on MATS applications, since I did both MATS 4.0 and 8.0. In both cohorts, I think the heavily x-risk-AGI-pilled participants were more of an exception than the rule.
“at the end of a MATS program half of the people couldn’t really tell you why AI might be an existential risk at all.”—Oliver Habryka
I think this is sadly somewhat true: I talked with some people in 8.0 who didn’t seem to have any particular concern about AI existential risk, or who had seemingly never really thought about it. However, I think most people were in fact very concerned about AI existential risk. At some point during MATS 8.0 I ran a poll about Eliezer’s new book, and a significant minority of scholars had pre-ordered it, which I take as a pretty good proxy for whether someone is seriously engaging with AI x-risk.
I met some excellent people at MATS 8.0, but I would not say they were stronger than the 4.0 cohort; my guess is that quality went down slightly. In 4.0 I remember a few people who impressed me quite a lot, which I saw less of in 8.0 (though 4.0 also had more very incompetent people).
Suggestions for recruitment
These suggestions might also apply to other safety fellowships.
Better metrics: My guess is that the recruitment process needs to measure another variable besides academics, coding, and ML experience: the kind of quality that Tim Hua (an 8.0 scholar who built an AI psychosis benchmark) has. Maybe something like LessWrong karma, but harder to Goodhart.
More explicit messaging: It also seems to me that if you build an organization that is trying to prevent the end of the world from AI, somebody should say that. It might put off some people, and perhaps that should happen early. Maybe the website should say: “AI could kill literally everyone, let’s try to do something!” The people who only heard that MATS looks good on a CV for applying to a PhD or eventually landing a high-paying lab job might be put off by that, and that seems fine. What I am trying to say is: if you are creating the Apollo Project and are trying to go to the Moon, you should say so, rather than vaguely stating “we’re interested in aerospace challenges.”
Basic alignment test: Perhaps there should also be a test where people don’t have internet or LLM access and have to answer some basic alignment questions:
Why could a system that we optimize with RL develop power-seeking drives?
Why might training an AI create weird, unpredictable preferences?
Why would you expect something smarter than us to be very dangerous (or why not)?
Why should we expect a before-and-after transition, i.e. one critical shot at alignment (or why not)?
Familiarity with safety literature: In general, I believe foundational voices like Paul Christiano and Eliezer are read less by safety researchers these days, even though philosophy of research matters more than ever now that AIs can do much of our research implementation. Intuitively, it seems to me that someone with zero technical skill but high understanding is more valuable to AI safety than somebody with good skills but zero understanding of AI safety. If someone were able to bring up and illustrate the main points of IABIED, for example, I would be very impressed. Perhaps applicants could select one of a few preeminent voices in AI safety and summarize their basic views, again without access to the internet or an LLM.
Other suggestions
Research direction: MATS doesn’t seem to have a real research direction; perhaps it would be better if a strong researcher were in charge (though that could also backfire if they put all the resources in the wrong direction). Imagine putting someone very opinionated like Nate Soares in charge: he would probably remove 80% of the mentors and reduce the program to 10-20 people. I am not sure whether this would work out well.
Reading groups on AI safety fundamentals: Should we just have people read some of the AI safety fundamentals during MATS? I remember that before 4.0 started, we had to complete an online safety fundamentals course; this was not the case for 8.0.
At this point AI is so pervasive that I expect many people to have thought about its existential consequences. I am pessimistic about anyone who hasn’t yet sat down, really thought about AI, and come to the conclusion that it is existentially dangerous. I don’t have much hope that such a person just needs a one-hour course to deeply understand risks from AI. It might be necessary to select for people who already get it.
Perhaps the mentors changed, and the current ones put much more value on things like being good at coding, running ML experiments, etc., than on understanding the key problems, having conceptual clarity around AI x-risk, etc.
There’s certainly more of an ML-streetlighting effect. The most recent track has 5 mentors on “Agency”, of whom (AFAICT) 2 work on “AI agents”, 1 works mostly on AI consciousness & welfare, and only 2 (Ngo & Richardson) work on “figuring out the principles of how [the thing we are trying to point at with the word ‘agency’] works”. MATS 3.0 (?) had 6 mentors focused on something in this ballpark (Wentworth & Kosoy, Soares & Hebbar, Armstrong & Gorman), and the total number of mentors was smaller.
It might also be the case that there are proportionally more mentors working for capabilities labs.
Disagree somewhat strongly with a few points:
On the claim that people with zero technical skill but high understanding are more valuable than skilled people with zero understanding of AI safety: IMO not true. Maybe early on we needed really good conceptual work, and so wanted people who could clearly articulate the pros and cons of Paul Christiano’s and Yudkowsky’s alignment strategies, etc., so it would have made sense to test accordingly. But I think this is less true now: most senior researchers have more good ideas than they can execute, so we’re bottlenecked by execution. Also, the difficulty of doing good alignment research has increased, since we increasingly need to work with complex training setups, infrastructure, etc. to keep up with advances in capabilities. This motivates requiring a high level of technical skill.
I also think that if someone has literally zero technical skill, their takes will not be calibrated or grounded, i.e. they are no more than an armchair theorist.
I don’t think the proposed basic alignment test questions should be considered strong criteria. IMO “believes in X-risk” is not a necessary prerequisite to doing great work for reducing X-risk; e.g., building good tooling for alignment research doesn’t require this at all.
Meta-point: I think the requirements for mentees are in practice mostly determined by specific mentors, and MATS mainly plays an indirect role via curating a “mentor portfolio” that reflects their agenda prioritization. It’s an empirical observation that mentors increasingly want to do empirical research, and I generally endorse deferring ~completely to mentors re: how they want to choose mentees, so I think this whole discussion is somewhat misguided. Maybe your point is more that it would be good to select mentors who want to do more conceptual alignment stuff, but that’s a separate discussion.
What do you mean? Of course building good tooling for alignment research requires caring about X-risk, or at least something close to it. If you don’t care about it, you just take the highest-paying job, which will definitely not be building good tooling for alignment research! Motivation is a necessary component of doing good work, and if you aren’t motivated to do good work by my lights, then you aren’t going to do good work, so good motivations are indeed necessary.
I think there exist people who don’t care a huge amount / feel relatively indifferent about X-risk, but with whom you can nonetheless form beneficial coalitions / make profitable transactions, useful for reducing X-risk. Building tools seems like one thing among many that can be contracted out.
“If they don’t care about X-risk they must be maximally money-minded” seems fallacious: those are just two different motivations in the set of all motivations, and it’s possible to be neither. Many things can motivate someone to want to do good work: intrinsic pride in the work, intellectual curiosity, etc.
I mean, both of those (intrinsic pride in the work, intellectual curiosity) seem like they will be more easily achieved by helping build more powerful AI systems than by building good tooling for alignment research.
Like, I am not saying we can’t tolerate any diversity in why people want to work on AI alignment, but this is an early-career training program with no accountability. Selecting and cultivating motivation is by far the best steering tool we have! We should expect that if we ignore it, people will largely follow incentive gradients, or do kind of random things by our lights.
On “most senior researchers have more good ideas than they can execute”: I don’t think this is a strong argument for the situation being meaningfully different. Senior researchers having more good ideas than they have time for doesn’t seem like a very new thing at all (e.g. Evan wrote a list like this over three years ago).
More importantly, this doesn’t seem inconsistent with the claim being made. If you had mentors proposing projects in very similar areas, or downstream of very similar beliefs, you might still benefit tremendously from people with a good understanding of AI safety working on different things. This depends on whether you think the current project portfolio is close to as good as it can be, though. I certainly think we would benefit heavily from more people thinking about which directions are good, and that a fair amount of current work suffers from not enough clear thinking about whether it is useful.
That said, I am somewhat optimistic about MATS. I had very similar criticisms during MATS 5.0, when ~1/3-1/2 of all projects were in mech interp. If we’d kept funneling strong engineers to work on mech interp without the skills necessary to evaluate how useful it was, deferring to a specific set of senior researchers, I think the field would be in a meaningfully worse state today. MATS did pivot away from that afterward, which raised my opinion a fair amount (though I’m not sure what the exact mechanism here was).
On good alignment research increasingly requiring complex training setups and infrastructure: I don’t think this is true? It’s certainly true for some kinds of good alignment research, but IMO very far from a majority.
I also don’t think believing in X-risk is a necessary prerequisite to doing great alignment research, but MATS is more than the projects MATS scholars work on. For example, if MATS scholars consistently did good research during MATS and then went on to be hired to work on capabilities at OpenAI, I think that would be a pretty bad situation.
I agree. To be clear, I support ‘value alignment’ tests, but that wasn’t part of the original claims being made.
I don’t think this is just about value alignment. I think if people genuinely understood the arguments for why AI might go badly, they would be much less likely to work on capabilities at OpenAI—definitely far from zero, but for the subset of people who are likely to be MATS scholars, I think it would make a pretty meaningful difference.
Reflecting on this a little bit:
I’ve updated somewhat: it’s true that mentors should likely be given a large say in who they admit to their projects, but they are also likely to be myopic (i.e. to optimize solely for “get this project done”). MATS might want to counterbalance that by also optimizing for good long-term candidates (who will reduce x-risk long-term). And there is probably a lot of room to select highly value-aligned candidates without compromising much on technical skill, given that MATS receives 100x as many applications as it can accept. (Though I still think there are much better tests of value alignment, and the questions above are likely to be easy to game.)
What do you mean by “good idea”?
My general impression of the field is that we lack ideas that are likely to solve AI alignment right now. To me that suggests that good ideas are scarce.
Good = on the Pareto frontier of tractable and useful.
I think we won’t outright ‘solve’ it (in some provable, ‘formal’ sense), for various reasons (timelines being short, alignment being hard, etc.).
But we might get close enough in practice by making lots of incremental progress along parallel directions.
Comparing the average quality of participants might be misleading if impact on the field is dominated by the highest quality participants (and it very plausibly is).
A model that seems quite plausible to me is that early MATS participants, who were selected more for engagement with a then-niche field, turned out a bit worse on average than current MATS participants, who are selected for coding skills, but that the early MATS participants had higher variance, and so early MATS cohorts produced more people at the top end and had more overall impact.
(This is like 80% armchair reasoning from selection criteria and 20% thinking about what I’ve observed of different MATS cohorts.)
I think this also applies to other safety fellowships. There isn’t broad societal acceptance yet of the severity of the worst-case outcomes, and if you speak seriously about the stakes to a general audience, you will mostly get nervously laughed off.
MATS currently has “Launch your career in AI alignment & security” on the landing page, which indicates to me that it is branding itself as a professional upskilling program, and this matches the focus on alumni job placements in its impact reports. Given Ryan Kidd’s recent post on AI safety undervaluing founders, it is possible that they will eventually introduce a division that functions more purely as a startup accelerator. One norm in corporate environments is to avoid messaging that provokes discomfort. Even in groups that practice religion, few have the lack of epistemic immunity required to align their actions with their stated eschatological beliefs, and I am grateful that this is the case.
Ultimately, the purpose of these programs, no matter how prestigious, is to bring in people who are not currently AI safety researchers and give them an environment that helps them train and mature into AI safety researchers. I believe you will find that even among those working full-time on AI safety, the proportion who are heavily x-risk-AGI-pilled has shrunk as the field has grown. People who are both x-risk-AGI-pilled and meet the technical bar for MATS, but aren’t already committed to other projects, would be exceedingly rare.
Is your sense here (re: most scholars being very concerned about x-risk) “a large majority” or “a small majority”? Just curious about the rough numbers: more like 55% or more like 80%?
probably closer to 55%
I’m pretty sure putting someone like Nate Soares in charge would work out poorly.
Please make sure the course materials are actually good. The courses often have glaring issues, though the maintainers do seem receptive and did say they’ll update them both times I pointed this out. I’m not sure if the latest updates have gone through yet.
I’m working on a course that will reliably cover the core concepts.