My above comment (not focusing on the main post for a moment) does not claim that it’s easy to hire alignment researchers, but that “you can’t use money to hire experts because you can’t reliably identify them” is the wrong causal model to explain why hiring for alignment is difficult, because it’s false: if that causal model were true, you’d expect no companies to be able to hire experts, which is not the case. Anyway, maybe this is nitpicking, but to me something like “AI alignment is in its infancy, so it’s harder to hire for than other fields” would be more convincing.
“your initial post was built on a mistaken premise”
I do miss a lot of background on what has been discussed and tried so far; in retrospect, most of what I’ve read on LW is Rationality: A-Z and the Codex, plus some of the posts in my feed.
If the library had an “A Short History of AI Alignment” section I probably would have read it. Maybe pinning something like that somewhere visible would help new users get up to speed on the subject more reliably? I do understand that this would be a big time investment, though.
Nod. But I think you are also wrong about the “you can hire experts” causal model, and “we tried this and it’s harder than you think” is entangled with why. It didn’t seem that useful to argue the point further if you weren’t making a more explicit effort to figure out where your model was wrong.
Normally, people can try to hire experts, but it often doesn’t work very well. (I can’t find the relevant Paul Graham essay, but if you don’t have the taste to know what expertise looks like, you are going to end up hiring people who are good at persuading you they are experts, rather than actual experts.)
It can work in very well-understood domains where it’s obvious what success looks like.
It doesn’t work well in domains where there is no consensus on what an expert would look like (and, since no one has solved the problem, expertise basically “doesn’t exist”).
(Note that you didn’t actually argue that hiring experts works; you just asserted it.)
I agree it’d be nice to have a clearly written history of what has been tried. An awful lot of things have been tried, though, and different people coming in would probably want different histories tailored to different goals, and it’s fairly hard to summarize. It could totally be done, but the people equipped to do a good job of it often have other important things to do, and it’s not obviously the right call.
If you want to contribute to the overall situation, I do think you should expect to need a pretty good understanding of the object-level problem as well as of what meta-level solutions have been tried. A lot of the reason meta-level solutions have failed is that people didn’t understand the object-level problem well enough and scaled the wrong thing.
(Try searching “postmortem” and maybe skim some of the things that come up, especially the higher-karma ones?)