They still make a lot less than they would if they optimized for profit (that said, I think most “safety researchers” at big labs are only safety researchers in name and I don’t think anyone would philanthropically pay for their labor, and even if they did, they would still make the world worse according to my model, though others of course disagree with this).
I think people who give up large amounts of salary to work in jobs that other people are willing to pay for from an impact perspective should totally consider themselves to have done good comparable to donating the difference between their market salary and their actual salary. This applies to approximately all safety researchers.
Seems like the thing to do is to have a program that happens after MATS, not to extend MATS. I think in general you want sequential filters for talent, and ideally the early stages are as short as possible (my guess is indeed MATS should be a bit shorter).
I… really don’t see any clickbait here. If anything, these titles feel bland to me (and indeed I think LW users could do much better at making titles that are more exciting or more clearly highlight a good value proposition for the reader, though karma makes up for a lot).
Like, for god’s sake, the top title here is “Social status part 1/2: negotiations over object-level preferences”. I feel like that title is at the very bottom of potential clickbaitiness, given the subject matter.
It’s really hard to get any kind of baseline here, and my guess is it differs hugely between different populations, but my estimate (based on doing informal Fermi estimates a bunch of times over the years) would be a lot lower than the average for the population, at least because of demographic factors, and then probably some extra on top of that.
I was talking about research scientists here (though my sense is that 5 years of being a research engineer is still comparable to most PhDs for gaining research skills, and probably somewhat better). I also had a vague sense that at Deepmind being a research engineer was particularly bad for gaining research skills (compared to the same role at OpenAI or Anthropic).
Yes. Besides Deepmind, none of the industry labs require PhDs, and I think the Deepmind requirement has also been loosening a bit.
and academia is the only system for producing high quality researchers that is going to exist at scale over the next few years
To be clear, I am not happy about this, but I would take bets that industry labs will produce and train many more AI alignment researchers than academia, so this statement seems relatively straightforwardly wrong (and of course we can quibble over the quality of researchers produced by different institutions, but my guess is the industry-trained researchers will perform well at least by your standards, if not mine).
I don’t think this essay is intended to make generalizations to all “Empiricists”, “Scientists”, and “Epistemologists”. It’s just using those names as shorthand for three types of people (whose existence seems clear to me, though of course their character does not reflect everyone who might identify under those labels).
I didn’t, I provided various caveats in parentheticals about the exact level of danger.
Oops, mea culpa, I skipped your last parenthetical when reading your comment so missed that.
I was including the current level of RLHF as already not qualifying as “pure autoregressive LLMs”. IMO the RLHF is doing a bunch of important work at least at current capability levels (and my guess is it will also do some important work at the first dangerous capability levels).
Also, I feel like you forgot the context of the original message, which said “all the way to superintelligence”. I was calibrating my “dangerous” threshold to “superintelligence-level dangerous”, not “speeds up AI R&D” dangerous.
My sense is almost everyone here expects that we will almost certainly arrive at dangerous capabilities with something else in addition to autoregressive LLMs (at the very least RLHF, which is already widely used). I don’t know what’s true in the limit (like if you throw another 30 OOMs of compute at autoregressive models), and I doubt others have super strong opinions here. To me it seems plausible you get something that does recursive self-improvement out of a large enough autoregressive LLM, but it seems very unlikely to be the fastest way to get there.
But OK, let’s leave aside the title and its attempt to imply anything about 99% of trades out there, or the basically Marxist take on all exchanges being exploitation and the obsession with showing how you are being tricked or ripped off.
My guess is you are pattern-matching this post and author to something that I am like 99% confident doesn’t match. I am extremely confident the author does not think anything remotely like “all exchanges [are] exploitation”, nor that they have a particular obsession with being tricked or ripped off (beyond a general fascination with adverse selection).
I think all of them follow a pattern of “there is a naive baseline expectation, which treats other people’s maps as a black box, that suggests a deal is good, and a more sophisticated expectation, which involves modeling the details of other people’s maps, that suggests it’s bad”, and each highlights some heuristics you could have used to figure this out in advance (in the subway example, a fully empty car does indeed seem a bit too good to be true; in the juggling example, you do really need to think about who is going to sign up; in the bedroom example, you want to avoid giving the other person a choice even if both options look equally good to you; in the Thanksgiving example, you needed to model which foods get eaten first and how correlated your preferences are with those of other people; etc.).
This feels like a relatively natural category to me. It’s not like an earth-shattering unintuitive category, but I dispute that it doesn’t carve reality at an important joint.
I think this post is just trying to be a set of examples of adverse selection, not really some kind of argument that there is tons of adverse selection everywhere. Lists of examples seem useful, even if they are about phenomena that are not universally present, or require specific environmental circumstances to come together in the right way.
Hmm, it feels to me like this misses the most important objection to PhDs, which is that many PhDs seem to teach their students actively bad methodologies and inference methods, sometimes incentivize students to commit scientific fraud, teach writing habits optimized to obscure and sound smart instead of to explain clearly and straightforwardly, and often seem to instill zero-sum attitudes around ownership of work and ideas that are pretty bad for a research field.
To be clear, there are many PhD opportunities that do not have these problems, but many do, and it seems to me quite important to somehow identify the ones that don’t. If you only have the choice to do a PhD under an advisor who does not seem to you to be actually good at producing clear, honest, and high-quality research while acting with high integrity toward their colleagues, then I think almost any other job will be better preparation for a research career.
Oh, I totally recognized it, but like, the point of that slogan is to make a locally valid argument that guns are indeed incapable of killing people without being used by people. That is not true of AIs, so it seems like it doesn’t apply.
Promoted to curated: This post is great, and indeed probably the best reference on a mechanistic understanding of status that I can think of. Most concretely, it tied together the following threads, which previously felt related to me without the relation being very clear:
- Improv-style scenes and associated “playing high/low status”
- Helen’s “Making yourself big or small”
- Ask culture & guess culture
- Combat vs. nurture
I also particularly appreciated the idea of ask and guess culture being two limit points as a result of arms-race dynamics in a status tug-of-war, and find that explanation pretty compelling.
On the meta level:
The intro of this post really sells the rest of the post short, and I think I would pretty strongly recommend moving it to the end, or maybe just cutting it completely. I bounced off of this post like 3 times because it led with all this metadata about what it was trying to do and what the different sections are about, all without any payoff.
If I were considering a more in-depth edit, I would replace the first section with a concrete, specific story or some concrete application of the theory in this post that shows the reader a nugget of understanding, and then go into the meta level of what this post is trying to do (or maybe not go into that at all, or move it into an appendix). Section 1.3 feels like the first meaty section of the post, and if I could move it up to the top, I would definitely do it.
I’ll hold off on curating this post for a few hours if you do want to make some edits like this, which I think would help a lot with getting people to read it in their email inbox and/or click through to the whole post. But it’s still a great post otherwise, and it also seems good to send it out as is.
Downvoted because the title seems straightforwardly false while the post doesn’t actually argue for it (making it a bit clickbaity, but I am more objecting to the fact that it’s just false). Indeed, this site has a very large number of arguments and posts about why AIs could kill people (and people with AIs might also kill people, though probably many fewer).
I don’t think this essay is commenting on AI optimists in general. It is commenting on some specific arguments that I have seen around, but I don’t really see how it relates to the recent stuff that Quintin, Nora, or you have been writing (and I would be reasonably surprised if Eliezer intended it to apply to that).
You can also leave it up to the reader to decide whether and when the analogy discussed here applies. I could spend a few hours digging up people engaging in reasoning very close to what is discussed in this article, though by default I am not going to.