aysja comments on The Field of AI Alignment: A Postmortem, and What To Do About It

aysja 27 Dec 2024 5:29 UTC
163 points
86
I’m not convinced that the “hard parts” of alignment are difficult in the standardly difficult, g-requiring way that e.g., a physics post-doc might possess. I do think it takes an unusual skillset, though, which is where most of the trouble lives. I.e., I think the pre-paradigmatic skillset requires unusually strong epistemics (because you often need to track for yourself what makes sense), ~creativity (the ability to synthesize new concepts, to generate genuinely novel hypotheses/ideas), good ability to traverse levels of abstraction (connecting details to large level structure, this is especially important for the alignment problem), not being efficient market pilled (you have to believe that more is possible in order to aim for it), noticing confusion, and probably a lot more that I’m failing to name here.
Most importantly, though, I think it requires quite a lot of willingness to remain confused. Many scientists who accomplished great things (Darwin, Einstein) didn’t have publishable results on their main inquiry for years. Einstein, for instance, talks about wandering off for weeks in a state of “psychic tension” in his youth, it took ~ten years to go from his first inkling of relativity to special relativity, and he nearly gave up at many points (including the week before he figured it out). Figuring out knowledge at the edge of human understanding can just be… really fucking brutal. I feel like this is largely forgotten, or ignored, or just not understood. Partially that’s because in retrospect everything looks obvious, so it doesn’t seem like it could have been that hard, but partially it’s because almost no one tries to do this sort of work, so there aren’t societal structures erected around it, and hence little collective understanding of what it’s like.
Anyway, I suspect there are really strong selection pressures for who ends up doing this sort of thing, since a lot needs to go right: smart enough, creative enough, strong epistemics, independent, willing to spend years without legible output, exceptionally driven, and so on. Indeed, the last point seems important to me—many great scientists are obsessed. Spend night and day on it, it’s in their shower thoughts, can’t put it down kind of obsessed. And I suspect this sort of has to be true because something has to motivate them to go against every conceivable pressure (social, financial, psychological) and pursue the strange meaning anyway.
I don’t think the EA pipeline is much selecting for pre-paradigmatic scientists, but I don’t think lack of trying to get physicists to work on alignment is really the bottleneck either. Mostly I think selection effects are very strong, e.g., the Sequences was, imo, one of the more effective recruiting strategies for alignment. I don’t really know what to recommend here, but I think I would anti-recommend putting all the physics post-docs from good universities in a room in the hope that they make progress. Requesting that the world write another book as good as the Sequences is a… big ask, although to the extent it’s possible I expect it’ll go much further in drawing people out who will self select into this rather unusual “job.”
What links here?
- AI #97: 4 by Zvi (2 Jan 2025 14:10 UTC; 45 points)
- Raemon 30 Dec 2024 6:33 UTC
  28 points
  3
  Parent
  This is the sort of thing I find appealing to believe, but I feel at least somewhat skeptical of. I notice a strong emotional pull to want this to be true (as well as an interesting counterbalancing emotional pull for it to not be true).
  I don’t think I’ve seen output from the people aspiring in this direction without being visibly quite smart to make me think “okay yeah it seems like it’s on track in some sense.”
  I’d be interested in hearing more explicit cruxes from you about it.
  I do think it’s plausible than the “smart enough, creative enough, strong epistemics, independent, willing to spend years without legible output, exceptionally driven, and so on” are sufficient (if you’re at least moderately-but-not-exceptionally-smart). Those are rare enough qualities that it doesn’t necessarily feel like I’m getting a free lunch, if they turn out to be sufficient for groundbreaking pre-paradigmatic research. I agree the x-risk pipeline hasn’t tried very hard to filter for and/or generate people with these qualities.
  (well, okay, “smart enough” is doing a lot of work there, I assume from context you mean “pretty smart but not like genius smart”)
  But, I’ve only really seen you note positive examples, and this seems like the sort of thing that’d have a lot of survivorship bias. There can be tons of people obsessed, but not necessarily on the right things, and if you’re not naturally the right cluster of obsessed + smart-in-the-right-way, I don’t know whether trying to cultivate the obsession on purpose will really work.
  I do nonetheless overall probably prefer people who have all your listed qualities, and who also either can:
  a) self-fund to pursue the research without having to make it legible to others
  b) somehow figure out a way to make it legible along the way
  I probably prefer those people to tackle “the hard parts of alignment” over many other things they could be doing, but not overwhelmingly obviously (and I think it should come with a background awareness that they are making a gamble, and if they aren’t the sort of person who must make that gamble due to their personality makeup, they should be prepared for the (mainline) outcome that it just doesn’t work out)
- johnswentworth 30 Dec 2024 22:19 UTC
  17 points
  3
  Parent
  I’m not convinced that the “hard parts” of alignment are difficult in the standardly difficult, g-requiring way that e.g., a physics post-doc might possess.
  To be clear, I wasn’t talking about physics postdocs mainly because of raw g. Raw g is a necessary element, and physics postdocs are pretty heavily loaded on it, but I was talking about physics postdocs mostly because of the large volume of applied math tools they have.
  The usual way that someone sees footholds on the hard parts of alignment is to have a broad enough technical background that they can see some analogy to something they know about, and try borrowing tools that work on that other thing. Thus the importance of a large volume of technical knowledge.
  - JuliaHP 31 Dec 2024 12:43 UTC
    7 points
    2
    Parent
    Curious about what it would look like to pick up the relevant skills, especially the subtle/vague/tacit skills, in an independent-study setting rather than in academia. As well as the value of doing this, IE maybe its just a stupid idea and its better to just go do a PhD. Is the purpose of a PhD to learn the relevant skills, or to filter for them? (If you have already written stuff which suffices as a response, id be happy to be pointed to the relevant bits rather than having them restated)
    
    ”Broad technical knowledge” should be in some sense the “easiest” (not in terms of time-investment, but in terms of predictable outcomes), by reading lots of textbooks (using similar material as your study guide).
    
    Writing/communication, while more vague, should also be learnable by just writing a lot of things, publishing them on the internet for feedback, reflecting on your process etc.
    
    Something like “solving novel problems” seems like a much “harder” one. I don’t know if this is a skill with a simple “core” or a grab-bag of tactics. Textbook problems take on a “meant-to-be-solved” flavor and I find one can be very good at solving these without being good at tackling novel problems. Another thing I notice is that when some people (myself included) try solving novel problems, we can end up on a path which gets there eventually, but if given “correct” feedback integration would go OOM faster.
    
    I’m sure there are other vague-skills which one ends up picking up from a physics PhD. Can you name others, and how one picks them up intentionally? Am I asking the wrong question?
    - johnswentworth 31 Dec 2024 14:39 UTC
      17 points
      4
      Parent
      I currently think broad technical knowledge is the main requisite, and I think self-study can suffice for the large majority of that in principle. The main failure mode I see would-be autodidacts run into is motivation, but if you can stay motivated then there’s plenty of study materials.
      For practice solving novel problems, just picking some interesting problems (preferably not AI) and working on them for a while is a fine way to practice.
      - Johannes C. Mayer 31 Dec 2024 17:09 UTC
        9 points
        1
        Parent
        Why not AI? Is it that AI alignment is too hard? Or do you think it’s likely one would fall into the “try a bunch of random stuff” paradigm popular in AI, which wouldn’t help much in getting better at solving hard problems?
        
        What do you think about the strategy of instead of learning a textbook e.g. on information theory, or compilers you try to write the textbook and only look at existing material if you are really stuck. That’s my primary learning strategy.
        
        It’s very slow and I probably do it too much, but it allows me to train to solve hard problems that aren’t super hard. If you read all the text books all the practice problems remaining are very hard.
        TristanTrim 12 Aug 2025 23:36 UTC
        3 points
        0
        Parent
        My POV is you are either hitting the hard core problems, in which case you aren’t practising, you’re trying to do the real thing, or you are advancing AI capabilities by solving some other problem, which is bad given the current strategic situation.
        
        Write the textbook is an interesting study strategy. It’s impossible with math though, in which each chapter might be the entire life’s work of multiple a skilled mathematician. This is probably also true of other fields.
        Johannes C. Mayer 28 Aug 2025 10:41 UTC
        4 points
        0
        Parent
        The idea is that you write the textbook yourself until you have aquired all the skills about doing original thinking. It’s not about never looking up things. Though aquiring the skill of thinking by reinventing things seems better, because the research frontier has much harder problems. So hard that they are not the right difficulty to efficiently learn the skill of “original problem solving”.
      - JuliaHP 31 Dec 2024 16:29 UTC
        9 points
        1
        Parent
        (That broad technical knowledge is the main thing (as opposed to tacit skills) why you value a physics PhD is a really surprising response to me, and seems like an important part of the model that didn’t come across from the post.)
- Seth Herd 27 Dec 2024 18:51 UTC
  14 points
  5
  Parent
  I think this is right. A couple of follow-on points:
  
  There’s a funding problem if this is an important route to progress. If good work is illegible for years, it’s hard to decide who to fund, and hard to argue for people to fund it. I don’t have a proposed solution, but I wanted to note this large problem.
  
  Einstein did his pre-paradigmatic work largely alone. Better collaboration might’ve sped it up.
  
  LessWrong allows people to share their thoughts prior to having publishable journal articles and get at least a few people to engage.
  
  This makes the difficult pre-paradigmatic thinking a group effort instead of a solo effort. This could speed up progress dramatically.
  
  This post and the resulting comments and discussions is an example of the community collectively doing much of the work you describe: traversing levels, practicing good epistemics, and remaining confused.
  
  Having conversations with other LWers (on calls, by DM, or in extended comment threads) is tremendously useful for me. I could produce those same thoughts and critiques, but it would take me longer to arrive at all of those different viewpoints of the issue. I mention this to encourage others to do it. Communication takes time and some additional effort (asking people to talk), but it’s often well worth it. Talking to people who are interested in and knowledgeable on the same topics can be an enormous speedup in doing difficult pre-paradigmatic thinking.
  
  LessWrong isn’t perfect, but it’s a vast improvement on the collaboration tools and communities that have been available to scientists in other fields. We should take advantage of it.
  - TristanTrim 13 Aug 2025 0:17 UTC
    15 points
    6
    Parent
    Regarding the illegibility problem, it is a bit of a specific case of a general problem I have been brooding on for years. There are 3 closely related issues:
    
    Understanding the scope and context of different ideas. As an example, I struggle to introduce people who are familiar with AI and ML to AIA because they assume that it is not a field people have been focusing on for 20 years giving it depth and breadth that would take them a long time to engage with. (They instead assume their ML background gives them better insight and talk over me with asinine observations that I read about on LW a decade ago… it’s frustrating.)
    Connecting people focused on similar concepts and problems. This is especially the case across terminological divides of which there are many in pre-paradigmatic fields like AIA. Any independent illegible researcher very likely has their own independent terminology to some degree.
    De-duplicating noise in conversations. It is hard to find original ideas when many people are saying variations of the same common (often discredited) ideas.
    
    The solution I have been daydreaming about is the creation of a social media platform that promotes the manual and automatic linking and deduplication of posts. Similar in some ways to a wiki, but with the idea being that if two ideas are actually the same idea wearing two different disguises, the goal is to find the way to describe that idea with the broadest applicability and ease of understanding, and link the other descriptions to that description. This along with some kind of graph representation of the ideas could ideally produce a map of the actual size and shape of a field (and how linked it is to other fields).
    
    The manual linking would need to be promoted with some kind of karma and direct social engagement dynamic (IE, your links show up on your user profile page so people can congratulate you on how clever you are for noticing that idea A is actually the same as idea B).
    
    The automatic linking could be done by various kinds of spiders/bots. Probably LLMs. Importantly I would want bots, which may hallucinate, to need verification before the link is presented as solid, but in fact this applies to any human linking idea nodes as well. Ideally links would be provided with an explanation, and only after many users confirm (or upvote) a link, would it be presented by the default interface.
    
    There could also be other kinds of links than just “these are the same idea”. The kinds of links I find most compelling are “A is similar/same as B”, “A contradicts B”, “A supports B”, “A is part of B”.
    
    I first started thinking of this idea with reference to how traditional social media exhausts me because it seems like a bunch of people talking past each other and you need to read way more than you should to understand any trending issue. It would be much nicer to browse a graph representing the unique elements of the conversation and potentially use LLM technology to find the parts of the conversation exploring your current POV which is either at the edge of the conversation, or you can find how much you would need to read in order to get to the edge, in which case you can decide it is not worth being informed about the issue and say “sorry, but I can’t support either side, I am uninformed” or put in that effort in an efficient way and (hopefully) without getting caught in an echo chamber failing to engage with actual criticism.
    
    But after thinking about the idea it appeals to me as a way to interact with all bodies of knowledge. I think the hardest parts would be setting it up so it feels engaging to people and driving adoption. (Aside from actual implementation difficulty.)
    - Seth Herd 13 Aug 2025 13:09 UTC
      3 points
      4
      Parent
      This is a fascinating idea and I think there’s probably some version that could be highly useful.
      
      I’d make this a short form or a top-level post. I think this idea is important.
      
      I am wondering if there’s some version of this that can run on lesswrong or in parallel to lesswrong. Less wrong is of course where all of those ideas that need linking live.
      - TristanTrim 13 Aug 2025 19:24 UTC
        1 point
        0
        Parent
        Oh yeah, having it link to external resources (LW, wiki, etc...) is probably a really good idea. I’ll expand on the idea a bit and make a post for it.
        Seth Herd 13 Aug 2025 23:20 UTC
        2 points
        0
        Parent
        I meant use LW instead of trying to start a new site that duplicates (and competes with) its function. LW is the place people post those alignment idea; why try to compete with it?
        
        Maybe that’s what you meant. It should be possible to do another site that searches and cross-references LW posts.
        
        To some extent, this functionality is already provided by recent LLMs; you can ask them what posts on LW cover ideas, and they’ve got decent answers.
        
        Improving that functionality or just spreading the idea that you’re wasting your time if you write up your idea before doing a bunch of LLM queries and deep research reports. Just searches typically don’t work to surface similar ideas since the terminology is usually completely different even between ideas that are essentially exactly the same.
        TristanTrim 13 Aug 2025 23:42 UTC
        1 point
        0
        Parent
        Yeah. One thing is I think this would be valuable for topics other than just alignment, but if the idea works well there wouldn’t be a reason not to have LW have it’s own version or have tight coupling with search and cross reference of LW posts.
        
        wasting your time if you write up your idea before doing a bunch of LLM queries and deep research reports
        
        This is another idea I dislike. I feel like I am more conscientious about this problem than other people and so in the past I would put a lot of effort into researching before expressing my own views. I think this had 3 negative effects. (1) Often I would get exhausted and lose interest in the topic before expressing my view. If it was novel or interesting, I would never know and neither would anyone else. (2) If there were subtle interesting aspects to my original view, I risked overwriting them as I researched without noticing. I could write my views and then research and then review them, but in practice I rarely do that. (3) There is no publicly visible evidence showing that I have been heavily engaged in reading and thinking about AIA (or any of the other things I have focused on). This is bad both because other people cannot point out any misconceptions I might have, and also it is difficult to find interested people in that class or know how many of them there are.
        
        I think people like me would find it easier to express their ideas if it was very likely it would get categorized into a proper location where any useful insight it may contain could be extracted, or referenced, and if it was purely redundant it could basically be marked as such and linked to the node representing that idea so it wouldn’t clog up the information channel but would instead provide statistics about that node and tie my identity to that node.
        
        Just searches typically don’t work to surface similar ideas since the terminology is usually completely different even between ideas that are essentially exactly the same.
        
        Yeah, exactly, this is a big problem and is why I want karma and intrinsic social reward motivation directing people to try to link those deeper, harder to link ideas. My view is that it shouldn’t be the platform that provides the de-duplication tools, but many different people should try building LLM bots and doing the work manually. The platform would provide the incentivization (karma, social, probably not monitary), and maybe would also provide the integrated version of the many users individually suggested links.
        Seth Herd 13 Aug 2025 23:56 UTC
        2 points
        0
        Parent
        Understood on raising the bar for writing. I think there’s a tradeoff; you might be wasting your time but you do get the visible evidence you’ve been thinking about it (although evidence that you thought about something but didn’t bother to research others’ thinking on that topic is… questionably positive).
        
        But sure, if that’s what you or anyone finds motivating, it would be great to get the value from that use of time by cataloguing it better.
        
        It does need to be mostly-automated though. People with deep knowledge have little time to read let alone to aid others’ reading.
        TristanTrim 14 Aug 2025 0:14 UTC
        1 point
        0
        Parent
        
        It does need to be mostly-automated though. People with deep knowledge have little time to read let alone to aid others’ reading.
        
        Yes, exactly. I’m getting quite idealistic here, but I’m imagining it as an ecosystem.
        
        People with deep knowledge wouldn’t need to waste their time with things that are obviously reduplication of existing ideas, but would be able to find anything novel that is currently lost in the noise.
        
        The entry point for relative novices would be posting some question or claim (there isn’t much difference between a search to a search engine and a prompt for conversation on social media) then very low effort spiders would create potential ties between what they said and places in the many conversation graphs it could fit. This would be how a user “locates themselves” in the conversation. From this point they could start learning about the conversation graph surrounding them, navigating either by reading nearby nodes or other posts within their node, or by writing new related posts that spiders either suggest are at other locations in the graph, or suggest that are original branches off from the existing graph.
        
        If you are interested in watching specific parts of the graph you might get notified that someone has entered your part of the graph and review their posts as written, which would let you add higher quality human links, more strongly tying them into the conversation graph. You might also decide that they are going in interesting directions, in which case you might paraphrase their idea as a genuinely original node branching off from the nearby location. You might instead decide that the spiders were mistaken about locating this person’s posting as being in the relevant part of the graph, in which case you could mark opposition to the link, weakening it and providing a negative training signal for the spiders (or other people) who suggested the link.
        
        To some extent this is how things already work with tagging, but it really doesn’t come close to my ideals of deeply deduplicating conversations into their simplest (but fully represented) form.
  - Linda Linsefors 29 Dec 2024 16:22 UTC
    5 points
    0
    Parent
    Einstein did his pre-paradigmatic work largely alone. Better collaboration might’ve sped it up.
    I think this is false. As I remember hearing the story, he where corresponding with several people via letters.
    - Steven Byrnes 31 Dec 2024 12:20 UTC
      10 points
      2
      Parent
      I know very little, but there’s a fun fact here: “During their lifetimes, Darwin sent at least 7,591 letters and received 6,530; Einstein sent more than 14,500 and received more than 16,200.” (Not sure what fraction was technical vs personal.)
      Also, this is a brief summary of Einstein’s mathematician friend Marcel Grossmann’s role in general relativity.
      - Seth Herd 1 Jan 2025 20:45 UTC
        7 points
        0
        Parent
        In the piece you linked, it sounds like Einstein had the correct geometry for general relativity one day after he asked for help finding one. Of course, that’s one notable success amongst perhaps a lot of collaboration. The number of letters he sent and received implies that he actually did a lot of written collaboration.
        
        I wonder about the value of real-time conversation vs. written exchanges. And the value of being fully engaged; truly curious about your interlocutor’s ideas.
        
        My own experience watching progress happen (and not-happen) in theoretical neuroscience is that fully engaged conversations with other true experts with different viewpoints was rare and often critical for real progress.
        
        My perception is that those conversations are tricky to produce. Experts are often splitting their attention between impressing people and coolheaded, openminded discussion. And they weren’t really seeking out these conversations, just having them when it was convenient, and being really fully engaged only when the interpersonal vibe happened to be right. Even so, the bit of real conversation I saw seemed quite important.
        
        It would be helpful to understand collaboration on difficult theory better, but it would be a whole research topic.
    - Seth Herd 29 Dec 2024 21:28 UTC
      4 points
      0
      Parent
      By largely alone I meant without the rich collaboration of having an office in the same campus or phone calls or LessWrong.
      - Linda Linsefors 29 Dec 2024 22:40 UTC
        5 points
        2
        Parent
        I think the qualitive difference is not as large as you think it is. But I also don’t think this is very crux-y for anything, so I will not try to figure out how to translate my reasoning to words, sorry.
- Chris_Leong 27 Dec 2024 17:17 UTC
  4 points
  0
  Parent
  Agreed. Simply focusing on physics post-docs feels too narrow to me.
  Then again, just as John has a particular idea of what good alignment research looks like, I have my own idea: I would lean towards recruiting folk with both a technical and a philosophical background. It’s possible that my own idea is just as narrow.
  - johnswentworth 27 Dec 2024 17:47 UTC
    8 points
    0
    Parent
    Simply focusing on physics post-docs feels too narrow to me.
    The post did explicitly say “Obviously that doesn’t mean we exclusively want physics postdocs”.
    - Chris_Leong 28 Dec 2024 2:12 UTC
      4 points
      0
      Parent
      Thanks for clarifying. Still feels narrow as a primary focus.
- TristanTrim 12 Aug 2025 23:27 UTC
  3 points
  0
  Parent
  
  the Sequences was, imo, one of the more effective recruiting strategies for alignment
  
  I read the Sequences nearly a decade ago and it, along with “Superintelligence” are the main things that have made my worldview such that I think AIA is a hard problem and we are not on track to solve it, and as a result, I am trying to become an AIA researcher focusing on the hard problems.
  
  It may be that there are many people out there considering the problem without support and without publishing their results because they don’t have any concrete, publishable results yet. I’m slowly getting to the point that I think I could describe a path to solving AIA (See here for my first attempt). But it’s still fuzzy and it’s taken a long time and a lot of coming back to consider the problem many times in between focusing on other things and without having enough traction on the problem to feel worth talking about.
  
  But it’s now my main career and/or life goal now. I just finished my BSc and am going to spend until my convocation in November self studying and publishing articles and seeking roles or funding.