I think this is right. A couple of follow-on points:
There’s a funding problem if this is an important route to progress. If good work is illegible for years, it’s hard to decide who to fund, and hard to argue for people to fund it. I don’t have a proposed solution, but I wanted to note this large problem.
Einstein did his pre-paradigmatic work largely alone. Better collaboration might’ve sped it up.
LessWrong allows people to share their thoughts prior to having publishable journal articles and get at least a few people to engage.
This makes the difficult pre-paradigmatic thinking a group effort instead of a solo effort. This could speed up progress dramatically.
This post and the resulting comments and discussions are an example of the community collectively doing much of the work you describe: traversing levels, practicing good epistemics, and remaining confused.
Having conversations with other LWers (on calls, by DM, or in extended comment threads) is tremendously useful for me. I could produce those same thoughts and critiques myself, but it would take me longer to arrive at all of those different viewpoints on the issue. I mention this to encourage others to do it. Communication takes time and some additional effort (asking people to talk), but it’s often well worth it. Talking to people who are interested in and knowledgeable about the same topics can be an enormous speedup in doing difficult pre-paradigmatic thinking.
LessWrong isn’t perfect, but it’s a vast improvement on the collaboration tools and communities that have been available to scientists in other fields. We should take advantage of it.
Regarding the illegibility problem: it is a specific case of a general problem I have been brooding on for years. There are three closely related issues:
Understanding the scope and context of different ideas. As an example, I struggle to introduce people who are familiar with AI and ML to AIA (AI alignment), because they assume it is not a field people have been focusing on for 20 years, with a depth and breadth that would take them a long time to engage with. (They instead assume their ML background gives them better insight and talk over me with asinine observations that I read about on LW a decade ago… it’s frustrating.)
Connecting people focused on similar concepts and problems. This is especially the case across terminological divides, of which there are many in pre-paradigmatic fields like AIA. Any independent, illegible researcher very likely has their own terminology to some degree.
De-duplicating noise in conversations. It is hard to find original ideas when many people are saying variations of the same common (often discredited) ideas.
The solution I have been daydreaming about is a social media platform that promotes the manual and automatic linking and de-duplication of posts. It would be similar in some ways to a wiki, but the idea is that if two ideas are actually the same idea wearing two different disguises, the goal is to find the description of that idea with the broadest applicability and ease of understanding, and to link the other descriptions to it. Together with some kind of graph representation of the ideas, this could ideally produce a map of the actual size and shape of a field (and how linked it is to other fields).
The manual linking would need to be promoted with some kind of karma and direct social engagement dynamic (i.e., your links show up on your user profile page so people can congratulate you on how clever you are for noticing that idea A is actually the same as idea B).
The automatic linking could be done by various kinds of spiders/bots, probably LLMs. Importantly, since bots may hallucinate, I would want their links to require verification before being presented as solid; in fact, the same applies to any human linking idea nodes. Ideally, links would come with an explanation, and only after many users confirm (or upvote) a link would it be presented by the default interface.
There could also be other kinds of links than just “these are the same idea”. The kinds of links I find most compelling are “A is similar/same as B”, “A contradicts B”, “A supports B”, “A is part of B”.
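To make that concrete, here is a minimal sketch of what an idea node and a typed, verification-gated link could look like. All of the names here (IdeaNode, Link, LinkKind, the confirmation threshold) are my own illustrative choices, not an existing system or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkKind(Enum):
    SAME_AS = "A is similar/same as B"
    CONTRADICTS = "A contradicts B"
    SUPPORTS = "A supports B"
    PART_OF = "A is part of B"


@dataclass
class IdeaNode:
    node_id: str
    canonical_text: str   # the broadest, easiest-to-understand description of the idea
    alternate_descriptions: list[str] = field(default_factory=list)  # the other "disguises"


@dataclass
class Link:
    source: str        # node_id of idea A
    target: str        # node_id of idea B
    kind: LinkKind
    explanation: str   # why the proposer thinks the relation holds
    proposed_by: str   # a human user or a bot/spider
    confirmations: int = 0
    oppositions: int = 0

    def shown_by_default(self, threshold: int = 3) -> bool:
        # Whether proposed by a bot or a human, a link is only surfaced by the
        # default interface once enough users have confirmed (upvoted) it.
        return self.confirmations - self.oppositions >= threshold
```

A real implementation would obviously need much more (provenance, karma, graph queries), but even this much is enough to render the "map of a field" as nodes plus typed edges.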
I first started thinking about this idea because traditional social media exhausts me: it seems like a bunch of people talking past each other, and you need to read far more than you should to understand any trending issue. It would be much nicer to browse a graph representing the unique elements of the conversation, and potentially use LLMs to find the parts of the conversation exploring your current point of view, which is either at the edge of the conversation or some distance from it. In the latter case you can see how much you would need to read to get to the edge, and then either decide it is not worth being informed about the issue and say “sorry, I can’t support either side, I am uninformed”, or put in that effort efficiently and (hopefully) without getting caught in an echo chamber that fails to engage with actual criticism.
But after thinking about it more, the idea appeals to me as a way to interact with all bodies of knowledge. I think the hardest parts would be making it feel engaging to people and driving adoption (aside from the actual implementation difficulty).
This is a fascinating idea and I think there’s probably some version that could be highly useful.
I’d make this a short form or a top-level post. I think this idea is important.
I am wondering if there’s some version of this that can run on LessWrong or in parallel to it. LessWrong is, of course, where all of those ideas that need linking live.
Oh yeah, having it link to external resources (LW, wiki, etc.) is probably a really good idea. I’ll expand on the idea a bit and make a post for it.
I meant use LW instead of trying to start a new site that duplicates (and competes with) its function. LW is the place people post those alignment ideas; why try to compete with it?
Maybe that’s what you meant. It should be possible to do another site that searches and cross-references LW posts.
To some extent, this functionality is already provided by recent LLMs; you can ask them what posts on LW cover ideas, and they’ve got decent answers.
Improving that functionality, or just spreading the idea that you’re wasting your time if you write up your idea before doing a bunch of LLM queries and deep research reports, would help. Plain searches typically don’t work to surface similar ideas, since the terminology is usually completely different even between ideas that are essentially the same.
Yeah. One thing is that I think this would be valuable for topics beyond just alignment, but if the idea works well there wouldn’t be a reason for LW not to have its own version, or tight coupling with search and cross-referencing of LW posts.
The suggestion that you’re wasting your time if you write up your idea before doing a bunch of LLM queries and deep research reports is another idea I dislike. I feel like I am more conscientious about this problem than other people, so in the past I would put a lot of effort into researching before expressing my own views. I think this had three negative effects. (1) Often I would get exhausted and lose interest in the topic before expressing my view; if it was novel or interesting, I would never know, and neither would anyone else. (2) If there were subtle interesting aspects to my original view, I risked overwriting them as I researched without noticing. I could write my views, then research, then review them, but in practice I rarely do that. (3) There is no publicly visible evidence that I have been heavily engaged in reading and thinking about AIA (or any of the other things I have focused on). This is bad both because other people cannot point out any misconceptions I might have, and because it is difficult to find interested people in that class, or to know how many of them there are.
I think people like me would find it easier to express their ideas if it was very likely the post would get categorized into a proper location, where any useful insight it contains could be extracted or referenced; and if it was purely redundant, it could be marked as such and linked to the node representing that idea, so it wouldn’t clog up the information channel but would instead contribute statistics about that node and tie my identity to it.
Yeah, exactly: that terminology mismatch is a big problem, and it is why I want karma and intrinsic social reward motivating people to link those deeper, harder-to-link ideas. My view is that it shouldn’t be the platform that provides the de-duplication tools; rather, many different people should try building LLM bots and doing the work manually. The platform would provide the incentivization (karma, social, probably not monetary), and maybe also an integrated view of the links individually suggested by many users.
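As a purely hypothetical sketch of the kind of third-party linking bot I have in mind: since keyword search fails across terminological divides, such a bot would more likely compare meanings, for example via embeddings. `embed()` below is a stand-in for whatever embedding model a bot author chooses; none of the names refer to an existing platform API.

```python
import math


def embed(text: str) -> list[float]:
    """Stand-in for an embedding model chosen by the bot author."""
    raise NotImplementedError("plug in your preferred embedding model here")


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_same_idea_links(new_post: str, idea_nodes: dict[str, str],
                            min_similarity: float = 0.85) -> list[tuple[str, float]]:
    """Propose (node_id, score) 'same idea' links for a new post.

    Comparing embeddings rather than keywords is what would let the bot match
    ideas whose terminology is completely different. Proposed links would still
    need human confirmation before the platform shows them by default.
    """
    post_vec = embed(new_post)
    scored = []
    for node_id, canonical_text in idea_nodes.items():
        score = cosine(post_vec, embed(canonical_text))
        if score >= min_similarity:
            scored.append((node_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```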
Understood on raising the bar for writing. I think there’s a tradeoff; you might be wasting your time but you do get the visible evidence you’ve been thinking about it (although evidence that you thought about something but didn’t bother to research others’ thinking on that topic is… questionably positive).
But sure, if that’s what you or anyone finds motivating, it would be great to get the value from that use of time by cataloguing it better.
It does need to be mostly-automated though. People with deep knowledge have little time to read let alone to aid others’ reading.
Yes, exactly. I’m getting quite idealistic here, but I’m imagining it as an ecosystem.
People with deep knowledge wouldn’t need to waste their time on things that are obvious duplications of existing ideas, but would be able to find anything novel that is currently lost in the noise.
The entry point for relative novices would be posting some question or claim (there isn’t much difference between a query to a search engine and a prompt that starts a conversation on social media). Then very low-effort spiders would create potential ties between what they said and the places in the many conversation graphs where it could fit. This is how a user would “locate themselves” in the conversation. From that point they could start learning about the conversation graph surrounding them, navigating either by reading nearby nodes (or other posts within their own node), or by writing new related posts that the spiders either suggest belong at other locations in the graph, or suggest are original branches off the existing graph.
If you are interested in watching specific parts of the graph, you might get notified that someone has entered your part of the graph and review their posts as written, which would let you add higher-quality human links, more strongly tying them into the conversation graph. You might also decide that they are going in interesting directions, in which case you might paraphrase their idea as a genuinely original node branching off from the nearby location. Or you might decide that the spiders were mistaken in locating this person’s posts in that part of the graph, in which case you could mark opposition to the link, weakening it and providing a negative training signal for the spiders (or other people) who suggested it.
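A rough sketch of that review loop, just to pin down the moving parts (a suggested placement, watcher confirmation or opposition, and the feedback signal sent back to whoever suggested it); every name here is hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class PlacementSuggestion:
    post_id: str
    node_id: str       # where in the conversation graph a spider thinks the post fits
    suggested_by: str  # the spider (or person) that proposed the placement
    confirmations: int = 0
    oppositions: int = 0


@dataclass
class SuggesterRecord:
    suggester_id: str
    feedback: list[int] = field(default_factory=list)  # +1 / -1 signals from watchers


def review_placement(suggestion: PlacementSuggestion,
                     suggesters: dict[str, SuggesterRecord],
                     agrees: bool) -> None:
    """A watcher of this part of the graph confirms or opposes a suggested link.

    Opposition weakens the link and doubles as a negative training signal for
    whichever spider (or person) proposed it.
    """
    record = suggesters[suggestion.suggested_by]
    if agrees:
        suggestion.confirmations += 1
        record.feedback.append(+1)
    else:
        suggestion.oppositions += 1
        record.feedback.append(-1)
```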
To some extent this is how things already work with tagging, but it really doesn’t come close to my ideals of deeply deduplicating conversations into their simplest (but fully represented) form.
I think this is false. As I remember hearing the story, he was corresponding with several people via letters.
I know very little, but there’s a fun fact here: “During their lifetimes, Darwin sent at least 7,591 letters and received 6,530; Einstein sent more than 14,500 and received more than 16,200.” (Not sure what fraction was technical vs personal.)
Also, this is a brief summary of Einstein’s mathematician friend Marcel Grossmann’s role in general relativity.
In the piece you linked, it sounds like Einstein had the correct geometry for general relativity one day after he asked for help finding one. Of course, that’s one notable success amongst perhaps a lot of collaboration. The number of letters he sent and received implies that he actually did a lot of written collaboration.
I wonder about the value of real-time conversation vs. written exchanges, and about the value of being fully engaged and truly curious about your interlocutor’s ideas.
My own experience watching progress happen (and not happen) in theoretical neuroscience is that fully engaged conversations with other true experts holding different viewpoints were rare and often critical for real progress.
My perception is that those conversations are tricky to produce. Experts are often splitting their attention between impressing people and cool-headed, open-minded discussion. And they weren’t really seeking out these conversations, just having them when it was convenient, and being fully engaged only when the interpersonal vibe happened to be right. Even so, the bit of real conversation I saw seemed quite important.
It would be helpful to understand collaboration on difficult theory better, but it would be a whole research topic.
By “largely alone” I meant without the rich collaboration of having an office on the same campus, or phone calls, or LessWrong.
I think the qualitative difference is not as large as you think it is. But I also don’t think this is very crux-y for anything, so I will not try to figure out how to translate my reasoning into words, sorry.