I actually disagree with the natural abstractions research being ungrounded. Indeed, I think there is reason to believe that at least some of the natural abstractions work (especially the natural abstraction hypothesis) actually sort of holds true for today’s AI, and thus is the most likely out of the theoretical/agent-foundations approaches to work (I’m usually critical of agent foundations, but John Wentworth’s work is an exception that I’d like to see funded).
For example, this post describes an experiment showing that the Platonic Representation Hypothesis still holds on OOD data, meaning that deeper factors than mere shallow similarity are likely at play:
https://www.lesswrong.com/posts/Su2pg7iwBM55yjQdt/exploring-the-platonic-representation-hypothesis-beyond-in
I’m wary of a possible equivocation about what the “natural abstraction hypothesis” means here.
If we are referring to the redundant information hypothesis and various kinds of selection theorems, this is a mathematical framework that could end up being correct, is not at all ungrounded, and Wentworth sure seems like the man for the job.
But then you are still left with the task of grounding this framework in physical reality, to allow you to make correct empirical predictions about, and real-world interventions on, what you will see from more advanced models. Our physical world abstracting well seems plausible (not necessarily >50% likely), and these abstractions being “natural” (e.g., in a category-theoretic sense) seems likely conditional on the first clause of this sentence being true, but I give an extremely low probability to the idea that these abstractions will be used by any given general intelligence or (more to the point) advanced AI model to a large and wide enough extent that retargeting the search is even close to possible.

And indeed, it is the latter question that represents the make-or-break moment for natural abstractions’ theory of change, for it is only when the model in front of you (as opposed to some other idealized model) uses these specific abstractions that you can look through the AI’s internal concepts and find your desired alignment target.
Rohin Shah has already explained the basic reasons why I believe the mesa-optimizer-type search probably won’t exist/be findable in the inner workings of the models we encounter: “Search is computationally inefficient relative to heuristics, and we’ll be selecting really hard on computational efficiency.” And indeed, when I look at the only general intelligences I have ever encountered in my entire existence thus far, namely humans, I see mostly just a kludge of impulses and heuristics that depend very strongly (almost entirely) on our specific architectural make-up and the contextual feedback we encounter in our path through life. Change either of those and the end result shifts massively.
And even moving beyond that, is the concept of the number “three” a natural abstraction? If so, then I see entire collections and societies of (generally intelligent) human beings today who don’t adopt it. Are the notions of “pressure” and “temperature” and “entropy” natural abstractions? I look at all human beings in 1600 and note that not a single one of them had ever correctly conceptualized a formal version of any of those; and indeed, even making a conservative estimate of the human species (with an essentially unchanged modern cognitive architecture) having existed for 200k years, this means that for 99.8% of our species’ history, we had no understanding whatsoever of concepts as “universal” and “natural” as these. If you look at subatomic particles like electrons, or at phenomena in quantum mechanics, the percentage gets even higher. And that’s only conditioning on abstractions about the outside world that we have eventually managed to figure out; what about the other unknown unknowns?
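A quick back-of-the-envelope check on that 99.8% figure, taking 1600 as a generous earliest possible date for these formalizations, i.e., roughly 400 years of understanding out of a 200,000-year history:

$$\frac{200{,}000 - 400}{200{,}000} \approx 0.998 = 99.8\%$$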
For example, this post describes an experiment showing that the Platonic Representation Hypothesis still holds on OOD data, meaning that deeper factors than mere shallow similarity are likely at play
I don’t think it shows that at all, since I have not been able to find any analysis of the methodology, data generation, discussion of results, etc. With no disrespect to the author (who surely wasn’t intending for his post to be treated as authoritatively as a full paper when it comes to updating towards his claim), this is shoddy science, or rather not science at all, just a context-free correlation matrix.
Anyway, all this is probably more fit for a longer discussion at some point.
Rohin Shah has already explained the basic reasons why I believe the mesa-optimizer-type search probably won’t exist/be findable in the inner workings of the models we encounter: “Search is computationally inefficient relative to heuristics, and we’ll be selecting really hard on computational efficiency.”
I think this statement is quite ironic in retrospect, given how OpenAI’s o-series seems to work (at train-time and at inference-time both), and how much AI researchers hype it up.
By contrast, my understanding is that the sort of search John is talking about retargeting isn’t the brute-force babble-and-prune algorithms, but a top-down heuristical-constraint-based search.
So it is in fact the ML researchers now who believe in the superiority of the computationally inefficient search; not the agency theorists.
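To make that distinction concrete, here is a minimal, purely illustrative sketch (the toy “pick an integer” domain and both function names are my own invention, not anything taken from the o-series or from John’s work): babble-and-prune samples blindly and filters with an evaluator, while the top-down variant narrows the space of options with heuristic constraints before anything is ever scored.

```python
import random

def babble_and_prune(generate, evaluate, n_samples=1000):
    """Brute-force search: sample many candidates, keep the best-scoring one."""
    candidates = [generate() for _ in range(n_samples)]
    return max(candidates, key=evaluate)

def top_down_constrained_search(option_space, constraints):
    """Top-down search: progressively narrow the option space with heuristic
    constraints, then pick among the few surviving candidates."""
    viable = list(option_space)
    for constraint in constraints:
        viable = [option for option in viable if constraint(option)]
    return viable[0] if viable else None

# Toy "plan" domain: choose an integer from 0..999.
generate = lambda: random.randrange(1000)
evaluate = lambda plan: -abs(plan - 637)                       # closer to 637 is better
constraints = [lambda p: p % 7 == 0, lambda p: 600 < p < 700]

print(babble_and_prune(generate, evaluate))                    # typically 637 or very close
print(top_down_constrained_search(range(1000), constraints))   # 602: first multiple of 7 in (600, 700)
```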
Re the OpenAI o-series and search, my initial prediction is that Q*/MCTS search will work well on problems that are easy to verify and easy to get training data for, and will not work if either of those two conditions is violated; secondarily, it will rely on the model having good error-correction capabilities to use the search effectively. This is why I expect we can make RL capable of superhuman performance on mathematics/programming with some rather moderate schlep/drudge work, and I also expect cost reductions such that it can actually be practical, but I’m only giving a 50/50 chance by 2028 for superhuman performance as measured by benchmarks in these domains.
I think my main difference from you, Thane Ruthenis, is that I expect costs to come down surprisingly rapidly, though this is admittedly untested.
This will accelerate AI progress, but not immediately cause an AI explosion, though at the more extreme paces it could create something like a scenario where programming companies are founded by a few people smartly managing a lot of programming AIs, with programming/mathematics experiencing something like what happened to the news industry with the rise of the internet: a lot of bankruptcies in the middle, the top end winning big, and most people ending up at the bottom end.
Also, a correct point on how a lot of people’s conceptions of search are babble-and-prune, not top-down search like MCTS/Q*/BFS/DFS/A* (not specifically targeted at sunwillrise here).
By contrast, my understanding is that the sort of search John is talking about retargeting isn’t the brute-force babble-and-prune algorithms, but a top-down heuristical-constraint-based search.
I’m not strongly committed to the view that the costs won’t rapidly reduce: I can certainly see the worlds in which it’s possible to efficiently distill tree-of-thought unrolls into single chains of thought. Perhaps it scales iteratively, where we train an ML model to handle the next layer of complexity by generating big ToTs, distilling them into CoTs, then generating the next layer of ToTs using these more-competent CoTs, etc.
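A minimal sketch of that iterative distillation loop, with everything except the loop structure stubbed out (Model, generate_tot, select_successful_path, and finetune are placeholders of my own invention, not any real training API; this is a sketch of the scheme described above, not a claim about how any lab actually implements it):

```python
from typing import List, Optional, Tuple

# Stand-ins: 'Model' is whatever the training stack uses, and the three helpers
# are stubs for real tree-of-thought generation, path selection, and fine-tuning.
# Only the loop structure is the point.
Model = dict
Path = List[str]

def generate_tot(model: Model, problem: str) -> List[Path]:
    return [[f"reasoning step for {problem}"]]        # stub: pretend ToT search output

def select_successful_path(paths: List[Path]) -> Optional[Path]:
    return paths[0] if paths else None                # stub: pretend verification

def finetune(model: Model, data: List[Tuple[str, Path]]) -> Model:
    return {**model, **dict(data)}                    # stub: pretend distillation step

def iterative_tot_distillation(model: Model, curriculum: List[List[str]]) -> Model:
    """Generate expensive ToT unrolls on the current difficulty layer, keep the
    successful traversals as single chains of thought, fine-tune on them, then
    move on to the next (harder) layer with the now-more-competent model."""
    for layer in curriculum:                          # layers of increasing difficulty
        cot_data = []
        for problem in layer:
            paths = generate_tot(model, problem)      # big tree-of-thought search
            path = select_successful_path(paths)      # prune to a traversal that worked
            if path is not None:
                cot_data.append((problem, path))      # ToT unroll -> single CoT example
        model = finetune(model, cot_data)             # distill into the weights
    return model

print(iterative_tot_distillation({}, [["easy problem"], ["harder problem"]]))
```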
Or perhaps distillation doesn’t work that well, and the training/inference costs grow exponentially (combinatorially?).
Yeah, we will have to wait at least several years.
One confound in all of this is that big talent is moving out of OpenAI, which means I’m more bearish on the company’s future prospects specifically, without that being much of a detriment to overall progress towards AGI.
I think this statement is quite ironic in retrospect, given how OpenAI’s o-series seems to work
I stand by my statement and don’t think anything about the o-series model invalidates it.
And to be clear, I’ve expected for many years that early powerful AIs will be expensive to run, and have critiqued people for analyses that implicitly assumed or implied that the first powerful AIs will be cheap, prior to the o-series being released. (Though unfortunately, for the two posts I’m thinking of, I made the critiques privately.)
There’s a world of difference between “you can get better results by thinking longer” (yeah, obviously this was going to happen) and “the AI system is a mesa optimizer in the strong sense that it has an explicitly represented goal such that you can retarget the search” (I seriously doubt it for the first transformative AIs, and am uncertain for post-singularity superintelligence).
To lay out my arguments properly:

(1) “Search is ruinously computationally inefficient” does not work as a counter-argument against the retargetability of search, because the inefficiency argument applies to babble-and-prune search, not to the top-down heuristical-constraint-based search that was/is being discussed. There are valid arguments against easily-retargetable heuristics-based search as well (I do expect many learned ML algorithms to be much messier than that). But this isn’t one of them.

(2) ML researchers are currently incredibly excited about the inference-time scaling laws, talking about inference runs costing millions/billions of dollars, and how much capability will be unlocked this way. The o-series paradigm would use this compute to, essentially, perform babble-and-prune search. The pruning would seem to be done by some easily-swappable evaluator (either the system’s own judgement based on the target specified in a prompt, or an external theorem-prover, etc.). If things indeed go this way, then it would seem that a massive amount of capabilities will be based on highly inefficient babble-and-prune search, and that this search would be easily retargetable by intervening on one compact element of the system (the prompt, or the evaluator function). (See the illustrative sketch right after this list.)
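As a purely illustrative sketch of what “easily retargetable by intervening on one compact element” could look like (the generator, the evaluators, and the toy string task are all invented for illustration; this is not a claim about how the o-series is actually implemented): the expensive babble stays fixed, and only the small evaluator function determines what the search ends up optimizing for.

```python
import random
from typing import Callable

def babble_and_prune(babble: Callable[[], str],
                     evaluate: Callable[[str], float],
                     n_samples: int = 500) -> str:
    """Generate many candidates (babble) and keep whichever one the current
    evaluator scores highest (prune)."""
    candidates = [babble() for _ in range(n_samples)]
    return max(candidates, key=evaluate)

# A fixed, dumb generator standing in for the expensive sampling process.
def babble() -> str:
    return "".join(random.choice("ab") for _ in range(8))

# Retargeting = swapping out one compact component, the evaluator.
prefer_a: Callable[[str], float] = lambda s: s.count("a")
prefer_b: Callable[[str], float] = lambda s: s.count("b")

print(babble_and_prune(babble, prefer_a))   # an 'a'-heavy string
print(babble_and_prune(babble, prefer_b))   # same generator, different target: a 'b'-heavy string
```

Swapping prefer_a for prefer_b (or, in the real setting, one prompt for another, or one verifier for another) is the entire intervention in this toy; nothing inside the generator has to be understood or modified.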
Re: (1), if you look through the thread for the comment of mine that was linked above, I respond to top-down heuristical-constraint-based search as well. I agree the response is different and not just “computational inefficiency”.
Re: (2), I agree that near-future systems will be easily retargetable by just changing the prompt or the evaluator function (this isn’t new to the o-series, you can also “retarget” any LLM chatbot by giving it a different prompt). If this continues to superintelligence, I would summarize it as “it turns out alignment wasn’t a problem” (e.g. scheming never arose, we never had problems with LLMs exploiting systematic mistakes in our supervision, etc). I’d summarize this as “x-risky misalignment just doesn’t happen by default”, which I agree is plausible (see e.g. here), but when I’m talking about the viability of alignment plans like “retarget the search” I generally am assuming that there is some problem to solve.
(Also, random nitpick, who is talking about inference runs of billions of dollars???)
Yup, I read through it after writing the previous response and now see that you don’t need to be convinced of that point. Sorry about dragging you into this.
I could nitpick the details here, but I think the discussion has kind of wandered away from any pivotal points of disagreement, plus John didn’t want object-level arguments under this post. So I petition to leave it at that.
Also, random nitpick, who is talking about inference runs of billions of dollars???
There’s a log-scaling curve; OpenAI have already spent on the order of a million dollars just to score well on some benchmarks; and people are talking about “how much would you be willing to pay for the proof of the Riemann Hypothesis?”. It seems like a straightforward conclusion that if o-series/inference-time scaling works as well as ML researchers seem to hope, there’d be billion-dollar inference runs funded by some major institutions.
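To spell out what the log-scaling curve implies here (the functional form and the constants are illustrative assumptions, not a fitted law): if benchmark score grows roughly linearly in the logarithm of inference spend C, then each further factor of ten in dollars buys a roughly constant capability increment, and jumping from today’s roughly million-dollar runs to a billion-dollar run amounts to three such increments:

$$\text{score}(C) \approx a + b \log_{10} C \quad\Longrightarrow\quad \text{score}(10^{9}) - \text{score}(10^{6}) \approx 3b$$

Whether any single result is worth those three increments is exactly the “how much would you pay for the Riemann Hypothesis?” question.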
Note this is many different inference runs, each of which was thousands of dollars. I agree that people will spend billions of dollars on inference in total (which isn’t specific to the o-series of models). My incredulity was at the idea of spending billions of dollars on a single episode, which is what I thought you were talking about, given that you were talking about capability gains from scaling up inference-time compute.
Yeah, it hasn’t been shown that these abstractions can ultimately be retargeted by default for today’s AI.