learn math or hardware
mesaoptimizer
You continue to model OpenAI as this black box monolith instead of trying to unravel the dynamics inside it and understand the incentive structures that lead these things to occur. It’s a common pattern I notice in the way you interface with certain parts of reality.
I don’t consider OpenAI as responsible for this as much as Paul Christiano and Jan Leike and his team. Back in 2016 or 2017, when they initiated and led research into RLHF, they focused on LLMs because they expected that LLMs would be significantly more amenable to RLHF. This means that instruction-tuning was the cause of the focus on LLMs, which meant that it was almost inevitable that they’d try instruction-tuning on it, and incrementally build up models that deliver mundane utility. It was extremely predictable that Sam Altman and OpenAI would leverage this unexpected success to gain more investment and translate that into more researchers and compute. But Sam Altman and Greg Brockman aren’t researchers, and they didn’t figure out a path that minimized ‘capabilities overhang’—Paul Christiano did. And more important—this is not mutually exclusive with OpenAI using the additional resources for both capabilities research and (what they call) alignment research. While you might consider everything they do as effectively capabilities research, the point I am making is that this is still consistent with the hypothesis that while they are misguided, they still are roughly doing the best they can given their incentives.
What really changed my perspective here was the fact that Sam Altman seems to have been systematically destroying extremely valuable information about how we could evaluate OpenAI. Specifically, this non-disparagement clause, which ex-employees cannot even mention without falling afoul of the contract, is something I didn’t expect (I did expect non-disclosure clauses but not something this extreme). This meant that my model of OpenAI was systematically too optimistic about how cooperative and trustworthy they are and will be in the future. In addition, if I was systematically deceived about OpenAI due to non-disparagement clauses that cannot even be mentioned, I would expect something similar to also be possible at other frontier labs (especially Anthropic, but also DeepMind). In essence, I no longer believe that Sam Altman (for OpenAI is nothing but his tool now) is doing the best he can to benefit humanity given his incentives and constraints. I expect that Sam Altman is entirely doing whatever he believes will retain and increase his influence and power, and this includes the use of AGI, if and when his teams finally achieve that level of capabilities.
This is the update I expect people are making. It is about being systematically deceived at multiple levels. It is not about “OpenAI being irresponsible”.
I still parse that move as devastating the commons in order to make a quick buck.
I believe that ChatGPT was not released with the expectation that it would become as popular as it did. OpenAI pivoted hard when it saw the results.
Also, I think you are misinterpreting the sort of ‘updates’ people are making here.
I mean, if Paul doesn’t confirm that he is not under any non-disparagement obligations to OpenAI like Cullen O’Keefe did, we have our answer.
In fact, given this asymmetry of information situation, it makes sense to assume that Paul is under such an obligation until he claims otherwise.
I just realized that Paul Christiano and Dario Amodei both probably have signed non-disclosure + non-disparagement contracts since they both left OpenAI.
That impacts how I’d interpret Paul’s (and Dario’s) claims and opinions (or the lack thereof) that relate to OpenAI or alignment proposals entangled with what OpenAI is doing. If Paul has systematically silenced himself, and a large amount of OpenPhil and SFF money has been mis-allocated because of systematically skewed beliefs that these organizations have had due to Paul’s opinions or lack thereof, well. I don’t think this is the case though: Paul, Dario, and Holden all seem to have converged on similar beliefs (whether they track reality or not) and have taken actions consistent with those beliefs.
If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.
I love the score of this comment as of writing: −1 karma points, 23 agree points.
I think it is useful for someone to tap me on the shoulder and say “Hey, this information you are consuming, it’s from <this source that you don’t entirely trust and have a complex causal model of>”.
Enforcing social norms to prevent scapegoating also destroys information that is valuable for accurate credit assignment and causally modelling reality. I haven’t yet found a third alternative, and until then, I’d recommend people both encourage and help people in their community to not scapegoat or lose their minds in ‘tribal instincts’ (as you put it), while not throwing away valuable information.
You can care about people while also seeing their flaws and noticing how they are hurting you and others you care about.
Similarly, governmental institutions have institutional memories with the problems of major historical fuckups, in a way that new startups very much don’t.
On the other hand, institutional scars can cause what effectively looks like institutional traumatic responses, ones that block the ability to explore and experiment and to try to make non-incremental changes or improvements to the status quo, to the system that makes up the institution, or to the system that the institution is embedded in.
There’s a real and concrete issue with the number of roadblocks that seem to be in place to prevent people from doing things that make gigantic changes to the status quo. Here’s a simple example: would it be possible for people to get a nuclear plant set up in the United States within the next decade, barring financial constraints? Seems pretty unlikely to me. What about the FDA response to the COVID crisis? That sure seemed like a concrete example of how ‘institutional memories’ serve as gigantic roadblocks to the ability of our civilization to orient and act fast enough to deal with the sort of issues we are and will be facing this century.
In the end, capital flows towards AGI companies for the sole reason that it is the least bottlenecked / regulated way to multiply your capital, that seems to have the highest upside for the investors. If you could modulate this, you wouldn’t need to worry about the incentives and culture of these startups as much.
I had the impression that SPAR was focused on UC Berkeley undergrads and had therefore dismissed the idea of being a SPAR mentor or mentee. It was only recently, when someone mentioned that they wanted to learn from a particular SPAR mentor, that I looked at the website, and SPAR now seems to focus on the same niche as AI Safety Camp.
Did SPAR pivot in the past six months, or did I just misinterpret SPAR when I first encountered it?
Sort-of off-topic, so feel free to maybe move this comment elsewhere.
I’m quite surprised to see that you have just shipped an MSc thesis, because I didn’t expect you to be doing an MSc (or anything in traditional academia). I didn’t think you needed one, since I think you have enough career capital to continue to work indefinitely on the things you want to work on and get paid well for it. I also assumed that you might find academia somewhat a waste of your time in comparison to doing stuff you wanted to do.
Perhaps you could help clarify what I’m missing?
fiber at Tata Industries in Mumbai
Could you elaborate on how Tata Industries is relevant here? Based on a DDG search, the only news I find involving Tata and AI infrastructure is one where a subsidiary named TCS is supposedly getting into the generative AI gold rush.
My thought is that I don’t see why a pivotal act needs to be that.
Okay. Why do you think Eliezer proposed that, then?
Note that I agree with your sentiment here, although my concrete argument is basically what LawrenceC wrote as a reply to this post.
Ryan, this is kind of a side-note but I notice that you have a very Paul-like approach to arguments and replies on LW.
Two things that I notice:
You have a tendency to reply to certain posts or comments with “I don’t quite understand what is being said here, and I disagree with it.” or, “It doesn’t track with my views”, or equivalent replies that seem not very useful for understanding your object level arguments. (Although I notice that in the recent comments I see, you usually postfix it with some elaboration on your model.)
In the comment I’m replying to, you use a strategy of black-box-like abstraction modeling of a situation to try to argue for a conclusion, one that usually involves numbers such as multipliers or percentages. (I have the impression that Paul uses this a lot, and one concrete example that comes to mind is the takeoff speeds essay.) I usually consider such arguments invalid when they seem to throw away information we already have, or seem to use a set of abstractions that don’t particularly feel appropriate to the information I believe we have.
I just found this interesting and plausible enough to highlight to you. It’s a moderate investment of my time to find examples from your comment history to highlight all these instances, but writing this comment still seemed valuable.
This is a really well-written response. I’m pretty impressed by it.
If your acceptable lower limit for basically anything is zero you won’t be allowed to do anything, really anything. You have to name some quantity of capabilities progress that’s okay to do before you’ll be allowed to talk about AI in a group setting.
Okay I just read the entire thing. Have you looked at Eric Drexler’s CAIS proposal? It seems to have played some role as the precursor to the davidad / Evan OAA proposal, and has involved the use of composable narrow AI systems.
but I’m a bit disappointed that x-risk-motivated researchers seem to be taking the “safety”/“harm” framing of refusals seriously
I’d say a more charitable interpretation is that it is a useful framing: both in terms of a concrete thing one could use as scaffolding for alignment-as-defined-by-Zack research progress, and also a thing that is financially advantageous to focus on since frontier labs are strongly incentivized to care about this.
Haven’t read the entire post, but my thoughts on seeing the first image: Pretty sure this is priced into the Anthropic / Redwood / OpenAI cluster of strategies where you use an aligned boxed (or ‘mostly aligned’) generative LLM-style AGI to help you figure out what to do next.
Wasn’t edited, based on my memory.