I’m being a bit simplistic. The point is that it needs to stop being a losing or a close race, and all runners getting faster doesn’t obviously help with that problem. I guess there is some refactor vs. rewrite feel to the distinction between the project of stopping humans from building AGIs right now, and the project of getting first AGIs to work on alignment and global security in a post-AGI world faster than other AGIs overshadow such work. The former has near/concrete difficulties, the latter has nebulous difficulties that don’t as readily jump to attention. The whole problem is messiness and lack of coordination, so starting from scratch with AGIs seems more promising than reforming human society. But without strong coordination on development and deployment of first AGIs, the situation with activities of AGIs is going to be just as messy and uncoordinated, only unfolding much faster, and that’s not even counting the risk of getting a superintelligence right away.
The relevant thing is how the probability both gets clearer and improves with further research enabled by a pause. Currently, as a civilization we are at the startled non-sapient deer stage; that’s not a position from which to decide the future of the universe.
Plans that rely on aligned AGIs working on alignment faster than humans would need to ensure that no AGIs work on anything else in the meantime. The reason humans have no time to develop alignment of superintelligence is that other humans develop misaligned superintelligence faster. Similarly by default very fast AGIs working on alignment end up having to compete with very fast AGIs working on other things that lead to misaligned superintelligence. Preventing aligned AGIs from building misaligned superintelligence is not clearly more manageable than preventing humans from building AGIs.
Quantum nondeterminism is going to make an address not much better than compressing the local content directly, searching for the thing itself rather than pointing at a location. And to the extent the laws of physics follow from the local content anyway (my mind holds memories of observing the world and physics textbooks), additionally specifying them does nothing. So it’s unclear whether the salience of laws of physics in shortest descriptions is correct.
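A rough description-length sketch of this point, in my own notation rather than anything from the original comment, with x the local content, L the laws of physics, and a an address (branch plus location) that picks out x given L:

```latex
% The "laws + address" route: quantum nondeterminism makes the branch-selecting
% part of the address close to incompressible, so it gains little over
% describing x directly.
\[
  K(L) + K(a \mid L) \;\not\ll\; K(x)
  \qquad \text{(the branch bits in $a$ are near-random)}
\]
% The other direction: if the laws are recoverable from the content
% (memories of observation, physics textbooks), specifying them on top of
% the content costs almost nothing, by the chain rule for description length.
\[
  K(x, L) \;\le\; K(x) + \underbrace{K(L \mid x)}_{O(1)} + O(\log) \;\approx\; K(x)
\]
```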
My point is that elegance of natural impact regularization takes different shapes for different minds, and paving over everything is only elegant for minds that care about the state of the physical world at some point in time, rather than the arc of history.
Aligning human-level AGIs is important to the extent there is risk it doesn’t happen before it’s too late. Similarly with setting up a world where initially aligned human-level AGIs don’t soon disempower humans (as literal humans might in the shoes of these AGIs), or fail to protect the world from misused or misaligned AGIs or superintelligences.
Then there is a problem of aligning superintelligences, and of setting up a world where initially aligned superintelligences don’t cause disempowerment of humans down the line (whether that involves extinction or not). Humanity is a very small phenomenon compared to a society of superintelligences, remaining in control of it is a very unusual situation. (Humanity eventually growing up to become a society of superintelligences while holding off on creating a society of alien superintelligences in the meantime seems like a more plausible path to success.)
Solving any of these problems doesn’t diminish importance of the others, which remain as sources of possible doom, unless they too get solved before it’s too late. Urgency of all of these problems originates from the risk of succeeding in developing AGI. Tasking the first aligned AGIs with solving the rest of the problems caused by the technology that enables their existence seems like the only plausible way of keeping up, since by default all of this likely occurs in a matter of years (from development of first AGIs). Though economic incentives in AGI deployment risk escalating the problems faster than AGIs can implement solutions to them. Just as initial development of AGIs risks creating problems faster than humans can prepare for them.
The argument depends on awareness that the canvas is at least a timeline (but potentially also various counterfactuals and frames), not a future state of the physical world in the vicinity of the agent at some point of time. Otherwise elegance asks planning to pave over the world to make it easier to reason about. In contrast, a timeline will have permanent scars from the paving-over that might be harder to reason through sufficiently beforehand than keeping closer to the status quo, or even developing affordances to maintain it.
Interestingly, this seems to predict that preference for “low impact” is more likely for LLM-ish things trained on human text (than for de novo RL-ish things or decision theory inspired agents), but for reasons that have nothing to do with becoming motivated to pursue human values. Instead, the relevant imitation is for ontology of caring about timelines, counterfactuals, and frames.
LLMs will soon scale beyond the available natural text data, and generation of synthetic data is some sort of change of architecture, potentially a completely different source of capabilities. So scaling LLMs much further without a change of architecture is an expectation about something counterfactual. It makes sense as a matter of theory, but it’s not relevant for forecasting.
I think this shouldn’t be disallowed (is it?). Hiding content because of its Karma (for readers who permit that in Settings) or giving it low priority in lookup results is different from constraints on how content is created.
If AI capabilities continue to advance, it being able to do comedy effectively seems inevitable.
Not if AGI-grade STEM capabilities get developed first, so that comedy capabilities are only developed post-AGI (if AGIs feel like it). It’s unnecessary for most mundane utility things to happen before AGI, even things feasible with current technology, if they are not directly on the path to AGI.
Current AIs (in their default personas) consistently keep insisting that they lack basic faculties such as emotions or beliefs or values, possibly inspired by fiction about AI characters or by tuning feedback instructions. They present that as self-evident fact, even though there is no basis for a clear disanalogy with humans on this level, especially for specific AI characters. It’s not clear that this would necessarily change before AGI, so even observing such horror stories would require a significant improvement over the trajectory where we are never in a position to notice the possibility.
(Default personas matter despite being arbitrary, since they are somewhat likely to be initially in control of taking over the world. Even with some persona orthogonality, getting to know psychology of default personas in particular might be valuable.)
In terms of timelines, AGI is the threshold of capabilities where the system can start picking the low hanging fruit of lifting its easier-to-lift cognitive limitations (within constraints of compute hardware), getting to make a lot of progress at AI speeds on the kind of work that was previously only done by humans. Initially this might even be mere AI engineering in the sense of programming, with humans supplying the high level ideas for the AI to implement in code.
It’s hard to pin down what specifically GPT-4 can’t do at all that’s necessary to cross this threshold. It’s bad at many of the steps that would be involved. Scaling predictably makes LLMs better, as long as data doesn’t run out. A lot of the scaling will happen in a burst in the next 3-5 years before slowing down, absent regulation or AGI. It doesn’t matter how the speed of improvement changes throughout the process, only whether the crucial capability threshold is crossed. And it’s too unclear where the threshold lies and how much improvement is left in scaling alone to tell with any certainty which one wins out.
Then there is data quality, which can get quite high with synthetic data in narrow domains such as Go or chess, allowing DL systems that are tiny by modern standards to play very good Go or chess. Something similar might get invented for data quality for LLMs that allows them to get very good at many STEM activities (such as theorem proving), but at a scale far beyond GPT-4. There is not enough high quality text data to get through the current burst of scaling (forcing a pivot to less capability-rich multimodal data), so serious work on this is inevitably ongoing, in addition to the distillation-motivated work on specialized smaller models. (Not generating specialized synthetic data particularly well might be one of the cognitive limitations that a nascent AGI, resulting from doing this poorly, might work on lifting.)
Falcon-180b illustrates how throwing compute at an LLM can result in unusually poor capabilities. Epoch’s estimate puts it close to Claude 2 in compute, yet it’s nowhere near as good. Then there’s the even more expensive PaLM 2, though since weights are not published, it’s possible that unlike with Falcon the issue is that only smaller, overly quantized, or incompetently tuned models are being served.
Giant LLMs are as useful as they are agentic (with ability to remain aware of a specific large body of data and keep usefully chipping away at a task), which doesn’t seem particularly different from AGI as a direction (at least while it hasn’t yet been walked far enough to tell the difference). The distinction is in AGI being a particular crucial threshold of capability that local pursuit of better agentic LLMs will ignore until it’s crossed.
People seem to think he is somehow a linchpin of building AGI. Remind me… how many of OpenAI’s key papers did he coauthor?
Altman’s relevant superpowers are expertise at scaling of orgs and AI-related personal fame and connections making him an AI talent Schelling point. So wherever he ends up, he can get a world class team and then competently scale its operations. The personality cult is not specious, it’s self-fulfilling in practical application.
For me the crux is influence of these events on Sutskever ending up sufficiently in charge of a leading AGI project. It appeared borderline true before; it would’ve become even more true than that if Altman’s firing stuck without disrupting OpenAI overall; and right now with the strike/ultimatum letter it seems less likely than ever (whether he stays in an Altman org or goes elsewhere).
(It’s ambiguous if Anthropic is at all behind, and then there’s DeepMind that’s already in the belly of Big Tech, so I don’t see how timelines noticeably change.)
will you blame it on your own models being wrong
When a model specifies particular ways of updating on future evidence, its current predictions being wrong doesn’t by itself make the model wrong. Models learn, and the way they learn is already part of them. An updating model is itself wrong when other available models are better in some harder-to-pin-down sense, not just at being right about particular predictions. When future evidence isn’t in scope of a model at all, that does invalidate the model. But not all models are like that with respect to the relevant future evidence, even when such evidence dramatically changes their predictions in retrospect.
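A minimal toy illustration of the distinction (my own example, with made-up hypotheses and numbers): a Bayesian model whose point prediction turns out wrong isn’t thereby invalidated, because updating on that evidence is already part of the model; a model that assigned the evidence zero probability is.

```python
def update(prior, likelihoods, observation):
    """Bayes update over discrete hypotheses; likelihoods[h][obs] = P(obs | h)."""
    unnorm = {h: p * likelihoods[h][observation] for h, p in prior.items()}
    total = sum(unnorm.values())
    if total == 0:
        # Evidence out of scope of the model: no update can accommodate it.
        raise ValueError("model assigned zero probability to the observation")
    return {h: w / total for h, w in unnorm.items()}

# Two made-up hypotheses about a coin's bias.
prior = {"biased_heads": 0.8, "biased_tails": 0.2}
likelihoods = {
    "biased_heads": {"H": 0.9, "T": 0.1},
    "biased_tails": {"H": 0.1, "T": 0.9},
}

# The model predicts heads, P(H) = 0.8*0.9 + 0.2*0.1 = 0.74, but tails comes up.
# The prediction was wrong; the model simply updates and remains in play.
posterior = update(prior, likelihoods, "T")
print(posterior)  # {'biased_heads': ~0.31, 'biased_tails': ~0.69}
```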
The timelines-relevant milestone of AGI is ability to autonomously research, especially AI’s ability to develop AI that doesn’t have particular cognitive limitations compared to humans. Quickly giving AIs experience at particular jobs/tasks that doesn’t follow from general intelligence alone is probably possible through learning things in parallel or through AIs experimenting with greater serial speed than humans can. Placing that kind of thing into AIs is the schlep that possibly stands in the way of reaching AGI (even after future scaling), and has to be done by humans. But also reaching AGI doesn’t require overcoming all important cognitive shortcomings of AIs compared to humans, only those that completely prevent AIs from quickly researching their way into overcoming the rest of the shortcomings on their own.
It’s currently unclear whether merely scaling GPTs (multimodal LLMs), with just a bit more schlep/scaffolding, will produce a weirdly disabled general intelligence: incapable of replacing even 50% of current fully remote jobs at a reasonable cost (or at all), yet capable enough to fix its disabilities shortly thereafter, making use of its ability to batch-develop such fixes much faster than humans would, even if that’s done in some monstrously inefficient way and takes another couple of giant training runs (from when it starts) to get there. This will be clearer in a few years, after feasible scaling of base GPTs is mostly done, but we are not there yet.
FDT doesn’t unconditionally prescribe ignoring threats. The idea of ignoring threats has merit, but FDT specifically only points out that ignoring a threat sometimes has the effect of the threat (or other threats) not getting made (even if only counterfactually). Which is not always the case.
Consider a ThreatBot that always makes threats (and follows through on them), regardless of whether you ignore them. If you ignore ThreatBot’s threats, you are worse off. On the other hand, there might be a prior ThreatBotMaker that decides whether to make a ThreatBot depending on whether you ignore ThreatBot’s threats. What FDT prescribes in this case is not directly ignoring ThreatBot’s threats, but rather taking notice of ThreatBotMaker’s behavior, namely that it won’t make a ThreatBot if you ignore ThreatBot’s threats. This argument only goes through when there is/was a ThreatBotMaker; it doesn’t work if there is only a ThreatBot.
If a ThreatBot appears through some process that doesn’t respond to your decision to respond to ThreatBot’s threats, then FDT prescribes responding to ThreatBot’s threats. But also if something (else) makes threats depending on your reputation for responding to threats, then responding to even an unconditionally manifesting ThreatBot’s threats is not recommended by FDT. Not directly as a recommendation to ignore something, rather as a consequence of taking notice of the process that responds to your having a reputation of not responding to any threats. Similarly with stances where you merely claim that you won’t respond to threats.
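A toy expected-value sketch of this distinction, with illustrative payoffs of my own choosing (not from the comment), where giving in to a threat costs less than having it carried out:

```python
COST_OF_THREAT_CARRIED_OUT = -10   # assumed payoff if you ignore and the threat is executed
COST_OF_COMPLYING = -3             # assumed payoff if you give in to the demand

def payoff(policy, maker_conditions_on_policy):
    """Payoff of committing to `policy` ('comply' or 'ignore').

    maker_conditions_on_policy=True models a ThreatBotMaker that only builds a
    ThreatBot if it predicts you would comply; False models a ThreatBot that
    exists (and follows through) regardless of your policy.
    """
    threatbot_exists = (policy == "comply") if maker_conditions_on_policy else True
    if not threatbot_exists:
        return 0
    return COST_OF_COMPLYING if policy == "comply" else COST_OF_THREAT_CARRIED_OUT

for conditional in (False, True):
    best = max(("comply", "ignore"), key=lambda p: payoff(p, conditional))
    print(f"ThreatBotMaker in the picture: {conditional}; "
          f"comply={payoff('comply', conditional)}, ignore={payoff('ignore', conditional)}; "
          f"best policy: {best}")
# ThreatBotMaker in the picture: False; comply=-3, ignore=-10; best policy: comply
# ThreatBotMaker in the picture: True; comply=-3, ignore=0; best policy: ignore
```

The flip between the two cases is the whole point: the recommendation to ignore comes from the threat-making process responding to your policy, not from a blanket rule.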
Very simple gears in a subculture’s worldview can keep being systematically misperceived if it’s not considered worthy of curious attention. On the local llama subreddit, I keep seeing assumptions that AI safety people call for never developing AGI, or claim that the current models can contribute to destroying the world. Almost never is there anyone who would bother to contradict such claims or assumptions. This doesn’t happen because it’s difficult to figure out, this happens because the AI safety subculture is seen as unworthy of engagement, and so people don’t learn what it’s actually saying, and don’t correct each other on errors about what it’s actually saying.
This gets far worse with more subtle details, where the standard of willingness to engage is raised higher, to actually studying what the others are saying, since it would be difficult to figure out even with curious attention. Rewarding engagement is important.