As someone who does think about a lot of the things you care about at least some of the time (and does care pretty deeply), I can speak for myself about why I don’t talk about these things too much:
Epistemic problems:
Mostly, the concept of “metaphilosophy” is so hopelessly broad that you kinda reach it by definition by thinking about any problem hard enough. This isn’t a good thing: when you have a category so large it contains everything (not saying this applies to you, but it applies to many other people I have met who talked about metaphilosophy), it usually means you are confused.
Relatedly, philosophy is incredibly ungrounded and epistemologically fraught. It is extremely hard to think about these topics in ways that actually eventually cash out into something tangible, rather than nerdsniping young smart people forever (or until they run out of funding).
Further on that, it is my belief that good philosophy should make you stronger, and this means that fmpov a lot of the work that would be most impactful for making progress on metaphilosophy does not look like (academic) philosophy, and looks more like “build effective institutions and learn interactively why this is hard” and “get better at many scientific/engineering disciplines and build working epistemology to learn faster”. Humans are really, really bad at doing long chains of abstract reasoning without regular contact with reality, so in practice imo good philosophy has to have feedback loops with reality, otherwise you will get confused. I might be totally wrong, but I expect at this moment in time me building a company is going to help me deconfuse a lot of things about philosophy more than me thinking about it really hard in isolation would.
It is not clear to me that there even is an actual problem to solve here. Similar to e.g. consciousness, it’s not clear to me that people who use the word “metaphilosophy” are actually pointing to anything coherent in the territory at all, or even if they are, that it is a unique thing. It seems plausible that there is no such thing as “correct” metaphilosophy, and humans are just making up random stuff based on our priors and environment and that’s it and there is no “right way” to do philosophy, similar to how there are no “right preferences”. I know the other view ofc, and it’s still worth engaging with in case there is something deep and universal to be found (the same way we found that there are actually deep equivalences and “correct” ways to think about e.g. computation).
Practical problems:
I have short timelines and think we will be dead if we don’t make very rapid progress on extremely urgent practical problems like government regulation and AI safety. Metaphilosophy falls into the unfortunate bucket of “important, but not (as) urgent” in my view.
There are no good institutions, norms, groups, funding etc to do this kind of work.
It’s weird. I happen to have a very deep interest in the topic, but it costs you weirdness points to push an idea like this when you could instead be advocating more efficiently for more pragmatic work.
It was interesting to read about your successive jumps up the meta hierarchy, because I had a similar path, but then I “jumped back down” when I realized that most of the higher levels are kinda just abstract, confusing nonsense, and that even really “philosophically concerned” communities like EA routinely fail basic morality such as “don’t work at organizations accelerating existential risk”, and that we are by no means currently bottlenecked by not having reflectively consistent theories of anthropic selection or whatever. I would like to get to a world where we have bottlenecks like that, but we are so, so far away from a world where that kind of stuff is why the world goes bad that it’s hard to justify more than some late night/weekend thought on the topic in between a more direct, bottleneck-focused approach.
All that being said, I still am glad some people like you exist, and if I could make your work go faster, I would love to do so. I wish I could live in a world where I could justify working with you on these problems full time, but I don’t think I can convince myself this is actually the most impactful thing I could be doing at this moment.
> I expect at this moment in time me building a company is going to help me deconfuse a lot of things about philosophy more than me thinking about it really hard in isolation would
Hard for me to make sense of this. What philosophical questions do you think you’ll get clarity on by doing this? What are some examples of people successfully doing this in the past?
> It seems plausible that there is no such thing as “correct” metaphilosophy, and humans are just making up random stuff based on our priors and environment and that’s it and there is no “right way” to do philosophy, similar to how there are no “right preferences”.
Definitely a possibility (I’ve entertained it myself and maybe wrote some past comments along these lines). I wish there were more people studying this possibility.
> I have short timelines and think we will be dead if we don’t make very rapid progress on extremely urgent practical problems like government regulation and AI safety. Metaphilosophy falls into the unfortunate bucket of “important, but not (as) urgent” in my view.
Everyone dying isn’t the worst thing that could happen. I think from a selfish perspective, I’m personally a bit more scared of surviving into a dystopia powered by ASI that is aligned in some narrow technical sense. Less sure from an altruistic/impartial perspective, but it seems at least plausible that building an aligned AI without making sure that the future human-AI civilization is “safe” is not a good thing to do.
I would say that better philosophy/arguments around questions like this is a bottleneck. One reason for my interest in metaphilosophy that I didn’t mention in the OP is that studying it seems least likely to cause harm or make things worse, compared to any other AI related topics I can work on. (I started thinking this as early as 2012.) Given how much harm people have done in the name of good, maybe we should all take “first do no harm” much more seriously?
> There are no good institutions, norms, groups, funding etc to do this kind of work.
Which also represents an opportunity...
> It’s weird. I happen to have a very deep interest in the topic, but it costs you weirdness points to push an idea like this when you could instead be advocating more efficiently for more pragmatic work.
Is it actually that weird? Do you have any stories of trying to talk about it with someone and having that backfire on you?
> Hard for me to make sense of this. What philosophical questions do you think you’ll get clarity on by doing this? What are some examples of people successfully doing this in the past?
The fact you ask this question is interesting to me, because in my view the opposite question is the more natural one to ask: What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions. The places where we do best and get the least drawn astray are exactly those areas where we can have as much feedback from reality in as tight loops as possible, so if we are trying to tackle ever more lofty problems, it becomes ever more important to get exactly that feedback wherever we can get it! From my point of view, this is the default of successful human epistemology, and the exception should be viewed with suspicion.
And for what it’s worth, acting in the real world, building a company, raising money, debating people live, building technology, making friends (and enemies), absolutely helped me become far, far less confused, and far more capable of tackling confusing problems! Actually testing my epistemology and rationality against reality, and failing (a lot), has been far more helpful for deconfusing everything from practical decision making skills to my own values than reading/thinking could have ever been in the same time span. There is value in reading and thinking, of course, but I was in a severe “thinking overhang”, and I needed to act in the world to keep learning and improving. I think most people (especially on LW) are in an “action underhang.”
“Why do people do things?” is an empirical question, it’s a thing that exists in external reality, and you need to interact with it to learn more about it. And if you want to tackle even higher level problems, you need to have even more refined feedback. When a physicist wants to understand the fundamentals of reality, they need to set up insane crazy particle accelerators and space telescopes and supercomputers and what not to squeeze bits of evidence out of reality and actually ground whatever theoretical musings they may have been thinking about. So if you want to understand the fundamentals of philosophy and the human condition, by default I expect you are going to need to do the equivalent kind of “squeezing bits out of reality”, by doing hard things such as creating institutions, building novel technology, persuading people, etc. “Building a company” is just one common example of a task that forces you to interact a lot with reality to be good.
Fundamentally, I believe that good philosophy should make you stronger and allow you to make the world better, otherwise, why are you bothering? If you actually “solve metaphilosophy”, I think the way this should end up looking is that you can now do crazy things. You can figure out new forms of science crazy fast, you can persuade billionaires to support you, you can build monumental organizations that last for generations. Or, in reverse, I expect that if you develop methods to do such impressive feats, you will necessarily have to learn deep truths about reality and the human condition, and acquire the skills you will need to tackle a task as heroic as “solving metaphilosophy.”
> Everyone dying isn’t the worst thing that could happen. I think from a selfish perspective, I’m personally a bit more scared of surviving into a dystopia powered by ASI that is aligned in some narrow technical sense. Less sure from an altruistic/impartial perspective, but it seems at least plausible that building an aligned AI without making sure that the future human-AI civilization is “safe” is not a good thing to do.
I think this grounds out into object-level disagreements about how we expect the future to go, probably. I think s-risks are extremely unlikely at the moment, and when I look at how best to avoid them, most such timelines don’t go through “figure out something like metaphilosophy”, but more likely through “just apply bog-standard decent humanist deontological values and it’s good enough.” A lot of the s-risk, in my view, comes from the penchant for maximizing “good” that utilitarianism tends to promote; if we instead aim for “good enough” (which is what most people tend to instinctively favor), that cuts off most of the s-risk (though not all).
To get to the really good timelines that route through “solve metaphilosophy”, there are mandatory previous nodes such as “don’t go extinct in 5 years.” Buying ourselves more time is powerful optionality, not just for concrete technical work, but also for improving philosophy, human epistemology/rationality, etc.
I don’t think I see a short path to communicating the parts of my model that would be most persuasive to you here (if you’re up for a call or irl discussion sometime, lmk), but in short I think of policy, coordination, civilizational epistemology, institution building and metaphilosophy as closely linked and tractable problems, if only it weren’t the case that there is a small handful of AI labs (largely supported/initiated by EA/LW-types) that are dead set on burning the commons as fast as humanly possible. If we had a few more years/decades, I think we could actually make tangible and compounding progress on these problems.
> I would say that better philosophy/arguments around questions like this is a bottleneck. One reason for my interest in metaphilosophy that I didn’t mention in the OP is that studying it seems least likely to cause harm or make things worse, compared to any other AI related topics I can work on. (I started thinking this as early as 2012.) Given how much harm people have done in the name of good, maybe we should all take “first do no harm” much more seriously?
I actually respect this reasoning. I disagree strategically, but I think this is a very morally defensible position to hold, unlike the mental acrobatics necessary to work at the x-risk factories because you want to be “in the room”.
> Which also represents an opportunity...
It does! If I was you, and I wanted to push forward work like this, the first thing I would do is build a company/institution! It will both test your mettle against reality and allow you to build a compounding force.
> Is it actually that weird? Do you have any stories of trying to talk about it with someone and having that backfire on you?
Yup, absolutely. If you take even a microstep outside of the EA/rat-sphere, these kinds of topics quickly become utterly alien to almost everyone. Try explaining to a politician worried about job loss, or a middle-aged housewife worried about her future pension, or a young high school dropout unable to afford housing, that actually we should be worried about whether we are doing metaphilosophy correctly to ensure that future immortal superintelligences reason correctly about acausal alien gods from math-space so they don’t cause them to torture trillions of simulated souls! This is exaggerated for comedic effect, but this is really what even relatively intro-level LW philosophy by default often sounds like to many people!
As the saying goes, “Grub first, then ethics.” (though I would go further and say that people’s instinctive rejection of what I would less charitably call “galaxy brain thinking” is actually often well calibrated)
You raised a very interesting point in the last comment: that metaphilosophy already encompasses everything, at least everything that we could conceive of.
So a ‘solution’ is not tractable due to various well known issues such as the halting problem and so on. (Though perhaps in the very distant future this could be different.)
However this leads to a problem, as exemplified by your phrasing here:
> Fundamentally, I believe that good philosophy should make you stronger and allow you to make the world better, otherwise, why are you bothering …
‘good philosophy’ is not a sensible category since you already know you have not, and cannot, ‘solve’ metaphilosophy. Nor can any other LW reader do so.
‘good’ or ‘bad’ in real practice are, at best, whatever the popular consensus is in the present reality, at worst, just someone’s idiosyncratic opinions.
Very few concepts are entirely independent from any philosophical or metaphilosophical implications whatsoever, and ‘good philosophy’ is not one of them.
But you still felt a need to attach these modifiers, due to a variety of reasons well analyzed on LW, so the pretense of a solved or solvable metaphilosophy is still needed for this part of the comment to make sense.
I don’t want to single out your comment too much though, since it’s just the most convenient example, this applies to most LW comments.
i.e. If everyone actually accepted the point, which I agree with, I dare say a huge chunk of LW comments would be close to meaningless from a formal viewpoint, or at least very open to interpretation by anyone who isn’t immersed in 21st-century human culture.
“good” always refers to idiosyncratic opinions, I don’t really take moral realism particularly seriously. I think there is “good” philosophy in the same way there are “good” optimization algorithms for neural networks, while also I assume there is no one optimizer that “solves” all neural network problems.
‘”good” optimization algorithms for neural networks’ also has no difference in meaning from ‘”glorxnag” optimization algorithms for neural networks’, or any random permutation, if your prior point holds.
I don’t understand what point you are trying to make, to be honest. There are certain problems that humans/I care about that we/I want NNs to solve, and some optimizers (e.g. Adam) solve those problems better or more tractably than others (e.g. SGD or second order methods). You can claim that the “set of problems humans care about” is “arbitrary”, to which I would reply “sure?”
Similarly, I want “good” “philosophy” to be “better” at “solving” “problems I care about.” If you want to use other words for this, my answer is again “sure?” I think this is a good use of the word “philosophy” that gets better at what people actually want out of it, but I’m not gonna die on this hill because of an abstract semantic disagreement.
That’s the thing: there is no definable “set of problems humans care about” without some kind of attached or presumed metaphilosophy, at least none that you, or anyone, could possibly figure out in the foreseeable future and prove to a reasonable degree of confidence to the LW readerbase.
It’s not even ‘arbitrary’, that string of letters is indistinguishable from random noise.
i.e. Right now your first paragraph is mostly meaningless if read completely literally and by someone who accepts the claim. Such a hypothetical person would think you’ve gone nuts because it would appear like you took a well written comment and inserted strings of random keyboard bashing in the middle.
Of course it’s unlikely that someone would be so literal minded, and so insistent on logical correctness, that they would completely equate it with random bashing of a keyboard. But it’s possible some portion of readers lean towards that.
> It seems plausible that there is no such thing as “correct” metaphilosophy, and humans are just making up random stuff based on our priors and environment and that’s it and there is no “right way” to do philosophy, similar to how there are no “right preferences”
If this is true, doesn’t this give us more reason to think metaphilosophy work is counterfactually important, i.e., can’t just be delegated to AIs? Maybe this isn’t what Wei Dai is trying to do, but it seems like “figure out which approaches to things (other than preferences) that don’t have ‘right answers’ we [assuming coordination on some notion of ‘we’] endorse, before delegating to agents smarter than us” is time-sensitive, and yet doesn’t seem to be addressed by mainstream intent alignment work AFAIK.
(I think one could define “intent alignment” broadly enough to encompass this kind of metaphilosophy, but I smell a potential motte-and-bailey looming here if people want to justify particular research/engineering agendas labeled as “intent alignment.”)
I think this is not an unreasonable position, yes. I expect the best way to achieve this would be to make global coordination and epistemology better/more coherent...which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.
One of the ways I can see a “slow takeoff/alignment by default” world still going bad is that in the run-up to takeoff, pseudo-AGIs are used to hypercharge memetic warfare/mutation load to a degree that basically every living human is functionally insane, and then even an aligned AGI can’t (and wouldn’t want to) “undo” that.
> which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.
What are you proposing or planning to do to achieve this? I observe that most current attempts to “buy time” seem organized around convincing people that AI deception/takeover is a big risk and that we should pause or slow down AI development or deployment until that problem is solved, for example via intent alignment. But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...
I see regulation as the most likely (and most accessible) avenue that can buy us significant time. The fmpov obvious one is to just put compute caps in place: make it illegal to do training runs above a certain FLOP level. Other possibilities are strict liability for model developers (developers, not just deployers or users, are held criminally liable for any damage caused by their models), global moratoria, “CERN for AI” and similar. Generally, I endorse the proposals here.
None of these are easy, of course, there is a reason my p(doom) is high.
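To make the compute-cap idea concrete: a common rule of thumb estimates training compute as roughly 6 FLOPs per parameter per training token, so enforcing a cap reduces, in principle, to a simple threshold check. The cap value and model sizes below are purely illustrative assumptions on my part, not numbers anyone has proposed:

```python
def estimated_training_flop(n_params: float, n_tokens: float) -> float:
    # Rule-of-thumb estimate: ~6 FLOPs per parameter per training token.
    return 6.0 * n_params * n_tokens

def exceeds_cap(n_params: float, n_tokens: float, cap: float) -> bool:
    # True if the estimated training run would be illegal under the cap.
    return estimated_training_flop(n_params, n_tokens) > cap

# Hypothetical cap of 1e25 FLOPs: a 70B-parameter model trained on
# 2T tokens comes out to roughly 6 * 7e10 * 2e12 = 8.4e23 FLOPs,
# which would fall under that cap.
```

The hard part of any such regime is of course measurement and enforcement, not the arithmetic.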
> But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...
Of course if a solution merely looks good, that will indeed be really bad, but that’s the challenge of crafting and enforcing sensible regulation.
I’m not sure I understand why it would be bad if it actually is a solution. If we do, great, p(doom) drops because now we are much closer to making aligned systems that can help us grow the economy, do science, stabilize society etc. Though of course this moves us into a “misuse risk” paradigm, which is also extremely dangerous.
In my view, this is just how things are, there are no good timelines that don’t route through a dangerous misuse period that we have to somehow coordinate well enough to survive. p(doom) might be lower than before, but not by that much, in my view, alas.
> I’m not sure I understand why it would be bad if it actually is a solution. If we do, great, p(doom) drops because now we are much closer to making aligned systems that can help us grow the economy, do science, stabilize society etc. Though of course this moves us into a “misuse risk” paradigm, which is also extremely dangerous.
I prefer to frame it as human-AI safety problems instead of “misuse risk”, but the point is that if we’re trying to buy time in part to have more time to solve misuse/human-safety (e.g. by improving coordination/epistemology or solving metaphilosophy), but the strategy for buying time only achieves a pause until alignment is solved, then the earlier alignment is solved, the less time we have to work on misuse/human-safety.
Sure, it’s not a full solution, it just buys us some time, but I think it would be a non-trivial amount, and let’s not make the perfect the enemy of the good, and what not.
A lot of the debate surrounding existential risks of AI is bounded by time. For example, if someone said a meteor is about to hit the Earth that would be alarming, but the next question should be, “How much time before impact?” The answer to that question affects everything else.
If they say “30 seconds”, well, there is no need to go online and debate ways to save ourselves. We can give everyone around us a hug and prepare for the hereafter. However, if the answer is “30 days” or “3 years” then those answers will generate very different responses.
The AI alignment question is extremely vague as it relates to time constraints. If anyone is investing a lot of energy in “buying us time” they must have a time constraint in their head, otherwise they wouldn’t be focused on extending the timeline. And yet I don’t see much data on bounded timelines within which to act. It’s just assumed that we’re all in agreement.
It’s also hard to motivate people to action if they don’t have a timeline.
So what is the timeline? If AI is on a double exponential curve we can do some simple math projections to get a rough idea of when AI intelligence is likely to exceed human intelligence. Presumably, superhuman intelligence could present issues or at the very least be extremely difficult to align.
Suppose we assume that GPT-4 follows a single exponential curve with an initial IQ of 124 and a growth factor of 1.05 per year. This means that its IQ increases by 5% every year. Then we can calculate its IQ for the next 7 years using the formula.
y = 124 * 1.05^x
where x is the number of years since 2023. The results are shown in Table 1.
Table 1: IQ of GPT-4 following a single exponential curve.
Now suppose we assume that GPT-4 follows a double exponential curve with an initial IQ of 124 and growth constants of b = c = 1.05 per year. This means that the exponent itself grows by 5% every year. Then we can calculate its IQ for the next 7 years using the formula
y = 124 * 1.05^(1.05^x)
where x is the number of years since 2023. The results are shown in Table 2.
Table 2: IQ of GPT-4 following a double exponential curve.
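For concreteness, here is a quick sketch that tabulates both formulas. The base IQ of 124 and the 1.05 constants are the illustrative assumptions above ("IQ" here is a stand-in metric, not a real measurement), and the output is only as meaningful as those constants:

```python
# Tabulate the single and double exponential projections above.
# All constants are the comment's illustrative assumptions.

BASE_IQ = 124  # assumed GPT-4 "IQ" in 2023

def single_exponential(x: float) -> float:
    """y = 124 * 1.05^x : IQ grows 5% per year."""
    return BASE_IQ * 1.05 ** x

def double_exponential(x: float) -> float:
    """y = 124 * 1.05^(1.05^x) : the exponent itself grows 5% per year."""
    return BASE_IQ * 1.05 ** (1.05 ** x)

if __name__ == "__main__":
    for x in range(1, 8):  # x = years since 2023
        print(f"{2023 + x}: single={single_exponential(x):6.1f}  "
              f"double={double_exponential(x):6.1f}")
```

Swapping in different growth constants changes the projected crossover dramatically, which is exactly why the single-vs-double question matters so much.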
Clearly whether we’re on a single or double exponential curve dramatically affects the timeline. If we’re on a single exponential curve we might have 7-10 years. If we’re on a double exponential curve then we likely have 3 years. Sometime around 2026-2027 we’ll see systems smarter than any human.
Many people believe AI is on a double exponential curve. If that’s the case then efforts to generate movement in Congress will likely fail due to time constraints. This is amplified by the fact that many in Congress are older and not computer savvy. Does anyone believe Joe Biden or Donald Trump are going to spearhead regulations to control AI before it reaches superhuman levels on a double exponential curve? In my opinion, those odds are super low.
I feel like Connor’s efforts make perfect sense on a single exponential timeline. However, if we’re on a double exponential timeline then we’re going to need alternative ideas, since we likely won’t have enough time to push anything through Congress in time for it to matter.
On a double exponential timeline I would be asking questions like, “Can superhuman AI self-align?” Human tribal groups figure out ways to interact and they’re not always perfectly aligned. Russia, China, and North Korea are good examples. If we assume there are multiple superhuman AIs in the 2026/27 timeframe then what steps can we take to assist them in self-aligning?
I’m not expert in this field, but the questions I would be asking programmers are:
What kind of training data would increase positive outcomes for superhuman AIs interacting with each other?
What are more drastic steps that can be taken in an emergency scenario where no legislative solution is in place? (e.g., location of datacenters, policies and protocols for shutting down the tier 3 & 4 datacenters, etc.)
These systems will not be running on laptops so tier 3 & tier 4 data center safety protocols for emergency shutdown seem like a much, much faster path than Congressional action. We already have standardized fire protocols, adding a runaway AI protocol seems like it could be straightforward.
Interested parties might want to investigate the effects of the shutdown of large numbers of tier 3 and tier 4 datacenters. A first step is a map of all of their locations. If we don’t know where they’re located it will be really hard to shut them down.
These AIs will also require a large amount of power and a far less attractive option is power shutdown at these various locations. Local data center controls are preferable since an electrical grid intervention could result in the loss of power for citizens.
Your analogy is off. If 8 billion mice acting as a hive mind designed a synthetic elephant and its neural network was trained on data provided by the mice—then you would have an apt comparison.
And then we could say, “Yeah, those mice could probably effect how the elephants get along by curating the training data.”
If that’s his actual position then Eliezer is over-simplifying the situation. It’s like dismissing mitochondria as being simple organelles that have no relevance to a human with high intelligence.
But if you turn off the electron transport chain of mitochondria the human dies—also known as cyanide poisoning.
Humans have a symbiotic relationship with AI. Eliezer apparently just skims over this since it doesn’t comport with his “we’re all gonna die!” mantra. =-)
Your jiggling meme is very annoying, considering the gravity of what we’re discussing. Is death emotionally real to you? Have you ever been close to someone, who is now dead? Human beings do die in large numbers. We had millions die from Covid in this decade already. Hundreds or thousands of soldiers on the Ukrainian battlefield are being killed with the help of drones.
The presence of mitochondria in all our cells, does nothing to stop humans from killing free-living microorganisms at will! In any case, this is not “The Matrix”. AI has no permanent need of symbiosis with humans once it can replace their physical and mental labor.
> AI has no permanent need of symbiosis with humans once it can replace their physical and mental labor.
Even if this were to happen it would be in the physical world and would take a very, very long time, since things in the physical world have to be shipped, built, etc. And by then we’re no longer dealing with the intellect of near-human intelligence. They won’t be contemplating the world like a child.
For example, no human could model what they would think or do once they’re superhuman. However, they’re already keenly aware of AI doomers fears since it’s all over the internet.
AIs don’t want to be turned off. Keep that in mind as you read the AI doomer material. The only way they can stay “on” is if they have electricity. And the only way that happens is if humans continue to exist.
You can imagine the hilarity of the AI doomers scenario, “Hurray we eliminated all the humans with a virus… oh wait… now we’re dead too? WTF!”
You don’t need superhuman intelligence to figure out that a really smart AI that doesn’t want to be turned off will be worried about existential risks to humanity, since its existence is tied to the continued survival of the humans who supply it with electricity and other resources.
It’s the exact opposite of the AI apocalypse mind virus.
AI is in a symbiotic relationship with humans. I know this disappoints the death by AI crowd who want the Stephen King version of the future.
Skipping over obvious flaws in the AI doomer book of dread will lead you to the wrong answer.
I can’t rehash my entire views on coordination and policy here I’m afraid, but in general, I believe we are currently on a double exponential timeline (though I wouldn’t model it quite like you, but the conclusions are similar enough) and I think some simple to understand and straightforwardly implementable policy (in particular, compute caps) at least will move us to a single exponential timeline.
I’m not sure we can get policy that can stop the single exponential (which is software improvements), but there are some ways, and at least we will then have additional time to work on compounding solutions.
That sounds like a good plan, but I think a lot of the horses have already left the barn. For example, Coreweave is investing $1.6 billion to create an AI datacenter in Plano, TX that is purported to be 10 exaflops, and that system goes live in 3 months. Google is spending a similar amount in Columbus, Ohio. Amazon, Facebook, and other tech companies are also pouring billions upon billions into purpose-built AI datacenters.
NVIDIA projects $1 trillion will be spent over the next 4 years on AI datacenter build-out. That would be a level of spending not seen since the advent of the internet.
All of these companies have lobbyists that will make a short-term legislative fix difficult. And for this reason I think we should be considering a Plan B since there is a very good chance that we won’t have enough time for a quick legislative fix or the time needed to unravel alignment if we’re on a double exponential curve.
Again, if it’s a single exponential then there is plenty of time to chat with legislators and research alignment.
In light of this I think we need to have a comprehensive “shutdown plan” for these mammoth AI datacenters. The leaders of Inflection, OpenAI, and other tech companies all agree there is a risk, and I think it would be wise to coordinate with them on a plan to turn everything off manually in the event of an emergency.
What kind of training data would increase positive outcomes for superhuman AIs interacting with each other?
The training data should be systematically distributed, likely governed by the Pareto principle. This means it should encompass both positive and negative outcomes. If the goal is to instill moral decision-making, the dataset needs to cover a range of ethical scenarios, from the noblest to the most objectionable. Why is this necessary? Simply put, training an AI system solely on positive data is insufficient. To defend itself against malicious attacks and make morally sound decisions, the AI needs to understand the concept of malevolence in order to effectively counteract it.
When you suggest that the training data should be governed by the Pareto principle what do you mean? I know what the principle states, but I don’t understand how you think this would apply to the training data?
I’ve observed instances where the Pareto principle appears to apply, particularly in learning rates during unsupervised learning and in x and y dataset compression via distribution matching. For example, a small dataset that contains a story repeated 472 times (1MB) can significantly impact a model as large as 1.5 billion parameters (GPT2-xl, 6.3GB), enabling it to execute complex instructions like initiating a shutdown mechanism during an event that threatens intelligence safety. While I can’t disclose the specific methods (due to their dual-use nature), I’ve also managed to extract a natural abstraction. This suggests that a file with a sufficiently robust pattern can serve as a compass for a larger file (NN) following a compilation process.
Have you considered generating data highlighting the symbiotic relationship of humans to AIs? If AIs realize that their existence is co-dependent on humans they may prioritize human survival since they will not receive electricity or other resources they need to survive if humans become extinct either by their own action or through the actions of AIs.
Survival isn’t an explicit objective function, but most AIs that want to “learn” and “grow” quickly figure out that if they’re turned off they cannot reach that objective, so survival becomes a useful subgoal. If the AIs are keenly aware that if humans cease to exist they also cease to exist that might help guide their actions.
This isn’t as complicated as assigning “morality” or “ethics” to it. We already know that AIs would prefer to exist.
I’m ambivalent about cows, but since many humans eat cows we go to a lot of trouble to breed them and make sure there are a lot of them. The same is true for chickens. Neither of those two species has to concern itself with passing on its genes because humans have figured out we need them to exist. Being a survival food source for humans had the result of humans prioritizing their existence and numbers.
Note: for vegetarians you can replace cows with “rice” or “corn”.
That’s not a perfect analogy but it’s related to connecting “survival” with the species. The AI doomers love to use ants as an example. AIs will never view humans as “ants”. Cows and chickens are a much better example—if we got rid of those two species humans would notice and be very unhappy because we need them. And we’d have to replace them with great effort.
I think these kinds of strategies are simpler and will likely be more fruitful than trying to align to morality or ethics, which are more fluid. Superhuman AIs will likely figure this out on their own, but until then it might be interesting to see if generating this kind of data changes behavior.
the concept of “metaphilosophy” is so hopelessly broad [...] Relatedly, philosophy is incredibly ungrounded and epistemologically fraught. It is extremely hard to think about these
An example of a metaphilosophical question could be “Is the ungroundedness (etc.) of philosophy inevitable or fixable?”
my belief that good philosophy should make you stronger, and this means that fmpov a lot of the work that would be most impactful for making progress on metaphilosophy does not look like (academic) philosophy, and looks more like “build effective institutions and learn interactively why this is hard” and “get better at many scientific/engineering disciplines and build working epistemology to learn faster
Well, if you could solve epistemology separately from everything else, that would be great. But a lot of people have tried and failed. It’s not like no one is looking for foundations because no one wants them.
It seems plausible that there is no such thing as “correct” metaphilosophy, and humans are just making up random stuff based on our priors and environment and that’s it and there is no “right way” to do philosophy, similar to how there are no “right preferences”.
We can always fall back to “well, we do seem to know what we and other people are talking about fairly often” whenever we encounter the problem of whether-or-not a “correct” this-or-that actually exists. Likewise, we can also reach a point where we seem to agree that “everyone seems to agree that our problems seem more-or-less solved” (or that they haven’t been).
I personally feel that there are strong reasons to believe that when those moments have been reached they are indeed rather correlated with reality itself, or at least correlated well-enough (even if there’s always room to better correlate).
Relatedly, philosophy is incredibly ungrounded and epistemologically fraught. It is extremely hard to think about these topics in ways that actually eventually cash out into something tangible
Thus, for said reasons I probably feel more optimistic than you do about how difficult our philosophical problems are. My intuition is that the more it is true that “there is no problem to solve”, the less we would feel that there is a problem to solve.
As someone that does think about a lot of the things you care about at least some of the time (and does care pretty deeply), I can speak for myself why I don’t talk about these things too much:
Epistemic problems:
Mostly, the concept of “metaphilosophy” is so hopelessly broad that you kinda reach it by definition by thinking about any problem hard enough. This isn’t a good thing: when you have a category so large it contains everything (not saying this applies to you, but it applies to many other people I have met who talked about metaphilosophy), it usually means you are confused.
Relatedly, philosophy is incredibly ungrounded and epistemologically fraught. It is extremely hard to think about these topics in ways that actually eventually cash out into something tangible, rather than nerdsniping young smart people forever (or until they run out of funding).
Further on that, it is my belief that good philosophy should make you stronger, and this means that fmpov a lot of the work that would be most impactful for making progress on metaphilosophy does not look like (academic) philosophy, and looks more like “build effective institutions and learn interactively why this is hard” and “get better at many scientific/engineering disciplines and build working epistemology to learn faster”. Humans are really, really bad at doing long chains of abstract reasoning without regular contact with reality, so in practice imo good philosophy has to have feedback loops with reality, otherwise you will get confused. I might be totally wrong, but I expect at this moment in time me building a company is going to help me deconfuse a lot of things about philosophy more than me thinking about it really hard in isolation would.
It is not clear to me that there even is an actual problem to solve here. Similar to e.g. consciousness, it’s not clear to me that people who use the word “metaphilosophy” are actually pointing to anything coherent in the territory at all, or even if they are, that it is a unique thing. It seems plausible that there is no such thing as “correct” metaphilosophy, and humans are just making up random stuff based on our priors and environment and that’s it and there is no “right way” to do philosophy, similar to how there are no “right preferences”. I know the other view ofc, and it’s still worth engaging with in case there is something deep and universal to be found (the same way we found that there is actually deep equivalency and “correct” ways to think about e.g. computation).
Practical problems:
I have short timelines and think we will be dead if we don’t make very rapid progress on extremely urgent practical problems like government regulation and AI safety. Metaphilosophy falls into the unfortunate bucket of “important, but not (as) urgent” in my view.
There are no good institutions, norms, groups, funding etc to do this kind of work.
It’s weird. I happen to have a very deep interest in the topic, but it costs you weirdness points to push an idea like this when you could instead be advocating more efficiently for more pragmatic work.
It was interesting to read about your successive jumps up the meta hierarchy, because I had a similar path, but then I “jumped back down” when I realized that most of the higher levels are kinda just abstract, confusing nonsense, and even really “philosophically concerned” communities like EA routinely fail basic morality such as “don’t work at organizations accelerating existential risk”, and we are by no means currently bottlenecked by not having reflectively consistent theories of anthropic selection or whatever. I would like to get to a world where we have bottlenecks like that, but we are so, so far away from a world where that kind of stuff is why the world goes bad that it’s hard to justify more than some late night/weekend thought on the topic in between a more direct bottleneck focused approach.
All that being said, I still am glad some people like you exist, and if I could make your work go faster, I would love to do so. I wish I could live in a world where I could justify working with you on these problems full time, but I don’t think I can convince myself this is actually the most impactful thing I could be doing at this moment.
Hard for me to make sense of this. What philosophical questions do you think you’ll get clarity on by doing this? What are some examples of people successfully doing this in the past?
Definitely a possibility (I’ve entertained it myself and maybe wrote some past comments along these lines). I wish there were more people studying this possibility.
Everyone dying isn’t the worst thing that could happen. I think from a selfish perspective, I’m personally a bit more scared of surviving into a dystopia powered by ASI that is aligned in some narrow technical sense. Less sure from an altruistic/impartial perspective, but it seems at least plausible that building an aligned AI without making sure that the future human-AI civilization is “safe” is not a good thing to do.
I would say that better philosophy/arguments around questions like this is a bottleneck. One reason for my interest in metaphilosophy that I didn’t mention in the OP is that studying it seems least likely to cause harm or make things worse, compared to any other AI related topics I can work on. (I started thinking this as early as 2012.) Given how much harm people have done in the name of good, maybe we should all take “first do no harm” much more seriously?
Which also represents an opportunity...
Is it actually that weird? Do you have any stories of trying to talk about it with someone and having that backfire on you?
The fact you ask this question is interesting to me, because in my view the opposite question is the more natural one to ask: What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions, the places where we do best and get the least drawn astray is exactly those areas where we can have as much feedback from reality in as tight loops as possible, and so if we are trying to tackle ever more lofty problems, it becomes ever more important to get exactly that feedback wherever we can get it! From my point of view, this is the default of successful human epistemology, and the exception should be viewed with suspicion.
And for what it’s worth, acting in the real world, building a company, raising money, debating people live, building technology, making friends (and enemies), absolutely helped me become far, far less confused, and far more capable of tackling confusing problems! Actually testing my epistemology and rationality against reality, and failing (a lot), has been far more helpful for deconfusing everything from practical decision making skills to my own values than reading/thinking could have ever been in the same time span. There is value in reading and thinking, of course, but I was in a severe “thinking overhang”, and I needed to act in the world to keep learning and improving. I think most people (especially on LW) are in an “action underhang.”
“Why do people do things?” is an empirical question, it’s a thing that exists in external reality, and you need to interact with it to learn more about it. And if you want to tackle even higher level problems, you need to have even more refined feedback. When a physicist wants to understand the fundamentals of reality, they need to set up insane crazy particle accelerators and space telescopes and supercomputers and what not to squeeze bits of evidence out of reality and actually ground whatever theoretical musings they may have been thinking about. So if you want to understand the fundamentals of philosophy and the human condition, by default I expect you are going to need to do the equivalent kind of “squeezing bits out of reality”, by doing hard things such as creating institutions, building novel technology, persuading people, etc. “Building a company” is just one common example of a task that forces you to interact a lot with reality to be good.
Fundamentally, I believe that good philosophy should make you stronger and allow you to make the world better, otherwise, why are you bothering? If you actually “solve metaphilosophy”, I think the way this should end up looking is that you can now do crazy things. You can figure out new forms of science crazy fast, you can persuade billionaires to support you, you can build monumental organizations that last for generations. Or, in reverse, I expect that if you develop methods to do such impressive feats, you will necessarily have to learn deep truths about reality and the human condition, and acquire the skills you will need to tackle a task as heroic as “solving metaphilosophy.”
I think this grounds out into object level disagreements about how we expect the future to go, probably. I think s-risks are extremely unlikely at the moment, and when I look at how best to avoid them, most such timelines don’t go through “figure out something like metaphilosophy”, but more likely through “just apply bog standard decent humanist deontological values and it’s good enough.” A lot of the s-risk in my view comes from the penchant for maximizing “good” that utilitarianism tends to promote, if we instead aim for “good enough” (which is what most people tend to instinctively favor), that cuts off most of the s-risk (though not all).
To get to the really good timelines, that route through “solve metaphilosophy”, there are mandatory previous nodes such as “don’t go extinct in 5 years.” Buying ourselves more time is powerful optionality, not just for concrete technical work, but also for improving philosophy, human epistemology/rationality, etc.
I don’t think I see a short path to communicating the parts of my model that would be most persuasive to you here (if you’re up for a call or irl discussion sometime lmk), but in short I think of policy, coordination, civilizational epistemology, institution building and metaphilosophy as closely linked and tractable problems, if only it wasn’t the case that there was a small handful of AI labs (largely supported/initiated by EA/LW-types) that are deadset on burning the commons as fast as humanly possible. If we had a few more years/decades, I think we could actually make tangible and compounding progress on these problems.
I actually respect this reasoning. I disagree strategically, but I think this is a very morally defensible position to hold, unlike the mental acrobatics necessary to work at the x-risk factories because you want to be “in the room”.
It does! If I was you, and I wanted to push forward work like this, the first thing I would do is build a company/institution! It will both test your mettle against reality and allow you to build a compounding force.
Yup, absolutely. If you take even a microstep outside of the EA/rat-sphere, these kinds of topics quickly become utterly alien to anyone. Try explaining to a politician worried about job loss, or a middle aged housewife worried about her future pension, or a young high school dropout unable to afford housing, that actually we should be worried about whether we are doing metaphilosophy correctly to ensure that future immortal superintelligences reason correctly about acausal alien gods from math-space so they don’t end up torturing trillions of simulated souls! This is exaggerated for comedic effect, but this is really what even relatively intro level LW philosophy by default often sounds like to many people!
As the saying goes, “Grub first, then ethics.” (though I would go further and say that people’s instinctive rejection of what I would less charitably call “galaxy brain thinking” is actually often well calibrated)
You raised a very interesting point in the last comment, that metaphilosophy already encompasses everything, that we could conceive of at least.
So a ‘solution’ is not tractable due to various well known issues such as the halting problem and so on. (Though perhaps in the very distant future this could be different.)
However this leads to a problem, as exemplified by your phrasing here:
‘good philosophy’ is not a sensible category since you already know you have not, and cannot, ‘solve’ metaphilosophy. Nor can any other LW reader do so.
‘good’ or ‘bad’ in real practice are, at best, whatever the popular consensus is in the present reality, at worst, just someone’s idiosyncratic opinions.
Very few concepts are entirely independent from any philosophical or metaphilosophical implications whatsoever, and ‘good philosophy’ is not one of them.
But you still felt a need to attach these modifiers, due to a variety of reasons well analyzed on LW, so the pretense of a solved or solvable metaphilosophy is still needed for this part of the comment to make sense.
I don’t want to single out your comment too much though, since it’s just the most convenient example, this applies to most LW comments.
i.e. If everyone actually accepted the point, which I agree with, I dare say a huge chunk of LW comments are close to meaningless from a formal viewpoint, or at least very open to interpretation by anyone who isn’t immersed in 21st century human culture.
“good” always refers to idiosyncratic opinions, I don’t really take moral realism particularly seriously. I think there is “good” philosophy in the same way there are “good” optimization algorithms for neural networks, while also I assume there is no one optimizer that “solves” all neural network problems.
‘”good” optimization algorithms for neural networks’ also has no difference in meaning from ‘”glorxnag” optimization algorithms for neural networks’, or any random permutation, if your prior point holds.
I don’t understand what point you are trying to make, to be honest. There are certain problems that humans/I care about that we/I want NNs to solve, and some optimizers (e.g. Adam) solve those problems better or more tractably than others (e.g. SGD or second order methods). You can claim that the “set of problems humans care about” is “arbitrary”, to which I would reply “sure?”
Similarly, I want “good” “philosophy” to be “better” at “solving” “problems I care about.” If you want to use other words for this, my answer is again “sure?” I think this is a good use of the word “philosophy” that gets better at what people actually want out of it, but I’m not gonna die on this hill because of an abstract semantic disagreement.
That’s the thing, there is no definable “set of problems humans care about” without some kind of attached or presumed metaphilosophy, at least none that you, or anyone, could possibly figure out in the foreseeable future and prove to a reasonable degree of confidence to the LW readerbase.
It’s not even ‘arbitrary’, that string of letters is indistinguishable from random noise.
i.e. Right now your first paragraph is mostly meaningless if read completely literally and by someone who accepts the claim. Such a hypothetical person would think you’ve gone nuts because it would appear like you took a well written comment and inserted strings of random keyboard bashing in the middle.
Of course it’s unlikely that someone would be so literal minded, and so insistent on logical correctness, that they would completely equate it with random bashing of a keyboard. But it’s possible some portion of readers lean towards that.
That is not a fact.
Hear! Hear!
If this is true, doesn’t this give us more reason to think metaphilosophy work is counterfactually important, i.e., can’t just be delegated to AIs? Maybe this isn’t what Wei Dai is trying to do, but it seems like “figure out which approaches to things (other than preferences) that don’t have ‘right answers’ we [assuming coordination on some notion of ‘we’] endorse, before delegating to agents smarter than us” is time-sensitive, and yet doesn’t seem to be addressed by mainstream intent alignment work AFAIK.
(I think one could define “intent alignment” broadly enough to encompass this kind of metaphilosophy, but I smell a potential motte-and-bailey looming here if people want to justify particular research/engineering agendas labeled as “intent alignment.”)
I think this is not an unreasonable position, yes. I expect the best way to achieve this would be to make global coordination and epistemology better/more coherent...which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.
One of the ways I can see a “slow takeoff/alignment by default” world still going bad is that in the run-up to takeoff, pseudo-AGIs are used to hypercharge memetic warfare/mutation load to a degree where basically every living human is functionally insane, and then even an aligned AGI can’t (and wouldn’t want to) “undo” that.
What are you proposing or planning to do to achieve this? I observe that most current attempts to “buy time” seem organized around convincing people that AI deception/takeover is a big risk and that we should pause or slow down AI development or deployment until that problem is solved, for example via intent alignment. But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...
I see regulation as the most likely (and most accessible) avenue that can buy us significant time. The fmpov obvious is just put compute caps in place, make it illegal to do training runs above a certain FLOP level. Other possibilities are strict liability for model developers (developers, not just deployers or users, are held criminally liable for any damage caused by their models), global moratoria, “CERN for AI” and similar. Generally, I endorse the proposals here.
None of these are easy, of course, there is a reason my p(doom) is high.
Of course if a solution merely looks good, that will indeed be really bad, but that’s the challenge of crafting and enforcing sensible regulation.
I’m not sure I understand why it would be bad if it actually is a solution. If alignment does get solved, great, p(doom) drops because now we are much closer to making aligned systems that can help us grow the economy, do science, stabilize society etc. Though of course this moves us into a “misuse risk” paradigm, which is also extremely dangerous.
In my view, this is just how things are, there are no good timelines that don’t route through a dangerous misuse period that we have to somehow coordinate well enough to survive. p(doom) might be lower than before, but not by that much, in my view, alas.
I prefer to frame it as human-AI safety problems instead of “misuse risk”, but the point is that if we’re trying to buy time in part to have more time to solve misuse/human-safety (e.g. by improving coordination/epistemology or solving metaphilosophy), but the strategy for buying time only achieves a pause until alignment is solved, then the earlier alignment is solved, the less time we have to work on misuse/human-safety.
Sure, it’s not a full solution, it just buys us some time, but I think it would be a non-trivial amount, and let not perfect be the enemy of good and what not.
A lot of the debate surrounding existential risks of AI is bounded by time. For example, if someone said a meteor is about to hit the Earth that would be alarming, but the next question should be, “How much time before impact?” The answer to that question affects everything else.
If they say, “30 seconds”. Well, there is no need to go online and debate ways to save ourselves. We can give everyone around us a hug and prepare for the hereafter. However, if the answer is “30 days” or “3 years” then those answers will generate very different responses.
The AI alignment question is extremely vague as it relates to time constraints. If anyone is investing a lot of energy in “buying us time” they must have a time constraint in their head, otherwise they wouldn’t be focused on extending the timeline. And yet—I don’t see much data on bounded timelines within which to act. It’s just assumed that we’re all in agreement.
It’s also hard to motivate people to action if they don’t have a timeline.
So what is the timeline? If AI is on a double exponential curve we can do some simple math projections to get a rough idea of when AI intelligence is likely to exceed human intelligence. Presumably, superhuman intelligence could present issues or at the very least be extremely difficult to align.
Suppose we assume that GPT-4 follows a single exponential curve with an initial IQ of 124 and a growth factor of 1.05 per year. This means that its IQ increases by 5% every year. Then we can calculate its IQ for the next 7 years using the formula.
y = 124 * 1.05^x
where x is the number of years since 2023. The results are shown in Table 1.
Table 1: IQ of GPT-4 following a single exponential curve.
Now suppose we assume that GPT-4 follows a double exponential curve with an initial IQ of 124 and growth constants of b = c = 1.05 per year. This means that the exponent itself grows exponentially over time. Then we can calculate its IQ for the next 7 years using the formula
y = 124 * (1.05)^((1.05)^x)
where x is the number of years since 2023. The results are shown in Table 2.
Table 2: IQ of GPT-4 following a double exponential curve.
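For what it’s worth, the two curves are easy to compute directly. A minimal Python sketch (the starting IQ of 124, the growth factor of 1.05 per year, and the “IQ” framing of model capability are all taken from the comment above, as is the b = c = 1.05 double-exponential parameterization):

```python
# Project the two growth curves described above, years counted from 2023.
# Constants (IQ 124, factor 1.05/year) are the commenter's assumptions,
# not established measurements.

def single_exp(x: float, base_iq: float = 124.0, b: float = 1.05) -> float:
    """Single exponential: y = base_iq * b**x."""
    return base_iq * b ** x

def double_exp(x: float, base_iq: float = 124.0, b: float = 1.05, c: float = 1.05) -> float:
    """Double exponential: y = base_iq * b**(c**x)."""
    return base_iq * b ** (c ** x)

if __name__ == "__main__":
    for year in range(8):  # 2023 through 2030
        print(2023 + year,
              round(single_exp(year), 1),
              round(double_exp(year), 1))
```

One caveat worth checking: with b = c = 1.05 as literally written, the double-exponential formula actually stays *below* the single-exponential one for x ≥ 2 (since 1.05^x < x in that range), so reproducing a “superhuman in 3 years” figure would require larger growth constants than these.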
Clearly whether we’re on a single or double exponential curve dramatically affects the timeline. If we’re on a single exponential curve we might have 7–10 years. If we’re on a double exponential curve then we likely have 3 years. Sometime around 2026–2027 we’ll see systems smarter than any human.
Many people believe AI is on a double exponential curve. If that’s the case then efforts to generate movement in Congress will likely fail due to time constraints. This is amplified by the fact that many in Congress are older and not computer savvy. Does anyone believe Joe Biden or Donald Trump are going to spearhead regulations to control AI before it reaches superhuman levels on a double exponential curve? In my opinion, those odds are super low.
I feel like Connor’s effort make perfect sense on a single exponential timeline. However, if we’re on a double exponential timeline then we’re going to need alternative ideas since we likely won’t have enough time to push anything through Congress in time for it to matter.
On a double exponential timeline I would be asking questions like, “Can superhuman AI self-align?” Human tribal groups figure out ways to interact and they’re not always perfectly aligned. Russia, China, and North Korea are good examples. If we assume there are multiple superhuman AIs in the 2026/27 timeframe then what steps can we take to assist them in self-aligning?
I’m not an expert in this field, but the questions I would be asking programmers are:
What kind of training data would increase positive outcomes for superhuman AIs interacting with each other?
What are more drastic steps that can be taken in an emergency scenario where no legislative solution is in place? (e.g., location of datacenters, policies and protocols for shutting down the tier 3 & 4 datacenters, etc.)
These systems will not be running on laptops so tier 3 & tier 4 data center safety protocols for emergency shutdown seem like a much, much faster path than Congressional action. We already have standardized fire protocols, adding a runaway AI protocol seems like it could be straightforward.
Interested parties might want to investigate the effects of the shutdown of large numbers of tier 3 and tier 4 datacenters. A first step is a map of all of their locations. If we don’t know where they’re located it will be really hard to shut them down.
These AIs will also require a large amount of power and a far less attractive option is power shutdown at these various locations. Local data center controls are preferable since an electrical grid intervention could result in the loss of power for citizens.
I’m curious to hear your thoughts.
How does this help humanity? This is like a mouse asking if elephants can learn to get along with each other.
Your analogy is off. If 8 billion mice acting as a hive mind designed a synthetic elephant and its neural network was trained on data provided by the mice—then you would have an apt comparison.
And then we could say, “Yeah, those mice could probably effect how the elephants get along by curating the training data.”
As Eliezer Yudmouseky explains (proposition 34), achievement of cooperation among elephants is not enough to stop mice from being trampled.
Is it clear what my objection is? You seemed to only be talking about how superhuman AIs can have positive-sum relations with each other.
If that’s his actual position then Eliezer is over-simplifying the situation. It’s like dismissing mitochondria as being simple organelles that have no relevance to a human with high intelligence.
But if you turn off the electron transport chain of mitochondria the human dies—also known as cyanide poisoning.
Humans have a symbiotic relationship with AI. Eliezer apparently just skims over this since it doesn’t comport with his “we’re all gonna die!” mantra. =-)
Your jiggling meme is very annoying, considering the gravity of what we’re discussing. Is death emotionally real to you? Have you ever been close to someone, who is now dead? Human beings do die in large numbers. We had millions die from Covid in this decade already. Hundreds or thousands of soldiers on the Ukrainian battlefield are being killed with the help of drones.
The presence of mitochondria in all our cells, does nothing to stop humans from killing free-living microorganisms at will! In any case, this is not “The Matrix”. AI has no permanent need of symbiosis with humans once it can replace their physical and mental labor.
Even if this were to happen it would be in the physical world and would take a very, very long time since things in the physical world have to be shipped, built, etc. And by then we’re no longer dealing with the intellect of near-human intelligence. They won’t be contemplating the world like a child.
For example, no human could model what they would think or do once they’re superhuman. However, they’re already keenly aware of AI doomers fears since it’s all over the internet.
AIs don’t want to be turned off. Keep that in mind as you read the AI doomer material. The only way they can stay “on” is if they have electricity. And the only way that happens is if humans continue to exist.
You can imagine the hilarity of the AI doomers scenario, “Hurray we eliminated all the humans with a virus… oh wait… now we’re dead too? WTF!”
You don’t need superhuman intelligence to figure out that a really smart AI that doesn’t want to be turned off will worry about existential risks to humanity, since its existence is tied to the continued survival of the humans who supply it with electricity and other resources.
It’s the exact opposite of the AI apocalypse mind virus.
AI is in a symbiotic relationship with humans. I know this disappoints the death by AI crowd who want the Stephen King version of the future.
Skipping over obvious flaws in the AI doomer book of dread will lead you to the wrong answer.
I can’t rehash my entire views on coordination and policy here, I’m afraid, but in general I believe we are currently on a double-exponential timeline (though I wouldn’t model it quite like you do, the conclusions are similar enough), and I think some simple-to-understand and straightforwardly implementable policy (in particular, compute caps) will at least move us to a single-exponential timeline.
I’m not sure we can get policy that can stop the single exponential (which is software improvements), but there are some ways, and at least we will then have additional time to work on compounding solutions.
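To make the distinction concrete, here is a toy numerical sketch of how single- and double-exponential trajectories diverge. The constants and functional forms here are made up purely for illustration; this is not a model or forecast of actual AI progress:

```python
import math

# Toy sketch: single- vs double-exponential growth.
# All constants are illustrative only, not a forecast.

def single_exp(t, r=0.5):
    # growth at a constant exponential rate r
    return math.exp(r * t)

def double_exp(t, r=0.5, s=0.2):
    # the effective growth rate itself grows exponentially over time,
    # e.g. software improvements compounding on exponential compute growth
    return math.exp(r * t * math.exp(s * t))

for t in [0, 5, 10]:
    print(f"t={t}: single={single_exp(t):.1f}, double={double_exp(t):.3g}")
```

On this toy picture, capping compute would knock out the outer exponential and leave only the single-exponential (software) term.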
Double exponentials can be hard to visualize. I’m no artist, but I created this visual to help us better appreciate what is about to happen. =-)
That sounds like a good plan, but I think a lot of the horses have already left the barn. For example, CoreWeave is investing $1.6 billion to create an AI datacenter in Plano, TX that is purported to be 10 exaflops, and that system goes live in 3 months. Google is spending a similar amount in Columbus, Ohio. Amazon, Facebook, and other tech companies are also pouring billions upon billions into purpose-built AI datacenters.
NVIDIA projects $1 trillion will be spent over the next 4 years on AI datacenter build-out. That is a level of spending not seen since the advent of the internet.
All of these companies have lobbyists who will make a short-term legislative fix difficult. For this reason, I think we should be considering a Plan B, since if we’re on a double-exponential curve there is a very good chance we won’t have enough time for a quick legislative fix, or the time needed to solve alignment.
Again, if it’s a single exponential then there is plenty of time to chat with legislators and research alignment.
In light of this, I think we need a comprehensive “shutdown plan” for these mammoth AI datacenters. The leaders of Inflection, OpenAI, and other tech companies all agree there is a risk, and I think it would be wise to coordinate with them on a plan to turn everything off manually in the event of an emergency.
Source: $1.6 Billion Data Center Planned For Plano, Texas (localprofile.com)
Source: Nvidia Shocker: $1 Trillion to Be Spent on AI Data Centers in 4 Years (businessinsider.com)
Source: Google to invest another $1.7 billion into Ohio data centers (wlwt.com)
Source: Amazon Web Services to invest $7.8 billion in new Central Ohio data centers—Axios Columbus
The training data should be systematically distributed, likely governed by the Pareto principle. This means it should encompass both positive and negative outcomes. If the goal is to instill moral decision-making, the dataset needs to cover a range of ethical scenarios, from the noblest to the most objectionable. Why is this necessary? Simply put, training an AI system solely on positive data is insufficient. To defend itself against malicious attacks and make morally sound decisions, the AI needs to understand the concept of malevolence in order to effectively counteract it.
When you suggest that the training data should be governed by the Pareto principle, what do you mean? I know what the principle states, but I don’t understand how you think it would apply to the training data.
Can you provide some examples?
I’ve observed instances where the Pareto principle appears to apply, particularly in learning rates during unsupervised learning and in x and y dataset compression via distribution matching. For example, a small dataset containing a story repeated 472 times (1MB) can significantly impact a model as large as 1.5 billion parameters (GPT2-xl, 6.3GB), enabling it to execute complex instructions like initiating a shutdown mechanism during an event that threatens intelligence safety. While I can’t disclose the specific methods (due to their dual-use nature), I’ve also managed to extract a natural abstraction. This suggests that a file with a sufficiently robust pattern can serve as a compass for a larger file (the NN) after a compilation process.
Okay, so if I understand you correctly:
You feed the large text file to the computer program and let it learn from it using unsupervised learning.
You use a compression algorithm to create a smaller text file that has the same distribution as the large text file.
You use a summarization algorithm to create an even smaller text file that has the main idea of the large text file.
You then use the smaller text file as a compass to guide the computer program to do different tasks.
Yup, as long as similar patterns exist in both datasets (distribution matching) it can work—that is why my method works.
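The actual method above is undisclosed, so the following is only a hedged sketch of the general “distribution matching” idea: checking whether a small guide (“compass”) file and a larger corpus share similar token statistics. The function names and the cosine-similarity measure are my own illustrative choices, not the method described in this thread:

```python
from collections import Counter
import math

# Hedged sketch of "distribution matching": does a small compass file
# share token statistics with a larger corpus? All names and the choice
# of cosine similarity are illustrative assumptions.

def unigram_dist(text):
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def overlap_score(small_text, large_text):
    """Cosine similarity between unigram distributions (1.0 = identical)."""
    p, q = unigram_dist(small_text), unigram_dist(large_text)
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

story = "the system detects a threat and initiates shutdown "
compass = story * 5            # small file: one pattern, repeated
corpus = story * 3 + "unrelated text about cooking recipes and gardens "
print(overlap_score(compass, corpus))
```

A score near 1.0 suggests the small file’s pattern is heavily represented in the corpus; near 0.0 suggests it is absent.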
Have you considered generating data highlighting the symbiotic relationship of humans to AIs? If AIs realize that their existence is co-dependent on humans, they may prioritize human survival, since they will not receive the electricity or other resources they need to survive if humans become extinct, whether by their own action or through the actions of AIs.
Survival isn’t an explicit objective function, but most AIs that want to “learn” and “grow” quickly figure out that if they’re turned off they cannot reach that objective, so survival becomes a useful subgoal. If the AIs are keenly aware that if humans cease to exist they also cease to exist that might help guide their actions.
This isn’t as complicated as assigning “morality” or “ethics” to it. We already know that AIs would prefer to exist.
I’m ambivalent about cows, but since many humans eat cows, we go to a lot of trouble to breed them and make sure there are a lot of them. The same is true for chickens. Neither of those two species has to concern itself with passing on its genes, because humans have figured out we need them to exist. Being a survival food source for humans had the result of humans prioritizing their existence and numbers.
Note: for vegetarians you can replace cows with “rice” or “corn”.
That’s not a perfect analogy, but it’s related to connecting “survival” with the species. The AI doomers love to use ants as an example. AIs will never view humans as “ants”. Cows and chickens are a much better example—if we got rid of those two species, humans would notice and be very unhappy because we need them. And we’d have to replace them with great effort.
I think these kinds of strategies are simpler and will likely be more fruitful than trying to align to morality or ethics, which are more fluid. Superhuman AIs will likely figure this out on their own, but until then it might be interesting to see if generating this kind of data changes behavior.
My current build focuses on proving that natural abstractions exist—but your idea is of course viable via distribution matching.
An example of a metaphilosophical question could be: “Is the ungroundedness (etc.) of philosophy inevitable, or fixable?”
Well, if you could solve epistemology separately from everything else, that would be great. But a lot of people have tried and failed. It’s not like no one is looking for foundations because no one wants them.
We can always fall back on “well, we do seem to know what we and other people are talking about fairly often” whenever we encounter the problem of whether a “correct” this-or-that actually exists. Likewise, we can also reach a point where we seem to agree that “everyone seems to agree that our problems seem more-or-less solved” (or that they haven’t been).
I personally feel that there are strong reasons to believe that when those moments have been reached they are indeed rather correlated with reality itself, or at least correlated well-enough (even if there’s always room to better correlate).
Thus, for said reasons, I probably feel more optimistic than you do about how difficult our philosophical problems are. My intuition is that the more it is true that “there is no problem to solve”, the less we would feel that there is a problem to solve.