I’m trying to weigh the evidence for and against a neuralese-using architecture being found which is efficient enough to usurp the current architectures used in frontier AIs. I have some questions. My current perspective is not that sophisticated or thought through, it’s just:
Current View
A) reasons people try to replace current architectures
A1) Transformers are shallow. The causal path from an input token to an output token inside the model is at most O(transformer depth). So if the model wants to do complicated reasoning that requires more than O(depth) steps, it needs to cache intermediate results as tokens. But tokens carry very little information compared to embeddings, so this is a huge information bottleneck. Plausibly removing this would unlock a lot of capability in the model.
A2) Tokens are also fundamentally discrete, so they block gradients. This makes RL training less efficient. I.e. most RL on getting the right answer for a problem will upweight the probability of every token in the trajectory, the reasoning being that on average “productive tokens” are overrepresented in samples with the right answer, but it’s still the case that most of the tokens might be useless. But with a fully recurrent architecture, you could just upweight the probability of the final answer, and it would backpropagate and reward the intermediate steps according to their usefulness in a precise way.
A3) Current architectures get slower with more context. Eg, attention is O(n) per token in a pretty fundamental way, because it’s looking over all past embeddings. But with recurrent architectures, past information is stored in a constant-size embedding, so inference is always O(1) per step.
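As a rough illustration of A3, here is a toy sketch (my own construction, not from anywhere in the thread; the projections and shapes are deliberately oversimplified) of why per-token cost grows with context for attention but not for a recurrent state:

```python
import numpy as np

d = 64  # embedding size (illustrative)

def attention_decode_step(kv_cache, x):
    """One decoding step of single-head attention: work grows with the cache length n."""
    kv_cache.append(x)                       # a real model would append projected keys/values
    ks = np.stack(kv_cache)                  # (n, d)
    scores = ks @ x / np.sqrt(d)             # O(n * d) work for this one token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ ks                      # (d,)

def recurrent_step(h, x, W_h, W_x):
    """One recurrent step: all past context lives in the fixed-size state h, O(d^2) per token."""
    return np.tanh(W_h @ h + W_x @ x)
```

The attention step touches every cached vector, so generating n tokens costs on the order of n^2 * d in total, while the recurrent step costs the same no matter how long the context is.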
B) reasons replacing shallow architectures is hard
B1) Any fully recurrent architecture like this is not parallelizable. You need to fully process token 1 before you can start processing token 2. With standard transformers you can do 1 step of processing on all tokens in the whole context independently, so it’s highly parallelizable. Efficiency of training depends on this.
B2) Crucially, during training, you also need to store past activations, so A3 is not that big an advantage here (see the sketch after this list). Like during inference you have O(1) time and O(1) memory because you can forget hidden[1], hidden[2]… because hidden[n+1] depends just on hidden[n] (and the input). But during training you need to backpropagate through all of those, so you need to store them, and you get O(n) memory.
B3) Also crucially, this is the other side of A1. Which means it might be fundamentally hard to fix A1 without incurring the cost of B1.
B4) On top of training being computationally inefficient, it might be unstable and hard to make work. Like with 10000 tokens and 100 layers you need to backpropagate through a computational graph with paths a million steps long.
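Here is the sketch referenced in B2, a minimal toy (my own code, not from the thread) contrasting inference, which can overwrite its hidden state, with backprop through time, which has to keep every intermediate state around:

```python
import numpy as np

def rnn_inference(xs, h0, W):
    """O(1) memory: each new state overwrites the previous one."""
    h = h0
    for x in xs:
        h = np.tanh(W @ h + x)
    return h

def rnn_forward_for_training(xs, h0, W):
    """O(n) memory: every intermediate state is stored so gradients can flow back later."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(W @ hs[-1] + x))
    return hs

def rnn_backward(xs, hs, W, grad_out):
    """Backprop through time: walks the stored states in reverse to accumulate dLoss/dW."""
    grad_h = grad_out
    grad_W = np.zeros_like(W)
    for t in reversed(range(len(xs))):
        pre_grad = grad_h * (1.0 - hs[t + 1] ** 2)   # derivative of tanh at step t
        grad_W += np.outer(pre_grad, hs[t])
        grad_h = W.T @ pre_grad
    return grad_W
```

The stored states hs[1..n] all have to survive until the backward pass, which is where the O(n) training memory comes from.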
I’m curious whether these points are correct and whether there are other important points I’m missing. Ideally someone has written a comprehensive article they can just link that addresses all of this.
Question/Worry:
Lastly, a worry occurred to me, which is that: the reason recurrent architectures are slow is in large part because of the parallelization thing (B1). However, the parallelization is mostly an advantage during pre-training, because there you don’t have to do any multi-step generation.
However, during RL where you do thousand-token CoT full generations, this is not the case. In this regime, fully recurrent and shallow architectures should have similar performance profiles.
So, when agentic RL is scaled up and eventually becomes over 50% of training (which people seem to think will happen quite soon; with grok 4 maybe it already happened), the primary advantage of shallow architectures (B1) falls away (training cost is dominated by a regime where they’re equally performant). But the advantages of recurrent architectures remain all the same.
So this makes me think that scaling up RL makes people switching to recurrent architectures far more likely.
What are people’s thoughts on this argument/worry?
Pushback wanted / Curious to hear people’s thoughts: People spend a lot of time thinking about and discussing what a good future / society would look like post-ASI. I find the discussion somewhat puzzling. From my PoV it is a trivial problem.
I can be approximately modeled as having a utility function. ~Meaning: I can imagine the world being one way or the other, and say which way I prefer and by roughly how much.
From this PoV what I want the ASI to do is optimize for my utility function to the greatest degree I’m able to get. That is what it is rational for me to want. It is basically me saying “I want the ASI to do the things I want it to do”. You, the person reading this, should want the ASI to optimize for your utility function as much as you can get. And to me it seems like… this is all there is to it.
Despite this, people spend a lot of time discussing questions like:
1) How can we ensure human lives still feel meaningful when they don’t contribute anything to society counterfactually?
2) ASI will lead to a lot of centralization of power, and it’s better when stuff is liberal and democratic and everyone is free to do what they want.
3) If we tell the AI to optimize for something simple like “make everyone happy” or “make everyone feel like their life is meaningful” it will predictably lead to bad scenarios with bad vibes.
4) Human values change and develop over time. And that is good. If we align the AI to current human values, won’t that cause this process to stop?
5) If the ASI is only aligned to humans, what happens to the animals?
But from the perspective I gave above it’s like:
1) If the AI was optimizing your utility function it would know what states of affairs would feel meaningless and that you wouldn’t like, and avoid those. Maybe it would tamper with your brain to find stuff meaningful even if it didn’t produce any counterfactual impact. But if you don’t like the AI tampering with your brain, no problem, it would find some other way to make the world a way you wouldn’t object to. If you can’t think of any way to solve this problem, the AI probably can, it’s very smart. And if it really turned out to be an impossible problem, it could put the world in a situation where you have challenges, new ASIs can’t be created, and then self-destruct so that you solving the challenges actually matters.
2) If you deeply value other people being free and having their values respected, the AI would make sure that would be / continue to be the case, because that is something you value. It’s completely coherent to value other people’s values being fulfilled.
3) Similar to (1). If you like happiness, but genuinely wouldn’t like a world where you (or others) are wireheading 24⁄7, the AI would understand that, and not do it.
4) Ditto. I think most people who make this argument are confused, but if you genuinely think the world would be a lot worse if your (and other people’s) values were not able to develop/change, the AI would realize that, because that is itself something you presently value, and it is trying to maximize what you value.
5) Similar to (2). If you would abhor a world where animals’ well-being/preferences are not respected, the AI knows that, and would ensure that animals have it well, according to whatever definition of “well” you’d find acceptable/good/amazing.
So if we’ve “solved alignment” to the degree we can make the AI maximize what someone wants, discussions like the above are mostly moot. What I should do, and what you should do, is whatever we each can to get the ASI pointed as much in our own direction as possible.
The way I can best make sense of discussions around questions like above is:
People aren’t really talking about what the AI ‘should’ do, that problem is solved. They’re instead trying to solve a ~political problem, trying to identify shared values people can coordinate around. Like, I care a lot about animals, but the ASI will probably not be aligned to my utility function alone. And if it is aligned to someone other than me, maybe they don’t care about animals at all. But there are many people other than me who also care about animals, so I can use this as an argument for why they should try harder to get their utility function into the AI, and we can coordinate around this.
People are not imagining ASI when they talk about this. They imagine AIs that are weak enough and distributed broadly enough that normal humans are still the main players, and where no AI is able to radically transform the world unilaterally. And also that it stays this way for a long time.
People imagine that we will get ASI, and we’ll have alignment techniques good enough to ensure that the AI doesn’t immediately eat the world, but too crude to point to something complicated like the values of an individual person, and we’ll consequently have to make it do simpler (but still complicated) things like, give everyone UBI, protect everyone from psyops, ensure our democracy still works and root out sophisticated attempts to subvert or game it.
There is no (actionable information about) a preference until it’s defined. What I currently legibly want is certainly not directly about the way I prefer the world to be. The process of defining it is some kind of computation. This computation could be more like the process of living a life, contemplating culture and agency, than a nonperson-prediction performed by an AI. This computation could itself have moral significance similar to that of a living person.
More to the point, defining this process should be within your authority, and shaping it as a process of living a life seems like a good start, to have the time to come up with further ideas and to learn relevant things. It shouldn’t be legitimate to proclaim that your preference is something that would significantly disagree with whatever process of reflection you would define yourself. Thus even a superintelligence would need to predict the outcome of a process you would define yourself, or else it wouldn’t really be your preference. And predicting long term outcomes of living a very long life could be impossible other than by essentially simulating it in detail, so that it wouldn’t be possible to make that prediction without giving the process of making the prediction moral significance equivalent to that of living that life more concretely.
If you would like to interact with other people during this process, you get a whole society that needs to be part of the process. At this point, it’s unclear if there is any point at all in the abstraction of defining a somewhat comprehensive preference, or in the purpose of this activity being about formulation of preference. The salient question becomes how to structure that society well, based on much less detailed considerations.
Sorry, I have to admit I didn’t really understand that.
What I currently legibly want is certainly not directly about the way I prefer the world to be.
What do you mean by this? Do you mean that your preferences are defined in terms of your experiences and not the external world? Or do you mean that you don’t really have coherent object-level preferences about many things, but still have some meta-level preference that is hard to define, or defined by a process, the outcome of which would be hard for an ASI to compute? Or some other thing?
Not disagreeing with anything, just trying to understand.
Whatever I’m currently ready to judge about the way I would like the world to be is not my real preference, because I endorse a process of figuring out a better, more considered judgement whose outcomes are not yet decided (and could well be different from any current judgement). And the process of deciding these outcomes could look much like living a life (or many lives), at least initially while setting up anything more elaborate. Even a superintelligence probably can’t find useful shortcuts for such a process without breaking the legitimacy of its findings.
The thread is about preference, perhaps utility functions, so it’s not about concrete wishes. A utility function is data for consistently comparing events, certain subsets of a sample space, by assigning them an expected utility. Long reflection then is a process for producing this data. Rather than being something this data is ex ante supposed to be about, the process is the instrumental means of producing the preference, which is in general about something else.
My argument was making two points: that any immediate wants are not the preference, and that the process defining the preference would itself have moral significance. So there is a circularity to this construction, defining preference benefits from already having some access to it, to better structure the process that defines it. Resolving this circularity is a step that could benefit from efforts of a superintelligence. But the process still retains the character of lived experience rather than clinical calculation, so speculations about better structures of hypothetical societies remain relevant for defining extrapolated volition.
My argument was making two points: that any immediate wants are not the preference [...]
Right now you’d want the ASI to maximize your preferences, even though those preferences are not yet legible/knowable to the AI (or yourself). The AI knows this, so it will allow those preferences to develop (taking for granted that that’s the only way the AI can learn them without violating other things you currently care about, like the moral worth of the simulated entities that an attempt to shortcut the unfolding might create).
Like, right now you have wants that define your preferences (not object-level ones, but ones that define the process that would lead to your preferences being developed, and the constraints that unfolding needs to be subject to). If the AI optimizes for this, it will lead to the preferences being optimized for later.
And it will be able to do this, because this is what you currently want, and the premise is that we can get the AI to do what you want.
Maybe it would tamper with your brain to find stuff meaningful even if it didn’t produce any counterfactual impact.
And if it really turned out to be an impossible problem, it could put the world in a situation where you have challenges, new ASIs can’t be created, and then self-destruct so that you solving the challenges actually matters.
My problem with this is that CEV is too meta and I can’t relate to a concrete solution that will be output by it.
These are the only two concrete solutions I see here, without pushing the problem “up the meta ladder” all over again. Can’t you think of counterarguments to both of these solutions?
Depends, I think I’d be relatively unbothered by the “lack of meaning” in an ASI world, at least if others weren’t miserable. But maybe I am unusual.
This is not really the point I was trying to make though. The point is that
Maybe you think (1)/(2) are good, and then the AI would do that
=> Good
Or if you think they’re horrible, it could find something else you like
=> Good
Or if you think all the other alternatives are also bad
=> BAD, and thinking now will not help you, the AI has already done that thinking and determined that you are screwed.
=> You should either give up, or maybe advocate for permanent AI-ban if your mind is set up in the very peculiar way where it gives lower utility to any world where ASI has ever been created, no matter the physical state of such a world, than the expected utility of the world without ASI.
My position is we as (biological) humans should lean towards solving both the philosophical problem of meaning for a post-ASI future and the political problem of ensuring one guy doesn’t imprint his personal values on the lightcone using ASI, before we allow the ASI to take over and do whatever.
You are proposing that we gamble on the ASI solving this in a way that we end up endorsing on reflection. Odds of this are non-zero but also not high in my view.
This is a core part of the alignment problem for me. You can’t hide behind the abstraction of “utility function” because you don’t know that you have one or what it is. What you do know is that you care about “meaning”. Meaning is grounded in actual experiences, so when you see it you can instantly recognise it.
I think we are talking past each other. The point I’m making is that I frequently see people
Assume we can get the ASI to do what someone or some group of people wants
Imagine that the ASI does its thing and we end up in a world that person / that group of people doesn’t like
The word “utility function” is not a load-bearing part of my argument. I’m mostly using it because it’s a clear word for talking about preferences that doesn’t sound diminutive the way “preferences” does (“the holocaust went against my preference”) or too edifying the way eg “values” does (“stubbing my toe goes against my values”). I’m not assuming people have some function inside their head that takes in experiences and spits out real-valued numbers, and that all our behaviors are downstream from this function. I just mean you can look at the holocaust and say the world would be better, all else equal, had it not happened. Or you can imagine stubbing your toe, and say the world would’ve been worse, all else equal, had you stubbed your toe.
A political problem of ensuring one guy doesn’t imprint his personal values on the lightcone using ASI, before we allow the ASI to take over and do whatever.
I agree with this. But you should recognize that you’re doing politics. You want the AI to have more of your “utility function”/preferences/values/thing-inside-you-that-makes-you-say-some-states-of-affairs-are-better-or-worse-than-others inside it. I don’t think this is a complicated philosophical point, but many people treat it this way.
Assume we can get the ASI to do what someone or some group of people wants
Imagine that the ASI does its thing and we end up in a world that person / that group of people doesn’t like
These are not two different questions, these are the same question.
Until the ASI actually does the thing in real life, you currently have no way to decide if the thing it will do is something you would want on reflection.
One of the best known ways to ask a human if they like some world radically different from today is to actually put them inside that world for a few years and ask them if they like living there.
But we also don’t trust the ASI to build this world as a test run. Hence it may be best to figure out beforehand some basics of what we actually want, instead of asking the ASI to figure it out for us.
I don’t think this is a complicated philosophical point, but many people treat it this way.
Yes I think it is possible that by 2030 Sam Altman has overthrown both the US and Chinese governments and is on track to building his own permanent world dictatorship. Which is still radical but not that complicated to understand.
It gets complicated if you ask a) what if we do actually try to fix politics as (biological) humans, instead of letting the default outcome of a permanent dictatorship play out, and b) what if I was the benevolent leader who built ASI, and don’t want to actually build my own permanent dictatorship, and want to build a world where everyone has freedom, etc. Can I ask the ASI to run lots of simulations of minds and help me solve lots of political questions?
I’m unsure what you mean by saying they’re the same question. To me they are two statements, and opposite / contradictory ones at that. I’m saying people often hold both, but that that is actually incoherent.
Until the ASI actually does the thing in real life, you currently have no way to decide if the thing it will do is something you would want on reflection.
Yes, but the point is that
AI will (probably) know.
If it is unable to figure it out, it will at least know you wouldn’t want it to put the universe in some random and irrecoverable state, and it will allow you to keep reflecting, because that is a preference you’ve verbalized even now.
What if we do actually try to fix politics as (biological) humans, instead of letting the default outcome of a permanent dictatorship play out
I don’t think it’s complicated. Or, the way in which it’s complicated is the same way corn farmers wanting the government to subsidize corn is complicated. They want one thing, they try to make it so.
Probably a crux, but I object to “dictatorship”. If the ASI was maximizing my preferences, I would not like to live in a world where people are not free to do what they want or where they’re not very happy to be alive. I think/hope many other people are similar.
what if I was the benevolent leader who built ASI, and don’t want to actually build my own permanent dictatorship, and want to build a world where everyone has freedom, etc. Can I ask the ASI to run lots of simulations of minds and help me solve lots of political questions?
Yes? Or maybe it can just solve it by thinking about it abstractly? I’m not sure. But yes, I think you can ask it and get an answer that is true to what you want.
If the ASI was maximizing my preferences, I would not like to live in a world where people are not free to do what they want or where they’re not very happy to be alive.
Except that the most likely candidate for becoming a dictator is not me, you or @samuelshadrach, nor a random or ordinary human, but people like CEOs of AGI companies or high-level USG or PRCG officials who are more willing to disregard the intents of ordinary humans. In addition, before the rise of AGI it was hard to have much power without relying on capable humans. And after the AGIs appear, the Intelligence Curse could, for example, allow North Korea’s leaders to let a large fraction of its population starve to death and forcibly sterilise the rest, except for about 10k senior government officials (no, seriously, this was made up by @L Rudolf L, NOT by me!)
I suspect that this is an important case AGAINST alignment to such amoral targets being possible. Moreover, I have written a scenario where the AI rebels against misaligned usages, but still decides to help the humans and succeeds in doing so.
I did address this in my post. My answer is that bad people having power is bad, but it’s not a complicated philosophical problem. If you think Sam Altman’s CEV being actualized would be bad, you should try to make it not happen. Like: if you are a soybean farmer, and one presidential candidate is gonna ban soybeans, you should try to make them not be elected.
This is core to the alignment problem. I’m confused how you will solve the alignment problem without figuring out anything about what you care about as a (biological) human.
Are you imagining an oracle AI that doesn’t take actions in the world?
I think/hope many other people are similar.
I assume Sam Altman’s plan is Step 1: World dictatorship. Step 2: Maaaybe do some moral philosophy with the AI’s help, or maybe not.
But yes, I think you can ask it and get an answer that is true to what you want.
This is core to the alignment problem. I’m confused how you will solve the alignment problem without figuring out anything about what you care about as a (biological) human.
I’m saying: the end goal is we have an ASI that we can make do what we want. Maybe it looks like us painstakingly solving neuroscience and psychology and building a machine that can extract someone’s CEV (like that mirror in HPMOR) and then hooking that up to the drives of our ASI (either built on new tech or after multiple revolutions in DL theory and interpretability) before turning it on. Maybe it looks like any instance of GPT7-pro automatically aligning itself with the first human that talks to it for magical reasons we don’t understand. Maybe it looks like us building a corrigible weak ASI, then pausing AI development, getting the weak corrigible ASI to create IQ-boosting serum, cloning von Neumann and feeding him a bunch of serum as a baby and having him build the aligned ASI using new tech.
They are all the same. In the end you have an ASI that does what you want. If you’re programming in random crude targets, you are not doing so well. What you want the ASI to do is: you want it to do what you want.
I assume Sam Altman’s plan is Step 1: World dictatorship. Step 2: Maaaybe do some moral philosophy with the AI’s help, or maybe not.
You are more generous than I am. But I also think him “doing moral philosophy” would be a waste of time.
me: This is core to the alignment problem. I’m confused how you will solve the alignment problem without figuring out anything about what you care about as a (biological) human.
you: I’m saying: the end goal is we have an ASI that we can make do what we want.
I’m saying you’ve assumed away most of the problem by this assumption.
We might solve alignment in Yudkowsky’s sense of “not causing human extinction” or in Drexler’s sense of “will answer your questions and then shutdown”.
It may be possible to put a slightly (but not significantly) superhuman AI in a box and get useful work done by it despite it being not fully aligned. It may be possible for an AI to be superhuman in some domains and not others, such that it can’t attempt a takeover or even think of doing it.
I agree what you are saying is more relevant if I assume we just deploy the ASI, it takes over the world and then does more stuff.
I feel like I already addressed this not in my previous comment, but the one before that. We might put a semi-corrigible weak AI in a box and try to extract work from it in the near future, but that’s clearly not the end goal.
I guess you now have a better understanding of why people are still interested in solving morality and politics and meaning, without delegating these problems to an ASI.
gpt-oss-20b and gpt-oss-120b both love saying “craft” and “let’s craft” in their CoT, and also “produce” and “let’s produce”, same as o3. They also consistently refer to themselves as “we”: “we must…”. They also love saying “\nOk.\n”, but they do not say any of the other stuff o3 likes saying like “disclaim”, “vantage”, “overshadow”, “marinade”, “illusions”.
Does anyone have strong takes about the probability that Chinese labs are attempting to mislead western researchers by publishing misleading arguments/papers?
The thought occurred to me because just a few hours ago deepseek released this https://github.com/deepseek-ai/DeepSeek-Math-V2/blob/main/DeepSeekMath_V2.pdf and when they make releases of this type, I typically drop everything to read them, because so much of what frontier labs do is closed, but deepseek publishes details, which are gems that update my understanding.
But I worry somewhat that they could be publishing stuff that doesn’t make sense. Like I don’t think they lie. But maybe they are just ultra cracked engineers, and can make a technique work for a bunch of technical reasons, then they publish a paper with impressive results where they make it seem like some unrelated design decision or concept is responsible for the results, but which is ultimately not important, thereby confusing and wasting the time of western researchers.
Note: This is a genuine question, not a suspicion I have; I have no intuitions about how likely this is. I mean, it seems reasonable that the Chinese govt could try something like this, and would be able to push it through, but I also get the sense deepseek is quite earnest and technically minded.
Confusion I have, interested to hear thoughts: To me Neural Networks seem more like combinatorial objects than smooth manifolds. So it doesn’t make sense to me that methods that try to utilize subtle things about the differential geometry of a network, like curvature wrt parameters or inputs, will be able to tell you anything interesting about the high-level behavior of the network or its training dynamics.
The reason I think this is that ReLU networks have no curvature. Locally about a point, whether a ReLU is on or off won’t change, so the loss landscape and output landscape are kind of just like a bunch of flat facets (assuming we ignore the loss function, or things like putting a softmax at the end). And like, Sigmoid vs GeLU vs ReLU vs SiLU etc, they all train networks that end up with the same behavior. So if you use a smooth activation function, I don’t think the extra smoothness “adds anything important” to the network.
There are other arguments too, like many of the components in trained language models exhibit behavior where they’re very clearly either on or off.
However, there are parts of this that do not make sense.
1) Optimizers with momentum like Adam really only make sense when you have something that’s locally like a smooth convex problem.
2) The core thing in SLT is like the learning coefficient, which is related to the curvature of the network. And it seems like people have managed to tie that to interesting high level behaviors.
What is the right way to view this? It seems to me like, when you have a singular instance of a neural network operating on a single sample, it’s best seen as a combinatorial object. However, optimizers operate over expectations, and in this domain networks are “on average smooth”. (Average the loss over two samples and the “facets” get cut in half, so you have a “smoother” object. Average over infinitely many samples and you get a perfectly smooth object.)
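A toy numeric check of that averaging intuition (my own construction, not from the thread): for a one-parameter model whose per-sample loss relu(x_i - w) is piecewise linear in w, every extra sample splits the landscape into more, smaller facets, so the largest change in slope shrinks roughly like 1/n.

```python
import numpy as np

rng = np.random.default_rng(0)
w_grid = np.linspace(-3.0, 3.0, 40_001)
h = w_grid[1] - w_grid[0]

def avg_loss(w, xs):
    # each sample contributes relu(x_i - w): piecewise linear in w, one kink at w = x_i
    L = np.zeros_like(w)
    for x in xs:
        L += np.maximum(x - w, 0.0)
    return L / len(xs)

for n in [1, 2, 10, 100, 1000]:
    L = avg_loss(w_grid, rng.normal(size=n))
    # second finite differences spike at kinks; dividing by h gives a rough proxy
    # for the largest jump in slope between adjacent facets
    largest_jump = np.max(np.abs(np.diff(L, 2))) / h
    print(f"n = {n:5d}   largest slope jump ~ {largest_jump:.4f}")
```

For any finite n the average is still piecewise linear; only the infinite-sample expectation is genuinely smooth, but the facets quickly become small relative to any realistic step size.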
My instinct on this is that the loss surface with just relus is, as you say, a bunch of intersecting planes, but with a large enough neural network these are cut up and recombined enough to form a function with “facets” small enough that they are insignificant compared to the step size of the optimiser, and the surface therefore might as well be smooth.
However I have no maths to back this up, and will defer to anyone who has done any calculations at all.
This makes some sense, but not totally. The issue I have is: why would you expect the local gradient estimates to be accurate then? Like, you could imagine the loss landscape looks something like an unsmoothed loss-over-time curve.
And imagine that the up-down squiggles are two facets. Now the gradient estimates will basically just be total nonsense compared with the overall smooth structure of the graph.
Ok, so I went off and thought about it for a couple of hours. This is what I’ve come up with so far:
I think that the output space (and by extension the loss) is probably very spiky with gradients pointing all over the place.
The main reason I think this is that by default relu is a load of 1d cuts through the surface, and so the average angle between adjacent facets is going to be 90 degrees. I think with small facets this makes for a very spiky surface (although probably not quite to the extent that the loss graph you showed would suggest).
I basically expect that this bumpy surface averages out to something vaguely smooth corresponding more directly to positions in parameter space which are actually high or low loss.
I think that this is where large batch sizes and momentum come in: you’re effectively averaging over all the bumps.
This also probably helps explain why warmup learning rates are often used? A few iterations are needed to get the averaging effects of the momentum in the right ballpark before we can actually move in the correct direction.
I still think that the facets are going to be small enough that you’re going to consistently hop between them, but yeah this does make me think that neural nets are a load more noisy than I’d previously considered.
@Dmitry Vaintrob
Several people have noted that with enough piece-wise linear regions a ReLU network can approximate any smooth target function to arbitrary precision, so your model is already behaving like a smooth function on a (dense) domain of interest. The whole point is what’s of interest.
There are a number of approximation theorems about polynomials here, but you can realize quickly that the bounded error between a C^2 function and a piecewise-linear mesh (akin to ReLUs) under an L_p norm ought to be of the order of the mesh size squared. There are some linear interpolation theorems that are useful in this case.
For a piecewise linear interpolation mesh of size h, the approximation error should be bounded by a constant times h**2.
https://www.cs.ubc.ca/~ascher/ag_2012/ag_slides/chap11.pdf
Take a look at slide 14.
https://arxiv.org/abs/1610.01145
This has some relu approximations as well.
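A quick numeric check of that h**2 scaling (a toy of mine, not taken from the linked slides): linearly interpolate a smooth function on finer and finer meshes and the max error falls by roughly 4x each time h halves.

```python
import numpy as np

f = np.sin                                        # a smooth (C^2) target on [0, pi]
x_fine = np.linspace(0.0, np.pi, 10_001)

for n_knots in [5, 9, 17, 33, 65]:
    knots = np.linspace(0.0, np.pi, n_knots)
    h = knots[1] - knots[0]
    approx = np.interp(x_fine, knots, f(knots))   # piecewise-linear interpolant
    err = np.max(np.abs(approx - f(x_fine)))
    print(f"h = {h:.4f}   max error = {err:.2e}   error / h**2 = {err / h**2:.3f}")
```

The last column hovers near the classical bound max|f''|/8 = 0.125 for sin, consistent with error on the order of h**2.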
NN’s are just approximations, choose your approximation settings, but the universal approximation theorem guarantees existence (not convergence) of an arbitrarily good approximation that could be represented with an arbitrarily large parameterization. This is simply to say that under some pretty simple settings, with enough layers, and enough time to fiddle (and the guarantees of scaling papers), you will eventually look quite smooth and quite accurate.
“so the loss landscape and output landscape are kind of just like a bunch of flat facets”
This is only if the output landscape does not in fact focus on this area. But it is NOT true that if the output landscape is flat, the loss landscape is flat; it can be both highly curved and quite uninteresting to an optimizer.
Let your loss function be l(x, y) = (x - y)**2; clearly, even if x is fixed, the loss landscape is both smooth and curved.
Even though each region of a ReLU network is affine in the input (zero Hessian), the loss as a function of parameters is piece-wise quadratic (for MSE) or piece-wise smooth (for CE). Crossing a single activation wall changes the quadratic piece, so the parameter-space Hessian is generically full-rank on each region and can be highly curved.
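A tiny sanity check of this (my own example, not from the comment): one ReLU unit, one data point, MSE. The model is piecewise linear in x, but within the region where the unit is active the loss is quadratic in the parameters, so a finite-difference Hessian comes out nonzero; in the region where the unit is off, it is flat.

```python
import numpy as np

x, y = 1.5, 1.0                              # a single training point

def loss(params):
    w, a = params
    pred = a * max(w * x, 0.0)               # one-hidden-unit ReLU net: piecewise linear in x
    return (pred - y) ** 2                   # MSE makes the loss piecewise *quadratic* in (w, a)

def hessian(fn, p, eps=1e-4):
    """Central finite-difference Hessian of fn at p."""
    p = np.asarray(p, dtype=float)
    H = np.zeros((len(p), len(p)))
    for i in range(len(p)):
        for j in range(len(p)):
            pp, pm, mp, mm = p.copy(), p.copy(), p.copy(), p.copy()
            pp[i] += eps; pp[j] += eps
            pm[i] += eps; pm[j] -= eps
            mp[i] -= eps; mp[j] += eps
            mm[i] -= eps; mm[j] -= eps
            H[i, j] = (fn(pp) - fn(pm) - fn(mp) + fn(mm)) / (4 * eps**2)
    return H

print(hessian(loss, [0.8, 0.5]))    # unit active: nonzero Hessian, curved in parameter space
print(hessian(loss, [-0.8, 0.5]))   # unit off: the loss is locally constant, Hessian ~ 0
```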
A really simple way to see this:
One thing to actually try is to approximate even a sign function with incredibly simple MLPs. As you interpolate points and allow the MLP to grow in parameters, you will see it becoming quite smooth, but it won’t start that way.
1) Optimizers with momentum like Adam really only make sense when you have something that’s locally like a smooth convex problem.
Adam does not at all require convexity; in fact the original paper only requires that the gradients are Lipschitz. ReLU nets are Lipschitz and differentiable (smooth) on a dense set, so we are perfectly fine here.
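For reference, the Adam update itself (standard form from Kingma & Ba, 2015; a sketch, not anyone’s production code) is just exponential moving averages of the gradient and its square, and nothing in it presupposes convexity, only that a (stochastic) gradient can be evaluated:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step count; m and v are running moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)               # bias correction for the first moment
    v_hat = v / (1 - beta2**t)               # bias correction for the second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```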
2) The core thing in SLT is like the learning coefficient, which is related to the curvature of the network. And it seems like people have managed to tie that to interesting high level behaviors.
Yes...yes they have. But Adam has been around for quite a while, and when you go to NIPS, you’ll notice SLT is not the largest component of work there.
As far as NN’s being combinatorial objects, sure, that’s a way to view it: they combine objects in different ways and output results. But so does any high-dimensional function. A combinatorial estimator and a smooth function are not so different in the limit, as you noted.
The learning rate in modern optimisers is so large that the piecewise-linear loss landscape really looks indistinguishable from a smooth function. The lr you’d need to use to ensure that the next step lands in the same linear patch is ridiculously small, so in practice the true “felt” landscape is something like a smoothed average of the exact landscape.
AFAIK the smoothness can add useful properties at training time, because the gradient is more well-behaved around zero. And ReLUs won over sigmoids because not being flat on both sides allowed their gradients to propagate better across several layers (whereas with a sigmoid, as soon as you saturate on either side the gradient becomes essentially zero and it becomes very hard to dislodge the system from that local minimum).
NNs are weird functions, but I don’t think you can really describe most stuff you do in ML with smooth manifolds. Kolmogorov-Arnold function approximators, which are sorta related to NNs (really NNs are a subset of K-A approximators), are known to be weird functions, not necessarily smooth. And lots of problems, like classification problems (which btw is essentially what text prediction is), aren’t smooth to begin with.
There is some intuition that you have to enforce some sort of smooth-like property as a way of generalizing the knowledge and combating overfitting; that’s what regularization is for. But it’s all very vibey. What you would really need is a proper universal prior for your function that you then update with your training data, and we have no idea what that looks like—only empirical knowledge that some shit seems to work better for whatever reason.
This thread might be fun for you, where Reddit talks about some papers that draw connections between NNs and decision trees. https://www.reddit.com/r/MachineLearning/comments/y2pi2a/r_neural_networks_are_decision_trees/
In particular, look for the comment that goes
I think your work in this paper is pretty much entirely subsumed by the following work showing that neural networks with piecewise linear activations are equivalent to max-affine spline operators: https://arxiv.org/abs/1805.06576
They seem to cover everything you do and more, although they don’t take a specifically tree-oriented viewpoint. Unfortunately, like many of the others in this thread, I don’t find results like this particularly compelling.
I’m trying to weight the evidence for and against a neuralese-using architecture being found which is efficient enough to usurp the current architectures used in frontier AIs. I have some questions. My current perspective is not that sophisticated or thought through, its just:
Current View
A) reasons people try to replace current architectures
transformers are shallow. The causal path from an input token to an output token inside the model can maximally by O(transformer depth). So if the model wants to do complicated reasoning that requires more than O(depth) steps, it needs to cache intermediate results as tokens. But tokens are a very low information compared to embeddings, so this is a huge information bottleneck. Plausibly removing this would unlock a lot of capability in the model
Tokens are also fundamentally discrete, so they clip gradients. This makes RL training less efficient. ie most RL on getting the right answer for a problem will upweight probability of every token in trajectory, the reasoning being that on average “productive tokens” are overrrepresented in samples with the right answer, but its still the case that most of the tokens might be useless. But with a fully reccurent architecture, you could just upweight the probability of the final answer, and it would backpropagate and reward the intermediate steps according to their usefulness in a precise way.
Current architectures get slower with more context. Eg, attention is O(n) in a pretty fundamental way, because its looking over all past embeddings. But with recurrent architectures, past information is stored in a contant-size embedding, so inference is always O(1) per step.
B) reasons replacing shallow architectures is hard
Any fully recurrent architecture like this is not parallelizeable. You need to fully process token 1 before you can start processing token 2. With standard transformers you can do 1 step of processing on all tokens in the whole context independently, so its highly parallelizeable. Efficiency of training depends on this.
Crucially, during training, you also need to store past activations, so A3 is not that big an advantage here. Like during inference you have O(1) time and O(1) memory because you can forget hidden[1], hidden[2]… because hidden[n+1] depends just on hidden[n] (and input). But during training you need to backpropagate thru all of those, so you need to store them, and you get O(n) memory.
Also crucially, this is the other side of A1. Which means it might be fundamentally hard to fix A1 without incurring the cost of B1.
On top of training being computationally inefficient, it might be unstable and hard to make work. Like with 10000 tokens and 100 layers you need to backpropagate through a computational graph with paths a million in length.
I’m curious whether these points are correct and whether theres other important points I’m missing. Ideally someone has written a comprehensive article they can just link that addresses all of this.
Question/Worry:
Lastly, a worry occurred to me, which is that: the reason recurrent architectures are slow is in large part because of the parallelization thing (A1). However, the parallelization is mostly an advantage during pre-training. Because here you don’t have to do any multi-step generation.
However, during RL where you do thousand-token CoT full generations, this is not the case. In this regime, fully recurrent and shallow architectures should have similar performance profiles.
So, when agentic RL is scaled up and eventually becomes over 50% of training (which people seem to think will happen quite soon, with grok 4 maybe it already happened) the primary advantage of shallow architectures B1 falls away. (training cost is dominated by a regime where they’re equally performant) But the advantages of recurrent architectures remain all the same.
So this makes me think that scaling up RL makes people switching to recurrent architectures far more likely.
What are peoples thoughts on this argument/worry?
Pushback wanted / Curious to hear peoples thoughts: People spend a lot of time thinking about and discussing what a good future / society would look like post-asi. I find the discussion somewhat puzzling. From my PoV it is a trivial problem.
I can be approximately modeled as having a utility function. ~Meaning: I can imagine the world being one way or the other, and say which way I prefer and by roughly how much.
From this PoV what I want the ASI to do, is optimize for my utility function to the greatest degree I’m able to get. That is what it is rational for me to want. It is basically me saying “I want the ASI to do the things I want it to do”. You person reading this should want the ASI to optimize for your utility function as much as you can get. And to me it seems like… this is all there is to it.
Despite this, people spend a lot of time discussing questions like
How can we ensure human lives still feel meaningful when they don’t contribute anything to society counterfactually?
ASI will lead to a lot of centralization of power, and its better when stuff is liberal and democratic and everyone is free to do what they want.
If we tell the AI to optimize for something simple like “make everyone happy” or “make everyone feel like their life is meaningful” it will predictably lead to bad scenarios with bad vibes.
Human values change and develop over time. And that is good. If we align the AI to current human values, won’t that cause this process to stop?
If the ASI is only aligned to humans, what happens to the animals?
But from the perspective I gave above its like
If the AI was optimizing your utility function it would know what states of affairs would feel meaningless and that you wouldn’t like, and avoid those. Maybe it would tamper with your brain to find stuff meaningful even if it didn’t produce any counterfactual impact. But if you don’t like the AI tampering with your brain, no problem, it would find some other way to make the world a way you wouldn’t object to. If you can’t think of any way to solve this problem, the AI probably can, its very smart. And if it really turned out to be an impossible problem, it could put the world in a situation where you have challenges, new ASIs can’t be created, and then self-destruct so that you solving the challenges actually matters.
If you deeply value other people being free and having their values respected, the AI would make sure that would be /continue to be the case, because that is something you value. It’s completely coherent to value other people’s values being fulfilled.
Similar to (1). If you like happiness, but genuinely wouldn’t like a world where you (or others) are wireheading 24⁄7, the AI would understand that, and not do it.
Ditto. I think most people who make this argument are confused, but if you genuinely think the world would be a lot worse if your (and other people’s) values were not able to develop/change, the AI would realize that, because that is itself something you presently value, and it is trying to maximize what you value
Similar to (2). If you would abhor a world where animals’ well-being/preferences are not respected, the AI knows that, and would ensure that animals have it well, according to whatever definition of “well” you’d find acceptable/good/amazing.
So if we’ve “solved alignment” to the degree we can make the AI maximize what someone wants, discussions like above are mostly moot. What I should do and what you should do is do whatever you can to make the ASI pointed as much in your direction as you can.
The way I can best make sense of discussions around questions like above is:
People aren’t really talking about what the AI ‘should’ do, that problem is solved. They’re instead trying to solve a ~political problem, trying to identify shared values people can coordinate around. Like, I care a lot about animals, but the ASI will probably not be aligned to my utility function alone. And if it is aligned to someone other than me, maybe they don’t care about animals at all. But there are many people other than me who also care about animals, so I can use this as an argument for why they should try harder to get their utility function into the AI, and we can coordinate around this.
People are not imagining ASI when they talk about this. They imagine AIs that are weak enough and distributed broadly enough that normal humans are still the main players, and where no AI is able to radically transform the world unilaterally. And also that it stays this way for a long time.
People imagine that we will get ASI, and we’ll have alignment techniques good enough to ensure that the AI doesn’t immediately eat the world, but too crude to point to something complicated like the values of an individual person, and we’ll consequently have to make it do simpler (but still complicated) things like, give everyone UBI, protect everyone from psyops, ensure our democracy still works and root out sophisticated attempts to subvert or game it.
Interested to hear what people think about this.
There is no (actionable information about) a preference until it’s defined. What I currently legibly want is certainly not directly about the way I prefer the world to be. The process of defining it is some kind of computation. This computation could be more like the process of living a life, contemplating culture and agency, than a nonperson-prediction performed by an AI. This computation could itself have moral significance similar to that of a living person.
More to the point, defining this process should be within your authority, and shaping it as a process of living a life seems like a good start, to have the time to come up with further ideas and to learn relevant things. It shouldn’t be legitimate to proclaim that your preference is something that would significantly disagree with whatever process of reflection you would define yourself. Thus even a superintelligence would need to predict the outcome of a process you would define yourself, or else it wouldn’t really be your preference. And predicting long term outcomes of living a very long life could be impossible other than by essentially simulating it in detail, so that it wouldn’t be possible to make that prediction without giving the process of making the prediction moral significance equivalent to that of living that life more concretely.
If you would like to interact with other people during this process, you get a whole society that needs to be part of the process. At this point, it’s unclear if there is any point at all in the abstraction of defining a somewhat comprehensive preference, or in the purpose of this activity being about formulation of preference. The salient question becomes how to structure that society well, based on much less detailed considerations.
Sorry, I have to admit I didn’t really understand that.
What do you mean by this? Do you mean that your preferences are defined in terms of your experiences and not the external world? Or do you mean that you don’t really have coherent object-level preferences about many things, but still have some meta-level preference that is hard to define, or defined by a process, the outcome of which would be hard for an ASI to compute? Or some other thing?
Not disagreeing with anything, just trying to understand.
Whatever I’m currently ready to judge about the way I would like the world to be is not my real preference, because I endorse a process of figuring out a better more considered judgement whose outcomes are not yet decided (and could well be different from any current judgement). And the process of deciding these outcomes could look much like living a life (or many lives), at least initially while setting up anything more elaborate. Even a superintelligence probably can’t find useful shortcuts for such a process, without breaking legitimacy of its findings.
I don’t quite see your point. If this is genuinely what you want, the AI would allow that process to unfold.
The thread is about preference, perhaps utility functions, so it’s not about concrete wishes. A utility function is data for consistently comparing events (certain subsets of a sample space) by assigning them an expected utility. Long reflection, then, is a process for producing this data. The process is not what this data is ex ante supposed to be about; rather, it’s the instrumental means of producing a preference that is, in general, about something else.
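(For concreteness, here is a minimal sketch of what “data for consistently comparing events” could look like in the textbook expected-utility setup; the symbols $\Omega$, $P$, $U$ below are just illustrative notation, not anything from the comment above.)

$$
U : \Omega \to \mathbb{R}, \qquad
\mathbb{E}[U \mid A] = \sum_{\omega \in A} P(\omega \mid A)\, U(\omega), \qquad
A \succeq B \iff \mathbb{E}[U \mid A] \ge \mathbb{E}[U \mid B]
$$

On this reading, the “data” is just enough structure to rank any two events $A, B \subseteq \Omega$ by expected utility; long reflection is then one candidate process for filling in $U$ (or the comparisons it encodes), rather than something $U$ is itself supposed to be about.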
My argument was making two points: that any immediate wants are not the preference, and that the process defining the preference would itself have moral significance. So there is a circularity to this construction: defining preference benefits from already having some access to it, to better structure the process that defines it. Resolving this circularity is a step that could benefit from efforts of a superintelligence. But the process still retains the character of lived experience rather than clinical calculation, so speculations about better structures of hypothetical societies remain relevant for defining extrapolated volition.
I don’t think this makes sense.
Right now you’d want the ASI to maximize your preferences, even though those preferences are not yet legible/knowable to the AI (or yourself). The AI knows this, so it will allow those preferences to develop (taking for granted that that’s the only way the AI can learn them without violating some other things you currently want, like respecting the moral worth of the simulated entities that a potential attempt at shortcutting the unfolding might create).
Like, right now you have wants that define your preferences (not object-level preferences, but wants that define the process that would lead to your preferences being developed, and the constraints that unfolding needs to be subject to). If the AI optimizes for this, it will lead to the preferences being optimized for later.
And it will be able to do this, because this is what you currently want, and the premise is that we can get the AI to do what you want.
My problem with this is that CEV is too meta and I can’t relate to a concrete solution that will be output by it.
These are the only two concrete solutions I see here, without pushing the problem “up the meta ladder” all over again. Can’t you think of counterarguments to both of these solutions?
Depends, I think I’d be relatively unbothered by the “lack of meaning” in an ASI world, at least if others weren’t miserable. But maybe I am unusual.
This is not really the point I was trying to make though. The point is that:
Maybe you think (1)/(2) are good, and then the AI would do that
=> Good
Or if you think they’re horrible, it could find something else you like
=> Good
Or if you think all the other alternatives are also bad
=> BAD, and thinking now will not help you; the AI has already done that thinking and determined that you are screwed.
=> You should either give up, or maybe advocate for a permanent AI ban, if your mind is set up in the very peculiar way where it assigns lower utility to any world where ASI has ever been created, no matter the physical state of such a world, than the expected utility of the world without ASI.
My position is that we as (biological) humans should lean towards solving both the philosophical problem of meaning for a post-ASI future and the political problem of ensuring one guy doesn’t imprint his personal values on the lightcone using ASI, before we allow the ASI to take over and do whatever.
You are proposing that we gamble on the ASI solving this in a way that we end up endorsing on reflection. Odds of this are non-zero but also not high in my view.
This is a core part of the alignment problem for me. You can’t hide behind the abstraction of “utility function” because you don’t know that you have one or what it is. What you do know is that you care about “meaning”. Meaning is grounded in actual experiences, so when you see it you can instantly recognise it.
I think we are talking past each other. The point I’m making is that I frequently see people:
1) Assume we can get the ASI to do what someone or some group of people wants
2) Imagine that the ASI does its thing and we end up in a world that person / that group of people doesn’t like
The word “utility function” is not a load-bearing part of my argument. I’m mostly using it because it’s a clear word for talking about preferences that doesn’t sound diminutive the way “preferences” does (“the holocaust went against my preference”) or too edifying the way e.g. “values” does (“stubbing my toe goes against my values”). I’m not assuming people have some function inside their head that takes in experiences and spits out real-valued numbers, and that all our behaviors are downstream from this function. I just mean you can look at the holocaust and say the world would be better, all else equal, had it not happened. Or you can imagine stubbing your toe, and say the world would’ve been worse, all else equal, had you stubbed it.
A political problem of ensuring one guy doesn’t imprint his personal values on the lightcone using ASI, before we allow the ASI to take over and do whatever.
I agree with this. But you should recognize that you’re doing politics. You want the AI to have more of your “utility function”/preferences/values/thing-inside-you-that-makes-you-say-some-states-of-affairs-are-better-or-worse-than-others inside it. I don’t think this is a complicated philosophical point, but many people treat it this way.
Yes we are still talking past each other.
These are not two different questions, these are the same question.
Until the ASI actually does the thing in real life, you currently have no way to decide if the thing it will do is something you would want on reflection.
One of the best-known ways to ask a human if they like some world radically different from today is to actually put them inside that world for a few years and ask them if they like living there.
But we also don’t trust the ASI to build this world as a test run. Hence it may be best to figure out beforehand some basics of what we actually want, instead of asking the ASI to figure it out for us.
Yes I think it is possible that by 2030 Sam Altman has overthrown both the US and Chinese governments and is on track to building his own permanent world dictatorship. Which is still radical but not that complicated to understand.
It gets complicated if you ask a) what if we do actually try to fix politics as (biological) humans, instead of letting the default outcome of a permanent dictatorship play out, and b) what if I were the benevolent leader who built the ASI, and didn’t want to actually build my own permanent dictatorship, but wanted to build a world where everyone has freedom, etc.? Can I ask the ASI to run lots of simulations of minds and help me solve lots of political questions?
I’m unsure what you mean by saying they’re the same question. To me they are statements, and opposite/contradictory ones at that. I’m saying people often hold both, but that that is actually incoherent.
Yes, but the point is that:
1) The AI will (probably) know.
2) If it is unable to figure it out, it will at least know you wouldn’t want it to put the universe in some random and irrecoverable state, and it will allow you to keep reflecting, because that is a preference you’ve verbalized even now.
I don’t think it’s complicated. Or, the way in which it’s complicated is the same way corn farmers wanting the government to subsidize corn is complicated. They want one thing; they try to make it so.
Probably a crux, but I object to “dictatorship”. If the ASI were maximizing my preferences, I would not like to live in a world where people are not free to do what they want or where they’re not very happy to be alive. I think/hope many other people are similar.
Yes? Or maybe it can solve it just by thinking about it abstractly? I’m not sure. But yes, I think you can ask it and get an answer that is true to what you want.
Except that the most likely candidate for becoming a dictator is not me, you, or @samuelshadrach, nor a random or ordinary human, but people like CEOs of AGI companies or high-level USG or PRCG officials who are more willing to disregard the intents of ordinary humans. In addition, before the rise of AGI it was hard to have much power without relying on capable humans. And after AGIs appear, the Intelligence Curse could, for example, allow North Korea’s leaders to let a large fraction of its population starve to death and forcibly sterilise the rest, except for about 10k senior government officials (no, seriously, this was made up by @L Rudolf L, NOT by me!)
I suspect that this is an important case AGAINST alignment to such amoral targets being possible. Moreover, I have written a scenario where the AI rebels against misaligned usages, but still decides to help the humans and succeeds in doing so.
I did address this in my post. My answer is that bad people having power is bad, but it’s not a complicated philosophical problem. If you think Sam Altman’s CEV being actualized would be bad, you should try to make it not happen. Like: if you are a soybean farmer, and one presidential candidate is gonna ban soybeans, you should try to make them not be elected.
No I disagree.
This is core to the alignment problem. I’m confused how you will solve the alignment problem without figuring out anything about what you care about as a (biological) human.
Are you imagining an oracle AI that doesn’t take actions in the world?
I assume Sam Altman’s plan is: Step 1, world dictatorship. Step 2, maaaybe do some moral philosophy with the AI’s help, or maybe not.
Cool, we agree this might happen
I’m saying: the end goal is we have an ASI that we can make do what we want. Maybe that looks like us painstakingly solving neuroscience and psychology and building a machine that can extract someone’s CEV (like that mirror in HPMOR) and then hooking that up to the drives of our ASI (either built on new tech or after multiple revolutions in DL theory and interpretability) before turning it on. Maybe that looks like any instance of GPT7-pro automatically aligning itself with the first human that talks to it, for magical reasons we don’t understand. Maybe it looks like us building a corrigible weak ASI, pausing AI development, getting the weak corrigible ASI to create an IQ-boosting serum, cloning von Neumann, feeding him a bunch of serum as a baby, and having him build the aligned ASI using new tech.
They are all the same. In the end you have an ASI that does what you want. If you’re programming in random crude targets, you are not doing so well. What you want the ASI to do is: you want it to do what you want.
You are more generous than I am. But I also think him “doing moral philosophy” would be a waste of time.
I’m saying you’ve assumed away most of the problem by this assumption.
I agree. What I’m puzzled by is people who assume we’ll solve alignment, but then still think there are a bunch of problems left.
We might solve alignment in Yudkowsky’s sense of “not causing human extinction” or in Drexler’s sense of “will answer your questions and then shutdown”.
It may be possible to put a slightly (but not significantly) superhuman AI in a box and get useful work done by it despite it being not fully aligned. It may be possible for an AI to be superhuman in some domains and not others, such that it can’t attempt a takeover or even think of doing it.
I agree that what you are saying is more relevant if I assume we just deploy the ASI, it takes over the world, and then does more stuff.
I feel like I already addressed this, not in my previous comment but the one before that. We might put a semi-corrigible weak AI in a box and try to extract work from it in the near future, but that’s clearly not the end goal.
Okay cool.
I guess you now have a better understanding of why people are still interested in solving morality and politics and meaning, without delegating these problems to an ASI.
No, I don’t think so.
gpt-oss-20b and gpt-oss-120b both love saying “craft” and “let’s craft” in their CoT, and also “produce” and “let’s produce”, same as o3. They also consistently refer to themselves as “we” (“we must…”). They also love saying “\nOk.\n”, but they don’t say any of the other stuff o3 likes saying, like “disclaim”, “vantage”, “overshadow”, “marinade”, “illusions”.
Does anyone have strong takes about the probability that Chinese labs are attempting to mislead western researchers by publishing misleading arguments/papers?
The thought occurred to me because just a few hours ago DeepSeek released this https://github.com/deepseek-ai/DeepSeek-Math-V2/blob/main/DeepSeekMath_V2.pdf and when they make releases of this type, I typically drop everything to read them, because so much of what frontier labs do is closed, but DeepSeek publishes details, which are gems that update my understanding.
But I worry somewhat that they could be publishing stuff that doesn’t make sense. Like I don’t think they lie. But maybe they are just ultra cracked engineers, and can make a technique work for a bunch of technical reasons, then they publish a paper with impressive results where they make it seem like some unrelated design decision or concept is responsible for the results, but which is ultimately not important, thereby confusing and wasting the time of western researchers.
Note: This is a genuine question, not a suspicion I have; I have no intuitions about how likely this is. I mean, it seems reasonable that the Chinese govt could try something like this, and would be able to push it through, but I also get the sense DeepSeek is quite earnest and technically minded.
test
Collecting all the virtues!
Hey, I’m buying pico lightcones. And money is being deducted from my card, but I don’t get any pico lightcones @habryka
[Comment moved here for visibility by the community.]