I mean, this is exactly the affordance a gradient hacker wants, but if you could build a model that is not engineering-capable, yet aligned enough and wise enough, gradient hacking like this is a good strategy.
Mis-Understandings
If you thought simulated humans were moral patients, you might decide not to run history simulations. That being said, I don’t think that is an absolute rule.
I also realized something else. In my mind, if both sides act rationally, the strategic framework for a government confronting protestors is coextensive with the dynamics of political violence and occupation. The government's strategy options, goals, and constraints are basically the same, so it should be taking many of the same sorts of actions; the two situations are points on the same continuum. The fact that protestors often settle for small concessions reflects their real strategic situation, and is a success for the protestors (they get their object), and sometimes also for the government (it still exists).
I realized we are in the weeds. Since we both agree that states will be humbled, not destroyed, we should not see survivorship bias shape what sorts of states remain.
The ability of either side (the guerilla successors/stay-behind forces of the government, or the government against an anti-war revolution) to win an occupation is exactly the structural force that constrains the opposition from attempting state destruction.
In Iran, the reason they stepped back from attempting regime change (which is absolutely worse for Iran than just losing air defenses) is that they realized there was not enough support to win an occupation with boots on the ground, and the anti-government protestors now have the initiative to push for that or not.
In Ukraine, Russia definitely tried regime change, and lost. This is what the 3-mile-long tank columns at Kyiv were about, and why it was a 3-day special military operation. You can't occupy territory in 3 days (Ukraine can just counter-attack), so the only possible war aim is regime change as a fait accompli. The fact that Ukraine did not collapse as a government is determined by basically the same factors as civil war.
In Syria, the rebels achieved one maximalist war aim (the destruction of the Assad regime), because the regime lost the hybrid (regular civil)/(guerilla) war, mostly to the Islamist rebels. They were not the US's preferred faction, but neither Israel nor the US is willing to uproot them, so they will stay, and will have to deal with sectarian violence by themselves.
There was basically a Color Revolution in Nepal in 2025. There was an attempt to overthrow the Iranian government by protest about 3 months ago, and it failed; those protests are a large shaping factor of the current war. That is to say, the constraining factor preventing governments from "just shooting protestors" is that doing so leads to fighting an occupation, which governments lose somewhere between 5 and 20 percent of the time, which is a lot. In the absence of a willingness to use violence, the best tools for disrupting protests involve a willingness to give concessions, that is, they give protestors political power. If you are not willing to either give concessions or use violence, your options are worse, and generally involve letting the protestors set up a pseudo-government for as long as they are willing to, which can become exactly as dangerous and disruptive as it sounds.
I do think that these factors mean that interstate conflict between rational actors will mostly push wars to have limited objectives. The problem is that negotiated settlements strictly dominate those wars, and so truly rational actors should always find a settlement for at least those issues.
I don’t know, the chemical signaling paths for a human brain do look a lot like steering vectors. So the human branch is just not really convincing to me.
The other thought is that if we expect states to care mostly about resource goals, we should expect them to find diplomatic solutions (in which case democracies' credible commitment mechanisms matter). I am very convinced by the crisis bargaining perspective, in which rational states should be expected to find settlements short of war. Under this model, we see the largest wars over ideological factors, because they disrupt the bargaining model in various ways. If you just want access to some resource, especially if the defender can destroy it, it very rarely makes sense to go to war for it (since you can almost always simply threaten to, then get what you want at the bargaining table). So we should find that resource-based goals, even with opportunism, don't lead to war.
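The crisis bargaining logic above can be made concrete with a toy Fearon-style model. A minimal sketch, with all numbers illustrative assumptions: two sides dispute a good worth 1, side A wins a war with probability p, and war destroys some value for each side. Any positive war costs open a range of peaceful splits both sides prefer to fighting.

```python
# Toy crisis-bargaining model: with positive war costs, a range of
# settlements exists that both sides prefer to war. Illustrative only.

def bargaining_range(p, cost_a, cost_b):
    """Disputed good is worth 1. Side A wins a war with probability p.
    War destroys cost_a of value for A and cost_b for B.
    Returns the interval of splits x (A's share) both prefer to war."""
    lo = p - cost_a  # A accepts any x >= its expected war payoff
    hi = p + cost_b  # B accepts any x <= 1 minus B's expected war payoff
    return lo, hi

lo, hi = bargaining_range(p=0.6, cost_a=0.1, cost_b=0.15)
print(f"any split giving A between {lo:.2f} and {hi:.2f} beats war for both")
```

The interval is nonempty whenever cost_a + cost_b > 0, which is the formal version of "rational states should find settlements short of war"; ideological goods break this by making the good indivisible or the probabilities contested.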
I think that there is no amount of robots that can secure the mines against a truly hostile population, unless you are basically willing to kill them all. So you still need legitimacy, and there is no way to separate legitimacy from access to resources against a popular and violent veto. Basically, I think that if there is a local population, legitimacy is a necessary condition for the security of resources and robots, and that there is no practical number of AI overseers and robot enforcers that can guarantee this so long as you are not willing to do mass killing/displacement.
If you have resource-based goals, there is basically no reason to invade anywhere unless you can generate legitimacy or are willing to actually kill or displace all the current inhabitants. Unless you do that, or something just as obviously unpopular or severe, there is no amount of robots that can solve the fundamental challenge of insurgency without reliance on legitimacy. Those insurgents can basically deny you access to most of the resources of the area.
Military effectiveness is not just one thing, and your enemy chooses what part to test. Your enemy can force the coupling of legitimacy and military effectiveness, and so even having resource-oriented goals cannot help you unless you use the fact that you value things differently to produce settlements that look like trade.
I think you have me on the wrong side. If the existential struggle for a state always takes the form of insurgency on its home territory, or a diplomatic settlement, then the primary factors that matter for state survival are legitimacy and the ability to transfer power. Democracies are great at that. Basically, the same reasons you don't see democracies in civil wars as often also make them very strong at winning them on home territory, which makes their displacement a remote possibility. Basically, we should never see the end of popular (that is, supported by the populace) systems of government, because protests can always force you to fight an occupation or lose control of the territory, and the main factor in winning those struggles is always the popular will.
In both those cases, if the existing state wins the occupation, they definitely will survive. That requires riflemen and legitimacy, same as before. If Iran keeps their rifles loyal, you don’t get a Syria style outcome. You just force the Iranians to build more missiles for a time (or give up on missiles).
That sort of war can limit the ability of states to project power, but it can only acquire territory by fighting an occupation. I think Russia will have a bad 10 years fighting an occupation in whatever they keep of the Donbass, unless they have produced mass displacement.
In Israel/Iran, the maximal war aim is to force the Iranian government to fight an occupation. If they are legitimate enough not to need to do so, there is not much you can do with bombs. (I mean, you can destroy bombs so they can't bomb you, and you can make them poor, but that is not useful in itself.)
Thus you can destroy power projection, but not create political change.
You also can’t dismiss protests. They can force you to fight an occupation whether you want to or not. And occupations still rely on rifles.
I don’t think this analysis works, because if we imagine that democracies exist because of the distribution of military power, the binding constraint on the destruction of states for profit is at this point not the ability to win an actual war, but the ability to win the occupation afterwards, which modern guerilla tactics have made excessively costly even in an ideal case.
For civil wars, you are even more hard-stuck: just because you know who opposes you does not make it profitable to kill them. That is to say, this analysis overestimates how good drones and robots are at occupations, because fundamentally the hard part of an occupation is convincing the conquered of the legitimacy of the government. Without that, you basically cannot draw any sort of power from a modern economy, so why do the occupation at all? If you are already in control, and the civil war breaks legitimacy, you are still hard-stuck.
That is to say, for the rifleman to be displaced, the displacer would have to be able to win civil wars and occupations. Currently the only successful strategy is truly dense occupation with riflemen and police, and I don't see the argument that robots are any better at convincing people to go along.
If there is weirder physics, such that FTL or relaxations of the laws of thermodynamics are possible, I assume that the estimate increases. Then again, under those conditions there may not be a finite upper bound.
There are two other verification routines: looking over shoulders at internal documents, and banning new releases.
There is also just checking for the presence of the registered weights in the GPU's local RAM, which imposes a memory tax big enough that there is no space left to fit the intermediate values of the gradient. This requires that the monitors have code on the running machines, but periodic memory dumps can serve as a surveillance mechanism: verifying that all models resident on the GPUs match a model known to be already trained stops new initialization, and thus new runs.
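The dump-verification step above can be sketched very simply. This is a hypothetical illustration, not a real attestation protocol: a monitor keeps fingerprints of every registered (already-trained) model, and any memory dump whose weights match no registered fingerprint is flagged as a fresh initialization, i.e. a new run. A real system would hash raw VRAM pages and handle sharding; all names here are invented.

```python
# Hypothetical sketch: flag GPU weight dumps that match no registered model.
import hashlib

def fingerprint(weight_bytes: bytes) -> str:
    """Content hash standing in for a hash of GPU-resident weight buffers."""
    return hashlib.sha256(weight_bytes).hexdigest()

# Registry of models known to be already trained (assumed content).
registered = {fingerprint(b"weights-of-approved-model-v1")}

def audit_dump(dump: bytes) -> bool:
    """True if the dumped weights match a registered model; False flags
    unknown weights, i.e. a new initialization / unauthorized run."""
    return fingerprint(dump) in registered

print(audit_dump(b"weights-of-approved-model-v1"))  # True: known model
print(audit_dump(b"freshly-initialized-weights"))   # False: flags a new run
```

The design choice doing the work is that fresh random initializations cannot match any registered hash, so the monitor never needs to understand the weights, only compare them.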
That is to say, this is assuming there is a difference in kind between the sorts of heuristics that a pretrained, not-superhuman LLM will reach for, and those necessary to be superintelligent. There is always the chance that you just select for regular engineering, but you always reach for the right branch first. Since the right branch is also one that the regular persona would have generated, the number of bits of selection towards danger is at most the number of bits of selection between a safe persona and an RLed one.
This model treats personas as moral up until the RL step that makes them sufficiently inhuman.
the base LLM has never seen a superintelligence in its pre-training corpus
Is "The LLM, but lucky on sampling" something not in the corpus? It seems that that is exactly the corpus GRPO generates.
They could. In the scenario where they do not, you get Japan-like problems of not enough inflationary firepower. To do that with confidence requires a tax base to support it, which means, as the labor share of income drops, taxing the capital share of income as well.
Taxing the capital share of income is not the end of the world; it is a relatively normal policy. But we would have to generate the political will to actually do so, and the government must be able to credibly collect those taxes.
If we think the funds for helicopter money come from government borrowing, and the bond market breaks due to unexpected deflation, then the government cannot get the money to pay for the inflation needed to fix the very market it needs in order to get the money (this is what is meant by systemic risk), or it must default.
If we think the funds come from future taxes, then if the capital share of income increases and the willingness to consume from accumulated capital drops, the government must increase taxes on the income streams of capital or labor to make up the consumption.
If you think MMT is correct, and the government can just do helicopter money, they will have to do that to keep equilibrium.
All of those cases (helicopter money and capital taxation), as specific policies, are major policy shifts. There is always a chance that does not happen.
There is a big difference between "The Fed Can" and "The Fed Will", and if you are at 95% odds of the Fed getting it right, that is still 5% odds of not fixing it. That is what is meant by left tail risk.
Remember the top line, where they talked about this being a model of left tail risk (for investors).
They are predicting, specifically, that there are reasonable odds of a case where the stimulus is underbaked and too late, causing significant deflation; a large number of past claims on future economic activity do not survive the restructuring; and so every actor taking interest-rate risk is at risk of getting blown out by deflation, or of taking huge nominal losses from a default due to their counterparties getting blown out by deflation.
They are explicitly predicting that if there is massive AI productivity growth, there is no guarantee that the government will take the opportunity to print and spend to generate enough inflation to prevent the breakdown of basically all loans under extreme deflation. Somebody who found a way to exit into something safe from extreme deflation would be fine, but everyone who misidentified it is soaked.
It might be that the only safe asset is datacenter shares, or something, and everybody who is not in that has whatever they are holding deflated to nothing.
The deal they predict looks like:
We play musical chairs with all the claims on future production
There is way more future production to claim.
Even with a fairness guarantee, that can still be scary.
The counterargument against continuous tokens being passed forwards is that if you want to use neuralese, you have to give up sampling, since the big idea of latent reasoning is to avoid passing through the random discretization of sampling a token. But random discretization is itself powerful, especially with the possibility of a useful bias. If you give it up, the model becomes deterministic, so it can't use Best-of-N. If Best-of-N or tree search on chains of thought is really important, either in training or in deployment, that is not really compatible with the latent paradigm, in addition to the difficulty of getting training data.
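The Best-of-N point can be shown with a toy stand-in model (all of this is an illustrative sketch, not an actual LLM): Best-of-N only helps when the N candidates differ, and candidates only differ when there is a random discretization step. A deterministic (argmax-style, latent-reasoning) model produces the same candidate N times.

```python
# Toy illustration: Best-of-N needs stochastic sampling to help.
import random

answers = ["wrong", "wrong", "right"]  # stand-in answer distribution

def sample_model():
    # Stochastic decoding: the random discretization of sampling a token.
    return random.choice(answers)

def deterministic_model():
    # Argmax-style / latent reasoning: always the same output.
    return answers[0]

def best_of_n(model, n, is_good=lambda a: a == "right"):
    """True if any of n independent candidates passes the check."""
    return any(is_good(model()) for _ in range(n))

print(best_of_n(sample_model, 50))         # almost surely True: 50 distinct tries
print(best_of_n(deterministic_model, 50))  # False: one try repeated 50 times
```

The same asymmetry applies to tree search over chains of thought: branching is only meaningful if the branches can differ, which is exactly what the latent paradigm removes.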
The argument against semantic drift/Thinkish is extremely weak, and we should expect semantic drift when training with self-play without countermeasures.
While Blocks are older, syntax highlighting is much newer. I am not sure that counts.