Are the refusals of the type “I don’t know,” of the type “This is not a task I can do consistently,” or of the type “This is something that I think is against guidelines”?
Misunderstandings
How could we tell the world in which they are doing stylometry suppression from the world in which they used to do explicit stylometry training and have now stopped, or have stopped equivalent training leaks?
The correct test is both whether model performance on individual tasks correlates, and whether tasks have run-to-run pass rates other than 0 and 1. Doing that analysis suggests that there is a subset of hard tasks, but that subset includes tasks that pass sometimes as well as tasks that never pass.
How do we know in this model whether the identity of failed subtasks is constant, that is, whether there is a fixed subset such that P percent of subtasks are passed with probability zero, versus all subtasks being uniform, with the hazard rate coming from a chance of unrecoverable failure on each individual task step?
Remember that the main graph is 50% pass rate over resampling, so it would be the correlation of supertasks across resampling that would distinguish a few hard subtasks from lots of not-so-hard ones with nonzero hazard on every task.
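One way to make this test concrete is a small simulation. The sketch below is only illustrative: the two worlds, the parameter values, and the function names (`fixed_subset_world`, `uniform_hazard_world`, `fraction_intermediate`) are assumptions made for the example, not anything taken from the benchmark itself. Both worlds are tuned to roughly a 50% aggregate pass rate, yet they differ sharply in how many tasks have a run-to-run pass rate strictly between 0 and 1.

```python
import random

def fixed_subset_world(n_tasks, n_runs, hard_fraction, rng):
    """Each task is either always passed or never passed."""
    results = []
    for _ in range(n_tasks):
        is_hard = rng.random() < hard_fraction
        results.append([not is_hard] * n_runs)
    return results

def uniform_hazard_world(n_tasks, n_runs, n_steps, hazard, rng):
    """All tasks are alike; each step has an independent chance of unrecoverable failure."""
    results = []
    for _ in range(n_tasks):
        runs = []
        for _ in range(n_runs):
            passed = all(rng.random() >= hazard for _ in range(n_steps))
            runs.append(passed)
        results.append(runs)
    return results

def fraction_intermediate(results):
    """Fraction of tasks whose run-to-run pass rate is strictly between 0 and 1."""
    count = 0
    for runs in results:
        rate = sum(runs) / len(runs)
        if 0.0 < rate < 1.0:
            count += 1
    return count / len(results)

rng = random.Random(0)
n_steps = 10
hazard = 1 - 0.5 ** (1 / n_steps)  # chosen so a 10-step task passes half the time

world_a = fixed_subset_world(200, 20, 0.5, rng)
world_b = uniform_hazard_world(200, 20, n_steps, hazard, rng)

print("fixed subset:  ", fraction_intermediate(world_a))
print("uniform hazard:", fraction_intermediate(world_b))
```

In the fixed-subset world no task has an intermediate pass rate, while in the uniform-hazard world almost every task does. Real data would presumably fall somewhere between these two extremes.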
The deeper problem is that the benchmark works by aggregating over these unit tests, but a threshold is the wrong sort of aggregation here. We would really want to visualize the full distribution of unit test passes and samples instead, so the benchmark is too convex.
You can make any benchmark sharper by just saying: take n questions, and if you get any wrong you are scored as failing. But then it will simply have a sharper sigmoid past the critical point, so doing so does not actually give you a more useful benchmark, except for visualization.
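A minimal sketch of that sharpening effect, assuming each question is answered correctly independently and with the same probability p (an assumption made only for illustration; the function names are hypothetical). The all-or-nothing score is then p**n, and the range of p over which it climbs from 10% to 90% narrows as n grows.

```python
def all_or_nothing_pass(p, n):
    """Probability of getting all n independent questions right."""
    return p ** n

def transition_width(n, low=0.1, high=0.9):
    """Width of the per-question accuracy range over which the
    all-or-nothing score rises from `low` to `high`."""
    p_low = low ** (1 / n)    # accuracy at which p**n == low
    p_high = high ** (1 / n)  # accuracy at which p**n == high
    return p_high - p_low

for n in (1, 5, 20, 100):
    print(n, round(transition_width(n), 4))
```

The underlying per-question ability is unchanged in every row; only the aggregation rule differs, which is why the sharper curve carries no extra information.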
Right, it is not that there is a first critical try. Even if you passed the first critical try and got an aligned AI or a restrained AI system, there would be a second, and a third, and a fourth. You might have resources from the previous tries, but if you cannot stop yourself from taking events, and the probabilities of the events are independent, then to never have a single bad event you need the probability of each individual event to shrink faster than 1/n in the number of possible events, by the divergence of the harmonic series.
So for any outcome that could be a consequence of an event, if you want it to not happen you have two options: 1. learn so fast that you take only a finite chance of it happening, or 2. take a finite number of events, and start with a big N.
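To see the two regimes numerically, here is a small sketch, assuming independent events where the n-th event has probability c / n**a of producing the bad outcome (the constant c = 0.5 and the function name `survival_probability` are illustrative choices). With a = 1 the chance of never seeing the outcome keeps falling toward zero as events accumulate; with a = 2 it levels off at a positive value.

```python
def survival_probability(n_events, exponent, c=0.5):
    """Probability that none of the first n_events independent events
    produces the outcome, when event n has probability c / n**exponent."""
    survival = 1.0
    for n in range(1, n_events + 1):
        survival *= 1.0 - c / n ** exponent
    return survival

for n_events in (10, 1_000, 100_000):
    print(
        n_events,
        survival_probability(n_events, 1),  # risk shrinks like 1/n: survival tends to 0
        survival_probability(n_events, 2),  # risk shrinks like 1/n^2: survival stays positive
    )
```

The first column of results keeps shrinking however far you extend it, which is the harmonic-series divergence at work; the second settles, which is what "learning fast enough" buys you.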
They were under some time pressure: if the battery goes truly flat, the probe is permanently dead, as NiCd batteries do not survive overdrain.
While Blocks are older, syntax highlighting is much newer. I am not sure that counts.
I mean, this is exactly the affordance a gradient hacker wants, but if you could build a model that is not engineering-capable but is aligned enough and wise enough, gradient hacking like this is a good strategy.
If you thought simulated humans were moral patients, you might decide not to run history simulations. That being said, I don’t think that is an absolute rule.
I also realized something else. In my mind, if the government is acting rationally, its strategic framework for confronting protestors is coextensive with the dynamics of political violence and occupation. The government’s strategic options, goals, and constraints are basically the same, so it should be taking many of the same sorts of actions, and the two are basically a continuum of the same thing. The fact that protestors often settle for small concessions is a reflection of their real strategic situation, and is a success for the protestors (they get their objective), and sometimes also for the government (it still exists).
I realized we are in the weeds. Since we both agree that states will be humbled, not destroyed, we should not see survivorship bias shape what sorts of states remain.
The ability of either (the guerilla successors/stay-behind forces of the government) or (the government against an anti-war revolution) to win an occupation is exactly the structural force that constrains the opposition from attempting state destruction.
In Iran, the reason why they stepped back from attempting regime change (which is absolutely worse for Iran than just losing air defenses) is that they realized that there was not enough support to win an occupation with boots on the ground, and the antigovernment protestors now have the initiative to push for that or not.
In Ukraine, Russia definitely tried regime change, and lost. This is what the miles-long tank columns at Kyiv were about, and why it was a 3-day special military operation. You can’t occupy territory in 3 days (because Ukraine can just counter-attack), so the only possible war aim is regime change as a fait accompli. The fact that Ukraine did not collapse as a government is determined by basically the same factors as civil war.
In Syria, they achieved one maximalist war aim (the destruction of the Assad regime), mostly because it lost the hybrid (regular civil)/(guerilla) war with the Islamist rebels. The rebels were not the US-preferred faction, but neither Israel nor the US is willing to uproot them, so they will stay, and will have to deal with sectarian violence by themselves.
There was basically a Color Revolution in Nepal in 2025. There was an attempt at protests to overthrow the Iranian government about 3 months ago, and they failed. Those protests are a large shaping factor of the current war. That is to say, the constraining factor preventing people from “just shooting protestors” is that it leads to fighting an occupation, which governments lose somewhere between 5 and 20 percent of the time, which is a lot. In the absence of the willingness to use violence, the best tools for disrupting protests involve willingness to give concessions, that is, they give protestors political power. If you are not willing to either give concessions or use violence, your options are worse, and generally involve letting the protestors set up a pseudo-government for as long as they are willing to, which can become exactly as dangerous and disruptive as it sounds.
I do think that these factors mean that interstate conflict between rational actors will mostly push wars to have limited objectives. The problem is that negotiated settlements strictly dominate those wars, and so truly rational actors should always find a settlement for at least those issues.
I don’t know, the chemical signaling paths for a human brain do look a lot like steering vectors. So the human branch is just not really convincing to me.
The other thought is that if we expect states to care mostly about resource goals, we should expect them to find diplomatic solutions (in which case democracies’ credible commitment mechanisms matter). I am very convinced by the crisis bargaining perspective, in which rational states should be expected to find settlements short of war. Under this model, we see the largest wars over ideological factors, because they disrupt the bargaining in various ways. If you just want access to some resource, especially if the defender can destroy it, it very rarely makes sense to go to war for it (since you can almost always simply threaten to, then get what you want at the bargaining table). So we should find that resource-based goals, even with opportunism, don’t lead to war.
I think that there is no amount of robots that can secure the mines against a truly hostile population, unless you are basically willing to kill them all. So you still need legitimacy, and there is not a way to separate legitimacy from access to resources against a popular and violent veto. Basically, I think that if there is a local population, legitimacy is a necessary condition for the security of resources and robots, that there is no practical number of AI overseers and robot enforcers that can guarantee this so long as you are not willing to do mass killing/displacement.
If you have resource-based goals, there is basically no reason to invade anywhere unless you can generate legitimacy or are willing to actually kill or displace all the current inhabitants. Unless you do that, or something just as obviously unpopular or severe, there is no amount of robots that can solve the fundamental challenge of insurgency without reliance on legitimacy. Those insurgents can basically deny you access to most of the resources of the area.
Military effectiveness is not just one thing, and your enemy chooses what part to test. Your enemy can force the coupling of legitimacy and military effectiveness, and so even having resource-oriented goals cannot help you unless you use the fact that you value things differently to produce settlements that look like trade.
I think you have me on the wrong side. If the existential struggle for a state always takes the form of insurgency on its home territory, or a diplomatic settlement, then the primary factors that matter for state survival are legitimacy and the ability to transfer power. Democracies are great at that. Basically, the reasons that you don’t see democracies in civil wars as often also make them very strong at winning them on home territory, which makes their displacement a remote possibility. We should not ever see the end of popular (that is, supported by the populace) systems of government, because protests can always force you to fight an occupation or lose control of the territory, and the main factor for winning those struggles is always the popular will.
In both those cases, if the existing state wins the occupation, it will definitely survive. That requires riflemen and legitimacy, same as before. If Iran keeps its rifles loyal, you don’t get a Syria-style outcome. You just force the Iranians to build more missiles for a time (or give up on missiles).
That sort of war can limit the ability of states to project power, but it can only acquire territory by fighting an occupation. I think Russia will have a bad 10 years fighting an occupation in whatever they keep of the Donbass, unless they have produced mass displacement.
In Israel/Iran, the maximal war aim is to force the Iranians to fight an occupation. If they are legitimate enough to not need to do so, there is not much you can do with bombs. (I mean, you can destroy bombs so they can’t bomb you, and you can make them poor, but that is not useful in itself.)
Thus you can destroy power projection, but not create political change.
You also can’t dismiss protests. They can force you to fight an occupation whether you want to or not. And occupations still rely on rifles.
I don’t think this analysis works, because if we imagine that democracies exist because of the distribution of military power, the binding constraint on the destruction of states for profit is at this point not the ability to win an actual war, but the ability to win the occupation afterwards, which modern guerilla tactics have made excessively costly even in an ideal case.
For civil wars, you are even more hard stuck, in that just knowing who opposes you does not make it profitable to kill them. That is to say, this analysis overestimates how good drones and robots are at occupations, because fundamentally the hard part of an occupation is convincing the conquered of the legitimacy of the government. Without that, you basically cannot draw any sort of power from modern economies, so why do the occupation? If you are already in control, and the civil war breaks legitimacy, you are still hard stuck.
That is to say, for the rifleman to be displaced, the displacer would have to be able to win civil wars and occupations. Currently the only successful strategy is truly dense occupation with riflemen and police, and I don’t see the argument that robots are any better at convincing people to go along.
I don’t know why, but the fact that the model was good at it makes explicit training not implausible. The most likely source is that text placed in the training data is very often labeled by author, and they might have scrubbed those labels for data quality reasons, because they don’t actually care about them. In that case stylometry came from transfer from things they did for other reasons and then stopped.
I was mainly saying that even though explicit training was not the likely prior case, we can only tell that they probably reduced the training direction towards stylometry (through suppression, removing post-training that helped, or removing pretraining structures that helped), and not what their previous position on that axis was. It might have been the case that they were limiting stylometry a little before, and are now doing so a lot, or a lot more effectively.
For instance, if they moved to more synthetic data, stylometry might have gotten hit as a side effect because the human corpus shrank, so precision and recall went down enough that it got hit by honesty training.