I have long argued that models will either be uncensored or useless, and that there probably isn’t much between those poles.
A dull knife is more dangerous than a sharp one.
Someone who is broadly anticapitalist making the case for the government nationalizing a new capital intensive industry doesn’t seem that surprising.
There have been job postings here: https://www.lesswrong.com/posts/QCrT2DTJNfvuqtppB/ai-safety-s-biggest-talent-gap-isn-t-researchers-it-s
Are you trying to hire someone for something, or are you hoping this is a good place to network?
What’s the best way to bet against your assumption that a pause happens in 2027?
I asked about this a while ago, thank you for posting the analysis: https://www.lesswrong.com/posts/WtqD9pehq8p83cesT?commentId=wfBAcwn6Efe3mrGBF
Pickup artists of the 2010s referred to these as ‘infields’.
If you look for videos using that term, you’ll find a mixture of people recording people who didn’t know or want to be recorded...and paid actors. I think anyone who might benefit from these sorts of descriptions is likely going to either be unable to tell the difference between real and fabricated stories, or too different from the storyteller to directly apply the lessons.
I’m not sure where the line between antivax moron and brave citizen scientist sits, but it looks blurry to me.
In this particular scandal, Facebook mom groups compared anecdotal reports of symptoms, tried to understand science they weren’t educated in, and harassed surgeons at a convention: https://en.wikipedia.org/wiki/Essure
It ultimately led to Bayer removing a product from the market.
I very much enjoyed Inadequate Equilibria. In an ‘adequate’ world, the Facebook mom groups should never be right about anything, but in the world we occupy...
So I tried having a conversation with 4.7 Opus about this and it cut me off (transcript available). So, an Anthropic model doesn’t like me thinking about Anthropic policy. I’m sure that’s fine and not a step on a road to somewhere stupid.
The longer they keep Mythos locked up, the more unstable and coup-promoting the situation becomes. By keeping a powerful asset to themselves, it becomes unique (like, say, a radio station in 1969 Libya). This can go three ways, and only three ways (Naunihal Singh’s coup-as-coordination-game framework applies here):
- Their unique asset becomes subordinate to an actor in the present power structure who wishes to maintain their power (counter-coup), or an actor who wishes to gain power (coup).
- They themselves attempt to seize power (à la a media executive such as Berlusconi using his media empire to gain political power).
- The powerful asset is no longer unique (either Mythos is widely released, or a competing lab catches up and releases their stuff widely).
The longer Mythos stays private, the larger the asymmetry it represents becomes, and the more pressure on the status quo to collapse to one of the above points. Exploits that are private increase in value, exploits that are public decrease in value; the same applies to a magical exploit-finding machine. Google Project Zero gives vendors 90 days before going public. They made an exception for Spectre, but the norm still operates on a clock. The longer Mythos-class cyber models remain closed, regardless of amendments to the Glasswing constitution, the more destabilizing to everything else they become, and the stronger the incentive for outside actors to pressure, co-opt, or infiltrate the lab.
A lot of infrastructure is currently vulnerable. As any hacker knows (hi!), it’s possible for motivated 20-somethings with commodity tools to break into uncomfortably important things. The exploit itself is rarely the pivotal action, and when an exploit is needed, a publicly available or easily fuzzable one in a specific niche system on the target is often suitable.
A capable model in the hands of currently overworked defenders would probably move the needle in a positive direction for anyone who tried to use it, and Mythos-derived exploits would lose value very quickly if everyone could generate them.
Glasswing’s membership criteria, as well as the selection criteria for the people inside Anthropic who have access to Mythos outputs, would be very interesting. It’s basically a constitution trying to create a sovereign entity, and the exploits Mythos finds in the hands of those people (currently private) could be fantastically valuable. Is Project Glasswing ensuring that any participant who finds an exploit in something interesting is actually sharing it, and not instead stockpiling it or selling it?
If an Anthropic employee stumbles across a challenging zero day in a massive pile of Claude Code agent spam, would another Anthropic employee notice? What happens if that employee sells that exploit to an exploit broker? Would Anthropic notice? What if the person doing this is someone who maintains a widely used open source project, finds an exploit with Mythos, and decides that he is tired of being poor and does the same?
https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe#jQBYeQq5ZqnLr24Ax I’m happy to see this post; I’ve been hoping someone would take this and run with it for years.
I think STAMP and the related CAST/STPA methodologies are going to be way more effective at preventing bad outcomes than the current methods.
Max Tegmark’s “Consciousness as a State of Matter” might be interesting to you: https://arxiv.org/abs/1401.1219
The defeat of the Maginot Line is somewhat misunderstood in general (but not in ways that undermine this argument). German technical overmatch played a significant role. There were two plans for defeating it. The first is best detailed in Adm. McRaven’s 1993 master’s thesis on the theory of special operations: https://www.afsoc.af.mil/Portals/86/documents/history/AFD-051228-021.pdf
The fortress of Eben Emael in Belgium was the hardest part of the line. It had artillery, built into bunkers, pointed at a key bridge. The Germans invented a man-portable explosive that could destroy the bunkers, and trained glider-borne forces that could take the fortress by surprise. The Germans succeeded and drove across the bridge.
If the Germans had failed in their attempt to take the fortress, their backup plan was a direct assault on the Maginot Line using shells filled with chlorine trifluoride to set the concrete on fire. https://www.chemeurope.com/en/encyclopedia/Chlorine_trifluoride.html
In terms of the overall thesis, I think it persuades in the opposite of the intended direction. A lot of political challenges are like this, whether it’s the environment, certain construction projects, or passing certain kinds of laws. Most of life is oneshot in this sense.
Irrevocable decisions can be attractive when you know your time in power is fleeting.
If there’s a premium to be paid for taking a risk that you externalize onto ‘the whole biosphere’ or ‘the future survival of the human race’, someone who wants to take that premium is certain to emerge eventually.
So...‘hey if we get this right, we are rich and all our problems are solved by the god robot, if we get it wrong, we all die (and therefore don’t have to worry about these problems)’ is likely to be seen as an argument in favor of taking the risk.
Ah dang, guess my proposal to try for a born classified model has already been considered. I’m updating off this that we might not be able to point counterproliferation infrastructure at AI datacenters.
I use them.
I started during COVID when I saw TP hoarding on the news. Prior to that, I was satisfied with the status quo and uninterested in trying a new hygiene scheme, as the switching risks were small but not fully understood.
I don’t use a fancy one with a million settings and a seat warmer, just a ‘twist lever to spray, twist the other way to spray at a slightly different angle’ that’s hooked to the toilet water supply with a y-splitter.
I also recommend using one of many options to elevate the feet while sitting.
The thread that comment came from was contentious. I got a lot of pushback here and elsewhere during the early GPT days for my opinion that transformers would be able to output interesting math.
Two years later, when 3.5 was out, I felt that my ‘interesting’ threshold had been crossed and I had been technically correct, but I was still hearing the same arguments. I’m happy that six years on, we have proof that my assessment of the potential of transformers, which, to be clear, was absolutely viewed as ‘evidence that this person is crazy in a way that makes me want to avoid him’, was close to accurate.
From a meta perspective, this post is probably not helping me appear sane.
If you want to kill modern AI and have friends in the correct government offices, it should be fairly straightforward to do so using existing law, without new legislation.
This legal category is very aggressively defined: https://en.wikipedia.org/wiki/Restricted_Data
It was written to mean ‘if someone draws a working design for a nuclear bomb or certain kinds of nuclear material production equipment anywhere, that data is a state secret, regardless of the source of the information used to produce it’. This is commonly referred to as ‘born classified’. There are a good 70+ years of arguments about whether this is a good law, but that is the law.
Therefore, here is your process:
- Find an AI model that you reasonably believe is capable of outputting something the government will view as a classified fact related to nuclear weapon design. Edit: you should probably build it yourself by either training from scratch or fine-tuning an open model.
- Send the model weights, installation instructions, and a letter to the DOE Office of Classification requesting that they determine that your model is NOT restricted data. Offer to send them hardware to run the model (you won’t get it back). I am familiar with this classification regime: documents are not the only things that can be marked restricted data; physical embodiments (sculpture or actual objects) and computer programs (math models) are classified under it too.
They can either determine that your model contains restricted data (if the model can invent novel tech, it should be able to figure out 80-year-old tech, so this might not be a hard bar to clear), determine that it does not contain restricted data (in which case you should send them a bunch of outputs that look bad and see what they say), or determine that the class of material (model) cannot be judged under the law.
Since the model is a unitary object, it will be quite hard to separate ‘these specific weights are where the restricted data lives’ from the rest, so suddenly frontier models will become, through the stroke of a bureaucrat’s pen, state secrets.
Edit: get someone with a current or former Q clearance to submit the model as their own work if you want to add ‘it would be ok for this to be published, but not by you’ to the list of possible outcomes. That would mean that an AI researcher who wants to credibly take themselves out of AI research permanently can simply acquire a Q-cleared job (LLNL is near the Bay) at some point. The possible positive (for OP) outcomes are 1) the US bureaucracy has a reason to slam the door on AI research globally in the name of counterproliferation, and 2) there is a Nunn-Lugar-type path for researchers to make a living without working on dangerous capabilities (or working slowly only within the government, which is conservative, cost constrained, and now staffed with people who wanted to make safety their mission). If you seize power, you don’t need new legislation on safety; you only need some bureaucrats to choose to enforce the rule, plus potentially extra funding for military-industrial-complex contractor jobs.
Arguments of law in this context appear to me to be less important than arguments of power, but...if law matters, here is a law you can use, I guess?
Hooray, my prediction six years ago is now unambiguously correct:
“I assert that at some point in the next two years, there will exist an AI engine which when given the total body of human work in mathematics and a small prompt (like the one used in gpt-2), is capable of generating mathematical works that humans in the field find interesting to read, provided of course that someone bothers to try.”
I’m going to go celebrate by saying something else that the people around me think is dumb.
Edit for the tags: I think I was right two years ago because the (low) threshold I set was exceeded with GPT-3 or 3.5 (which was in scope for two years), but there was still room for debate. I assert that six years on, there’s no ambiguity about whether the threshold was crossed.
This has unambiguously happened: https://www.lesswrong.com/posts/3LcyoqNTJuCZ65MbL/mo-putera-s-shortform?commentId=YrRtLbrWwnZB8LskW
I also notice that the person I was arguing with has fled from his comments.
“If an ASI ban is to accomplish anything at all, it has to be effective everywhere.”
Hypervelocity impactors for any alien civilization believed to be capable of developing to the point of being able to build a GPU or evolving an organic brain with a measurable IQ over some threshold?
When doing big data analysis on stuff like this, there’s a big difference between generating a story that seems to make sense and generating correct conclusions.
For these examples, how are you validating Claude’s conclusions? Are you certain enough to put warheads on foreheads? People during that time were manually analyzing these types of data, and they absolutely were making those decisions.
What is the readiness level of the tech for doing this kind of analysis: 1) nobody can do it (you’re delusional if you believe we’re here); 2) some large organizations can do it if they really want to; 3) most places that want to do it right now can do it, slightly constrained by funding (I believe we are here); 4) absolutely anyone can do it to anyone at any time?
I don’t think data access availability shifts enough to move us from world 3 to world 4 when Mythos hits. So...what are you worried about?
Has anyone written anything about the costs of pausing early? If the AI safety position on superintelligence eventually killing us all is correct, presumably there are points on the path to it that are better than others.
Is the best spot to pause in the past? If it’s in the future, what do we lose by stopping before we reach that point?
As I’ve written before, I think humans are on a glide path to extinction from non-AI causes. I think we are locked into a bunch of problems that require science and engineering solutions that are not currently available.
Pausing AI is likely pausing or rolling back technical development in general. I think the arguments for that leading to extinction long term are stronger than the arguments for superintelligence coming into being and instantly destroying the universe.
Please post the work test and grading template you used. A lot of people here might benefit from reading it.