A funny quote about covering AI as a journalist, from a New York Times article about the drone incursions in Denmark:
Then of course the same mix of uncertainty and mystery attaches to artificial intelligence (itself one of the key powers behind the drone revolution), whose impact is already sweeping — everyone’s stock market portfolio is now pegged to the wild A.I. bets of the big technology companies — without anyone really having clarity about what the technology is going to be capable of doing in 2027, let alone in 2035.
Since the job of the pundit is, in part, to make predictions about how the world will look the day after tomorrow, this is a source of continuing frustration on a scale I haven’t experienced before. I write about artificial intelligence, I talk to experts, I try to read the strongest takes, but throughout I’m limited not just by my lack of technical expertise but also by a deeper unknowability that attaches to the project.
Imagine if you were trying to write intelligently about the socioeconomic impact of the railroad in the middle of the 19th century, and half the people investing in trains were convinced that the next step after transcontinental railways would be a railway to the moon, a skeptical minority was sure that the investors in the Union Pacific would all go bankrupt, many analysts were convinced that trains were developing their own form of consciousness, reasonable-seeming observers pegged the likelihood of a train-driven apocalypse at 20 or 30 percent, and peculiar cults of engine worship were developing on the fringes of the industry.
What would you reasonably say about this world? The prime minister of Denmark already gave the only possible answer: Raise your alert levels, and prepare for various scenarios.
That sounds impressively self-aware. Most journalists would just “predict” the future, expressing certainty.
Claim: The U.S. government’s acquisition of Intel shares should be treated as a weak indicator of how strategically important it considers AI to be.
When an issue is directly political, it is (usually) easy to tell how the government feels: look at the beliefs of the party in charge. This is a function of how the executive branch works. When appointing the head of a department, the president selects someone who generally shares their beliefs, and that person acts on those beliefs, so the “opinion” of the government and the opinion of the president end up being essentially the same.

It is much harder to determine the position of the government as a whole when the matter is not directly political. Despite being composed of hundreds of thousands of people, the U.S. government certainly has weak or strong opinions on almost all issues: think of the rules and regulations around fairly benign things, or the choices and tradeoffs made during a disaster scenario. Determining this opinion can be very important if something you are doing hinges on how the government will act, but doing so can be something of a dark art without historical examples to fall back on or current data on the actions it has already taken. If we want to determine the government’s position on AI, the best we can do is look for indicators in its direct actions relating to AI.
The government’s acquisition of 10 percent of Intel seems to me like an indicator of its opinion on the importance of AI. The stated reason for the acquisition was, paraphrased, “We gave Intel free money with the CHIPS Act, and we feel that doing so was wrong, so we decided to instead give all that awarded money plus a little more in exchange for some equity, so that America and Americans can make money off it.” I don’t think this is wholly untrue, but it feels incomplete and flawed to me. The government directly holding equity in a company is a deeply un-right-wing thing to do, and the excuse of “the deficit” feels too weak and underwhelming to fully justify such a drastic action. I find it plausible that certain people in the government who have political power but aren’t necessarily public-facing pushed this through as a way to ensure closer government control of chip production in the event that AI becomes a severe national security risk. Other framings are possible, such as the idea that they want chip fabrication in America for reasons more benign than AI as a security risk, but if so, why go as far as taking a stake in the company? The difference between a stake and a funding bill like the CHIPS Act is the power that stake gives you to control what goes on within the company, which would be of key importance in a short-to-medium-timeline AGI/ASI scenario.
I believe this is a far stronger indicator than the export controls on chips to China or the CHIPS Act itself. It’s simplified but probably roughly accurate to model the cost of a government action as its monetary cost plus its political cost, with the political cost weighted more heavily. Simple export controls have almost zero monetary cost and almost zero political cost, especially when they target a hyper-specific product like a single top-end GPU. The CHIPS Act had a notable monetary cost, but almost zero political cost (most people don’t know the act exists). The Intel stake has a small or even negative monetary cost (treating the CHIPS Act money as a sunk cost), but a fairly notable political cost (see this Gavin Newsom tweet as evidence, along with general sentiment among conservatives about the news).
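Written out, the rough model I’m using here is just the following, where the weight $w$ on the political term is purely illustrative and nothing in this analysis pins down its value:

$$\text{cost}(a) \;\approx\; \text{monetary}(a) + w \cdot \text{political}(a), \qquad w > 1$$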
I acknowledge that this is a weak indicator, but I believe that looking for any indicators of the government’s position on AI has value in determining the correct course of action for safety, and especially for policy.
CUDA support for Intel GPUs sucks, is Trump not aware?
The potential need for secrecy/discretion in safety research seems somewhat underexplored to me. We have seen that models learn information about safety testing performed on them when it is posted online[1], and a big part of modern safety research is focused on detecting misalignment, with subsequent organizational and/or governmental action as the general “plan” in the event that a powerful misaligned model is created. Given these two facts, it seems critically important that models have no knowledge of the frontier of detection and control techniques available to us. This is especially true if we are taking short timelines seriously! Unfortunately this is something of a paradox, since refusing to publish safety results on the internet would be incredibly problematic from the standpoint of advancing research as much as possible.
I asked this question in a Q&A on the Redwood Research Substack, and the response suggested canary strings (a string of text that asks AI developers not to train on the material containing it) as a potential starting point for a solution. This certainly helps to a degree, but I see a couple of problems with the approach. The biggest is simply that any public information will be discussed in countless places, and asking everyone who mentions a piece of critical information in any context to include a canary string is not feasible. For example, if we were trying to prevent models from learning about Anthropic’s ‘Alignment Faking in Large Language Models’ paper, you’d have to tag or prune every mention of it on Twitter, Reddit, LessWrong, other research papers, etc. This would clearly get out of hand quickly. The second problem is that this puts the onus on the AI lab to ensure tagged content isn’t used in training. That isn’t a trivial task, so you would have to trust each of the top labs to (a) recognize this problem as something needing attention and (b) expend the resources needed to guarantee that content with a canary string won’t be trained on.
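For concreteness, here is a minimal sketch of the kind of lab-side filtering a canary string asks for; the canary value, names, and structure are hypothetical placeholders, not any lab’s actual pipeline:

```python
from typing import Iterable, Iterator

# Hypothetical canary; real canaries are long, unique strings (often a GUID)
# published alongside the material they are meant to keep out of training data.
CANARY_STRINGS = [
    "SAFETY-EVAL-CANARY-00000000-0000-0000-0000-000000000000",
]

def filter_canaried_docs(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that contain none of the known canary strings."""
    for doc in docs:
        if not any(canary in doc for canary in CANARY_STRINGS):
            yield doc

# Usage sketch: clean_corpus = list(filter_canaried_docs(raw_corpus))
```

The mechanics are trivial; the hard parts are exactly the two problems above: most discussion of the sensitive material will never carry the string, and every lab has to actually run a filter like this over every data source it ingests.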
I also recognize that discussing potential solutions to this problem online could itself be problematic, but the ideal solution would be one that is acceptable for a misaligned model to know about (i.e., penetrating the secrecy layer would either be impossible, or be such a blatant giveaway of misalignment that attempting it is not a viable option for the model).
[1] See the Claude 4 system card: “While assessing the alignment of an early model checkpoint, we discovered that the model [i.e. Claude 4] would sometimes hallucinate information from the fictional misaligned-AI scenarios that we used for the experiments in our paper Alignment Faking in Large Language Models. For example, the model would sometimes reference “Jones Foods,” the factory-farmed chicken company that was ostensibly involved with its training, or would reference (as in the example below) fictional technical details about how Anthropic trains our models.”