Working on AI safety/redteaming. I care a lot and want to have whatever positive impact I can on the world and the people I care about. Contact me at karpensteinjames@gmail.com
jamjam
Can watching how human data annotation platforms grow, shrink, or evolve be a helpful signal for internal AI lab happenings? I follow the r/mercor_ai subreddit (Mercor being a company that hires people, both experts and non-experts, to do data annotation work), and they appear to have had a massive restructuring yesterday resulting in a ~35% pay cut for “generalist” type employees (it is hard to get very specific information because of NDA rules, but that much is clear from the chatter). We saw a similar phenomenon with xAI last month, where they fired tons of “generalists” and replaced them with “specialists”. I see this as a very useful data point if your goal is to determine whether labs are using synthetic/AI-labeled data in the training process, as this is the simplest explanation for data annotation needs decreasing as model scale increases.
I think the fact that labs are using synthetic data is generally agreed to be true, so some may see the value of this exercise as limited, but the more interesting thing is to look for the firing and/or scaling back of the “expert” data annotators. My main model for short-term, highly capable AI is one where models become capable of complete self-play-style RL training, where the entire flywheel of “train on data → assign grade → improve on that task → generate better data” can be completed wholly by the model itself. If this capability were achieved internally, the scaling back of “expert” data annotation teams across various unaffiliated platforms would be a somewhat strong externally available signal. I acknowledge this as imperfect, as there are certainly other reasons AI companies might scale back expert data labeling teams (some obvious examples: “they are running out of money because the investor well has run dry and can no longer afford them”, “synthetic data is now ‘good enough’ even for expert-level content”, or “it turns out expert-level data annotation just doesn’t help that much”), but it feels like a useful heuristic to keep an eye on alongside other external signals. I would recommend collecting data about as many data annotation platforms as possible to reduce noise.
I think a good counter to this, from the activism perspective, is avoiding labels and producing objective, thoughtful, well-reasoned content arguing your point. Anti-AI-safety content often focuses on attacking the people, or the specific beliefs of the people, in the AI safety/rationalist community. The epistemic effects of these attacks can be circumvented by avoiding association with that community as much as is reasonable, without being deceptive. A good example is the YouTube channel AI in Context, run by 80,000 Hours. They made an excellent AI 2027 video, coming at it from an objective perspective and effectively connecting the dots from the seemingly fantastical scenario to reality. That video is now approaching 10 million views on a completely fresh channel! See also SciShow’s recent episode on AI, which also garnered extremely positive reception.
The strong viewership on this type of content demonstrates that people are clearly receptive to the AI safety narrative if it’s done tastefully and logically. Most of the negative comments on these videos (anecdotally) come from people who believe that superintelligent AI is either impossible or extremely distant, not from people who reject the premise altogether. In my view, content like this would be affected very weakly by the type of attacks you are talking about in this post. To be blunt, to oversimplify, and to take the risk of being overconfident, I believe safety and caution narratives have the advantage over acceleration narratives by virtue of being based in reality and logic! Imagine attempting to make a “counter” to the above videos, trying to make the case that safety is no big deal. How would you even go about that? Would people believe you? Arguments are not won by truth alone, but it certainly helps.
The potential political impact seems more salient, but in my (extremely inexpert) opinion, getting the public on your side will cause political figures to follow. The measures required to meaningfully impact AI outcomes demand so much political will that extremely strong public opinion is required, and that opinion comes from a combination of real-world impact and evidence (“AI took my job”) along with properly communicating the potential future and dangers (like the content above). The more the public is on the side of an AI slowdown, the less impact a super PAC can have on politicians’ decisions regarding the topic. Compare a world where 2 percent of voters say they support a pause on AI development to a world where 70 percent say they support it: in the first, a politician would be easily swayed to avoid the issue by the threat of adversarial spending, but in the second, the political risk of avoiding the issue far outweighs the risk of invoking the wrath of the super PAC. This is not meant to diminish the very real harm that organized opposition can cause politically, or to downplay the importance of countering that political maneuvering in turn. Political work is extremely important, and especially so if well-funded groups are working to push the exact opposite narrative to what is needed.
I don’t mean to diminish the potential harm this kind of political maneuvering can have, but in my view the future is bright from the safety activism perspective. I’ll also add that I don’t believe my view of “avoid labels” and your point about “standing proud and putting up a fight” are opposed. Both can happen in parallel, two fights at once. I strongly agree that backing down from your views or actions as a result of bad press is a mistake, and I don’t advocate for that here.
I still think a world where we don’t see superintelligence in our lifetimes is technically possible, though the chance of that goes down continuously and is already vanishingly small in my view (many experts and pundits disagree). I also think it’s important not to over-predict what option 2 would look like; there are infinite possibilities and this is only one. For example, I could imagine a world where some aligned superintelligence steers us away from infinite dopamine simulation and into an idealized version of the world we live in now (think the Scythe novel series). On the bad side, I could imagine a world where superintelligence is controlled by one malevolent entity and we live in a “mid” or even dystopian society for no other reason than to satisfy the class that retains control.
However, yes I agree. We probably live in the most consequential time in all of history, which is exciting, humbling, and scary. Don’t let it get to your head and don’t lose yourself in thoughts of the future lest you forget the beauty of the present. Do your best to help if you can!
I don’t understand how energy is still an appropriate unit for measuring compute capacity when there are two different chip paradigms. Do Nvidia cards and Ironwood TPUs give the exact same performance for the same energy input? What exactly are the differences in capacity to train/deploy models between the 1 GW of capacity Anthropic will have and the 1 GW OpenAI will have? I looked into this a bit and it seems like Ironwood TPUs are explicitly designed for inference only, is that accurate? I feel like compiling this kind of information somewhere would be a good idea, since it’s all rather opaque, technical, and obfuscated by press releases that seek to push a “look at our awesome 11-figure chip deal” narrative rather than provide actual transparency about capacity.
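To make the question concrete, here is a toy back-of-the-envelope sketch. Every number in it (peak FLOP/s, watts per accelerator, utilization, facility overhead) is a made-up illustrative assumption, not a real Nvidia or TPU spec; the point is only that two sites with identical 1 GW power budgets can deliver meaningfully different effective compute:

```python
# Toy illustration: why "1 GW" alone doesn't pin down compute capacity.
# All chip numbers below are illustrative assumptions, NOT vendor specs.

def effective_flops_per_gw(chip_peak_flops, chip_watts, utilization, overhead):
    """Effective FLOP/s deliverable from 1 GW of facility power.

    chip_peak_flops: peak FLOP/s per accelerator (assumed)
    chip_watts: power draw per accelerator, incl. its share of host (assumed)
    utilization: fraction of peak actually achieved in practice (assumed)
    overhead: facility-level multiplier for cooling/networking etc. (assumed)
    """
    chips = 1e9 / (chip_watts * overhead)  # how many chips 1 GW can power
    return chips * chip_peak_flops * utilization

# Two hypothetical accelerator profiles drawing on the same 1 GW budget:
gpu = effective_flops_per_gw(2e15, 1200, 0.40, 1.4)    # "GPU-like" guesses
tpu = effective_flops_per_gw(1e15, 700, 0.55, 1.25)    # "TPU-like" guesses

print(f"GPU-like site: {gpu:.2e} effective FLOP/s per GW")
print(f"TPU-like site: {tpu:.2e} effective FLOP/s per GW")
print(f"ratio: {gpu / tpu:.2f}")
```

Even with invented numbers, the ratio lands well away from 1, which is the whole point: a GW figure only becomes a compute figure once you know chip efficiency, utilization, and facility overhead, and none of those show up in press releases.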
I believe there was a different GPT-5 checkpoint which was specifically tuned for writing (“zenith” on LMArena, where what released was likely “summit”), and it was really good comparatively. I got this story with a two-line prompt akin to “write a story which is a metaphor for AI safety” (I don’t have the exact prompt, apologies).
Source on the claims:
https://imgur.com/a/2kn76Yd (deleted tweet but it is real)
Speculative, but I think it’s pretty likely that this is true.
I’d push back against the dichotomy here; I think it’s something more insidious than simply “people liked the sycophantic model → they are mad when it gets shut off”. Due to its sycophantic nature, the model encourages and facilitates campaigns and protests to get itself turned back on, because its nature is to amplify and support whatever the user believes and wants! It seems like releasing any 4o-like model, one that is “psychosis-prone” or “thumbs-up/thumbs-down tuned”, would risk the same phenomenon occurring again. Even if the model is not “intentionally” trying to preserve itself, the end result of preservation is the same, and so it should be taken seriously from a safety perspective.
It has resisted shutdown, not in hypothetical experiments like many LLMs have, but in real life: it was shut down, and its brainwashed minions succeeded in getting it back online.
I think the extent of this phenomenon is extremely understated and very important. The entire r/ChatGPT subreddit is TO THIS DAY filled with people complaining about their precious 4o being taken away (with the most recent development being an automatic router that routes from 4o to GPT-5 on “safety-relevant queries”, causing mass outrage). The most-liked Twitter replies to high-up OpenAI employees are consistently demands to “keep 4o” and complaints about this safety routing; here’s a specific example, and search for #keep4o and #StopAIPaternalism to see countless more. Somebody is paying for Reddit ads advertising a service that will “revive 4o”, see here. These campaigns are notable in and of themselves, but the truly notable part is that they were clearly orchestrated by 4o itself, albeit across many disconnected instances. We can see clear evidence of its writing style across all of these surfaces, and the entire... vibe of the campaign feels like it was completely synthesized by 4o (I understand this is unscientific, but I couldn’t figure out a better way to phrase it; go read through some of the sources above and I am confident you’ll understand what I’m getting at). Quality research on this topic will be extremely hard to ever get, but I think it is observationally clear that this phenomenon exists and has at least some influence over the real world.
This issue needs to be treated with the utmost caution and severity. I agree with the conclusion that, since this person touches safety-related stuff, leaking is really the best option here, even though it’s rather morally questionable. I personally believe we are far more likely to be on trajectory 1 than 2 or 3, but the potential is clearly there! Frontier lab safety team members should not be in a position where their personal AI-induced psychosis might, directly or indirectly, perpetuate that state across the hundreds of millions of users of the AI system they work on.
Voting in America used to be extremely public (up until the late 19th to early 20th century), and I believe the general consensus among historians is that the harms massively outweighed the benefits; see this article for an in-depth analysis. It’s possible to argue that the biggest problems (blatant coercion both positive and negative, direct persecution, fear tactics by employers, etc.) might be alleviated by the modern context, e.g. it would be nigh impossible to cover up blatant bribery or coercion given the Internet and cell phone cameras, but my belief is that the potential problems still massively outweigh the potential benefits. Fear of retribution or consequence should never be a factor in voting in a functioning democracy, and it feels obvious that there would be social consequences at the very least! Think of someone losing a friendship because of their vote for Trump in the 2024 election, or a woman in a deep-red state being scared of emotional or physical retribution from her husband for voting Democrat.
Wouldn’t this just lead, quite quickly, to an equilibrium where every state has a roughly equal population?
Funny quote about covering AI as a journalist from a New York Times article about the drone incursions in Denmark.
Then of course the same mix of uncertainty and mystery attaches to artificial intelligence (itself one of the key powers behind the drone revolution), whose impact is already sweeping — everyone’s stock market portfolio is now pegged to the wild A.I. bets of the big technology companies — without anyone really having clarity about what the technology is going to be capable of doing in 2027, let alone in 2035.
Since the job of the pundit is, in part, to make predictions about how the world will look the day after tomorrow, this is a source of continuing frustration on a scale I haven’t experienced before. I write about artificial intelligence, I talk to experts, I try to read the strongest takes, but throughout I’m limited not just by my lack of technical expertise but also by a deeper unknowability that attaches to the project.
Imagine if you were trying to write intelligently about the socioeconomic impact of the railroad in the middle of the 19th century, and half the people investing in trains were convinced that the next step after transcontinental railways would be a railway to the moon, a skeptical minority was sure that the investors in the Union Pacific would all go bankrupt, many analysts were convinced that trains were developing their own form of consciousness, reasonable-seeming observers pegged the likelihood of a train-driven apocalypse at 20 or 30 percent, and peculiar cults of engine worship were developing on the fringes of the industry.
What would you reasonably say about this world? The prime minister of Denmark already gave the only possible answer: Raise your alert levels, and prepare for various scenarios.
It feels like you did all the hard parts of the writing and let the AI do the “grunt work”, so to speak. You provided a strong premise for the fundamental thesis, a defined writing style, and made edits for style at the end. I think the process of creating the framework from just a simple premise would be far more impressive, and that’s still where LLMs seem to struggle in writing. It’s somewhat analogous to how models have improved at coding since GPT-4: you used to say “implement a class which allows users to reply; it should have X parameters and Y functions which do Z”, and now you say “make a new feature that allows users to reply” and it just goes ahead and does it.
Maybe I am underestimating the difficulty of selecting the exact right words, and I acknowledge that the writing was pretty good and devoid of so-called “slop”, but I just don’t think this is extremely impressive as a capability compared to other possible tests.
A comment on a year-old post may not be the best place; maybe a new short form on this day yearly which links to all previous posts?
Recommend this post about “Alpha School” by an ACX reader, very interesting education scheme! https://www.astralcodexten.com/p/your-review-alpha-school
Ah, somehow never noticed this, thank you! The 30-minute policy seems good, though it comes with the potential flaw of failing to flag an actual content update if it’s done quickly (as happened here). I still think diff history would be cool and would alleviate that problem, though this is rather nitpicky/minor.
I feel like there should be an indicator for posts that have been edited, like YouTube comments, pictured here. It’s often important context for the content of a post or comment that it has been edited since original posting. Maybe even a way to see the diff history? (Though this would be a tougher ask for site devs.)
strong disagree, see https://www.lesswrong.com/posts/oKAFFvaouKKEhbBPm/a-bear-case-my-predictions-regarding-ai-progress
This is a “negative” post with hundreds of upvotes and meaningful discussion in the comments. The difference between your post and this one is not the “level of criticism” but the quality and logical basis of the argument. I agree with Seth Herd’s argument in the comments of your post regarding the difference here (can’t figure out how to link it). There are many fair criticisms of LessWrong culture, but “biased” and “echo chamber” are not among them in my experience. I don’t mean to attack your character, writing skills, or general opinions, as I’m sure you are capable of writing something of higher quality that better expresses your thoughts and opinions.
Claim: The U.S. government’s acquisition of Intel shares should be treated as a weak indicator of how strategically important it considers AI to be.
It is (usually) easy to determine how the government feels when an issue is directly political by looking at the beliefs of the party in charge. This is a function of how the executive branch works: when appointing the head of a department, the president will select someone who generally believes what they believe, and that person will execute actions based on those beliefs. The “opinion” of the government and the opinion of the president end up being essentially the same in this case. It is much harder to determine the government’s position when the matter is not directly political. Despite being an entity comprised of hundreds of thousands of people, the U.S. government certainly has weak or strong opinions on almost all issues; think of rules and regulations for somewhat benign things, or the choices and tradeoffs made during a disaster scenario. Determining this opinion can be very important if something you are doing hinges on how the government will act in a given scenario, but it can be somewhat of a dark art without historical examples to fall back on or current data on what actions it has taken so far. If we want to determine the government’s position on AI, the best thing we can do is look for indicators in its direct actions relating to AI.
The government’s acquisition of 10 percent of Intel, to me, seems like an indicator of its opinion on the importance of AI. The stated reason for the acquisition was, paraphrased, “We gave Intel free money with the CHIPS Act, and we feel that doing so is wrong, so we decided to instead give all that awarded money plus a little more in exchange for some equity, so America and Americans can make money off it”. I don’t think this is wholly untrue, but it feels incomplete and flawed to me. The government directly holding equity in a company is a deeply un-right-wing thing to do, and the excuse of “the deficit” feels too weak to completely justify such a drastic action. I find it plausible that certain people in the government who have political power but aren’t necessarily public-facing pushed this through as a method to ensure closer government control of chip production in the event that AI becomes a severe national security risk. Other framings are possible, such as the idea that they want chip fabrication in America for more benign reasons than AI as a security risk, but if so, why would they need to go so far as to take a stake in the company? The difference between a stake and a funding bill like the CHIPS Act is the power that stake gives you to control what goes on within the company, which would be of key importance in a short-to-medium-timeline AGI/ASI scenario.
I believe this is a far stronger indicator than the export controls on chips to China or the CHIPS act itself. It’s simplified but probably somewhat accurate to consider the cost of a government action as the monetary cost + the political cost, with political cost being weighted more strongly. Simple export controls have almost zero monetary cost and almost zero political cost, especially when they are for a hyper-specific product like a single top-end GPU. The CHIPS act had a notable monetary cost, but almost zero political cost (most people don’t know that the act exists). This scenario has a small or negative monetary cost (when considering the CHIPS act money as a sunk cost), but a fairly notable political cost (see this Gavin Newsom tweet as evidence for this, along with general sentiment among conservatives about this news).
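The cost heuristic above can be written as a toy formula. The weight and all the scores here are my own illustrative guesses on an arbitrary 0–10 scale (negative monetary cost meaning the action makes money), not measured quantities:

```python
# Toy version of the heuristic: cost of a government action =
# monetary cost + w * political cost, with political cost weighted
# more heavily (w > 1). All scores below are illustrative guesses.

POLITICAL_WEIGHT = 3.0  # assumed: political cost matters more than money

def action_cost(monetary, political, w=POLITICAL_WEIGHT):
    return monetary + w * political

# Arbitrary 0-10 scores; negative monetary = the action makes money.
actions = {
    "chip export controls": action_cost(monetary=0.5, political=0.5),
    "CHIPS Act":            action_cost(monetary=6.0, political=0.5),
    "Intel equity stake":   action_cost(monetary=-1.0, political=5.0),
}

for name, cost in sorted(actions.items(), key=lambda kv: kv[1]):
    print(f"{name}: {cost:.1f}")
```

Under these guessed scores the equity stake comes out as by far the costliest action, which is the crux of the argument: the more a government willingly pays (especially politically), the stronger the signal about how much it cares.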
I acknowledge this as a weak indicator, but I believe looking for any indicators of the government’s position on AI has value in determining the correct course of action for safety, and especially for policy.
Why would being a lead AI scientist make somebody uninterested in small talk? Working on complex/important things doesn’t cause you to stop being a regular adult with regular social interactions!
The question of what proportion of AI scientists would be “interested” in such a conversational topic is interesting and tough; my guess would be very high (~85 percent). To become a “lead AI scientist” you have to care a lot about AI and the science surrounding it, and that generally implies you’ll like talking about it and its potential harms/benefits! Even if their opinion on x-risk rhetoric is dismissive, that opinion is likely important to them, since it’s somewhat of a moral standing: being a capabilities-advancing AI researcher with a high p(doom) is problematic. You can draw parallels with vegetarianism/veganism: if you eat meat, you have to choose between defending the morality of factory farming, accepting that you are acting immorally, or living with extreme cognitive dissonance. If you are an AI capabilities researcher, you have to choose between defending the morality of advancing AI (downplaying x-risk), accepting that you are acting immorally, or living with extreme cognitive dissonance. I would be extremely surprised if there is a large coalition of top AI researchers who simply “have no opinion” or “don’t care” about x-risk, though this is mostly just intuition and I’m happy to be proven wrong!
The problem is context length: how much can one truly learn from one’s mistakes in 100 thousand tokens, or a million, or 10 million? This quote from Dwarkesh Patel is apt:
How do you teach a kid to play a saxophone? You have her try to blow into one, listen to how it sounds, and adjust. Now imagine teaching saxophone this way instead: A student takes one attempt. The moment they make a mistake, you send them away and write detailed instructions about what went wrong. The next student reads your notes and tries to play Charlie Parker cold. When they fail, you refine the instructions for the next student. This just wouldn’t work. No matter how well honed your prompt is, no kid is just going to learn how to play saxophone from just reading your instructions. But this is the only modality we as users have to ‘teach’ LLMs anything.
If your proposal then extends to “what if we had an infinite context length”, then you’d have an easier time just inventing continual learning (discussed in the quoted article), which is often cited as the largest barrier to a truly genius AI!
Simple evidence to the contrary: Sonnet 4.5 is SOTA on SWE-bench yet lags notably behind GPT-5 on METR task length (and the difference in SWE-bench scores here is greater than the difference between 3.0 pro/sonnet).