It’s not a Schelling point if you communicate about it!
utilistrutil
I agree it’s possible and it’s worth thinking through considerations like this. But I still don’t think this is a good model of journalists’ incentives.
In practice, “probability of being seen as inaccurate” is the term that dominates, which means inaccuracies tend to show up at points in the news article that face the least scrutiny, eg the part of an AI article where the journalist rushes through what a transformer is. These are the parts that are often least important to readers, and least important to you as a source.
And then I would describe the motivation more as “career success” than “political benefit”. As in getting a big scoop or writing a successful story, more than pushing a particular agenda. I think what journalists consider a successful story is kind of correlated with importance to the reader, barely correlated with what’s impactful, and barely correlated with how frustrating it would be for you to be misquoted. Consider the ChatGPT suicide example: the journalist is focused on their big scoop, but probably cares much less about the paragraph I pulled out. Ditto for readers. But I think it was very valuable that it was included.
I’ll have more on this in the epistemics post.
To make the analogy stronger, what if it only inverts it 1⁄10 times? Then I think the answer is non-obvious and depends on your principles.
If you’re deontological about it, I think you could make a case that your hands are not dirty for making the best of a bad system.
If you’re consequentialist about it, I’m saying the 9⁄10 accuracies could outweigh the 1⁄10 inaccuracies. And as Zack said, the 1⁄10 errors are rarely true inversions. That’s why
even if you do get misquoted, it doesn’t mean talking to the journalist was net-negative, even for that particular piece and even ex-post. As annoying as it is, it might be outweighed by the value of steering the article in positive ways.
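To spell out the consequentialist arithmetic (toy symbols, not from the original exchange: v = value of an accurately-quoted piece, c = cost of a misquoted one), talking to the journalist is positive in expectation whenever

0.9v − 0.1c > 0, i.e. whenever c < 9v

So the interaction only turns net-negative in expectation if a misquote is more than nine times as costly as an accurate piece is valuable.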
You are doing the lord’s work fr
MIT Tech Review doesn’t break much news. Try Techmeme.
Re “what people are talking about”
Sure, the news is biased toward topics people already think are important because you need readers to click etc etc. But you are people, so you might also think that at least some of those topics are important. Even if the overall news is mostly uncorrelated with your interests, you can filter aggressively.
Re “what they’re saying about it”
I think you have in mind articles that are mostly commentary, analysis, opinion. News in the sense I mean it here tells you about some event, action, deal, trend, etc that wasn’t previously public. News articles might also tell you what some experts are saying about it, but my recommendation is just to get the object-level scoop from the headline and move on.
Re whether it’s worth the time to sift through
Skimming headlines is fast. Maybe the news tends to be less action-relevant for your research, but I bet AI safety collectively wastes time and misses out on establishing expertise by being behind the news. Reading Zvi’s newsletter falls under what I’m advocating for (even though it’s mostly that what-people-are-saying commentary, the object-level news still comes through).
Conditioning as a Crux Finding Device
Say you disagree with someone, e.g. they have low pdoom and you have high pdoom. You might be interested in finding cruxes with them.
You can keep imagining narrower and narrower scenarios in which your beliefs still diverge. Then you can back out properties of the final scenario to identify cruxes.
For example, you start by conditioning on AGI being achieved—both of your pdooms tick up a bit. Then you also condition on that AGI being misaligned, and again your pdooms increase a bit (if the beliefs move in opposite directions, that might be worth exploring!). Then you condition on the AGI self-exfiltrating, and your pdooms nudge up again.
Now you’ve found a very narrow scenario in which you still disagree! You think it’s obvious that a misaligned AGI proliferating around the world is an endgame, they don’t see what the big deal is. From there, you’re in a good position to find cruxes.
(Note that you’re not necessarily finding the condition of maximum disagreement, you’re just trying to get information about where you disagree.)
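A toy version of the bookkeeping, with invented numbers (nothing here is from a real conversation; it just shows the procedure):

```python
# Toy sketch of the conditioning procedure. All probabilities are invented
# for illustration; each entry is P(doom | everything conditioned on so far).
conditions = [
    "unconditional",
    "+ AGI is achieved",
    "+ that AGI is misaligned",
    "+ the AGI self-exfiltrates",
]
p_doom_you = [0.60, 0.70, 0.85, 0.95]
p_doom_them = [0.05, 0.08, 0.15, 0.20]

for cond, yours, theirs in zip(conditions, p_doom_you, p_doom_them):
    print(f"{cond:28s} you: {yours:.2f}  them: {theirs:.2f}  gap: {yours - theirs:+.2f}")

# The gap stays large even in the narrowest scenario (0.95 vs 0.20), so the
# crux sits downstream of self-exfiltration: what you each expect a loose,
# misaligned AGI to actually be able to do.
```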
Got it thanks!
(eg. any o1 session which finally stumbles into the right answer can be refined to drop the dead ends and produce a clean transcript to train a more refined intuition)
Do we have evidence that this is what’s going on? My understanding is that distilling from CoT is very sensitive—reordering the reasoning, or even pulling out the successful reasoning, causes the student to be unable to learn from it.
I agree o1 creates training data, but that might just be high quality pre-training data for GPT-5.
Why does it make the CoT less faithful?
Favorite post of the year so far!
My favored version of this project would involve >50% of the work going into the econ literature and models on investor incentives, with attention to:
Principal-agent problems
Information asymmetry
Risk preferences
Time discounting
And then a smaller fraction of the work would involve looking into AI labs, specifically. I’m curious if this matches your intentions for the project or whether you think there are important lessons about the labs that will not be found in the existing econ literature.
How does the fiduciary duty of companies to investors work?
OpenAI instructs investors to view their investments “in the spirit of a donation,” which might be relevant for this question.
I would really like to see a post from someone in AI policy on “Grading Possible Comprehensive AI Legislation.” The post would lay out what kind of safety stipulations would earn a bill an “A-” vs a “B+”, for example.
I’m imagining a situation where, in the next couple years, a big omnibus AI bill gets passed that contains some safety-relevant components. I don’t want to be left wondering “did the safety lobby get everything it asked for, or did it get shafted?” and trying to construct an answer ex-post.
I don’t know how I hadn’t seen this post before now! A couple weeks after you published this, I put out my own post arguing against most applications of analogies in explanations of AI risk. I’ve added a couple references to your post in mine.
Adult brains are capable of telekinesis, if you fully believe in your ability to move objects with your mind. Adults are generally too jaded to believe such things. Children have the necessary unreserved belief, but their minds are not developed enough to exercise the ability.
File under ‘noticing the start of an exponential’: A.I. Helped to Find a Vast Source of the Copper That A.I. Needs to Thrive
Scott Alexander says:
Suppose I notice I am a human on Earth in America. I consider two hypotheses. One is that everything is as it seems. The other is that there is a vast conspiracy to hide the fact that America is much bigger than I think—it actually contains one trillion trillion people. It seems like SIA should prefer the conspiracy theory (if the conspiracy is too implausible, just increase the posited number of people until it cancels out).
I am often confused by the kind of reasoning at play in the text I bolded (the move of increasing the posited number of people until it cancels out). Maybe someone can help sort me out. As I increase the number of people in the conspiracy world, my prior in that world also decreases. If my prior falls faster than the number of people in the considered world grows, I will not be able to construct a conspiracy-world that allows the thought experiment to bite.
Consider the situation where I arrive at the airport, where I will wait in line at security. Wouldn’t I be more likely to discover a line 1000 people long than 100 people long? I am 10x more likely to exist in the longer line. The problem is that our prior on 1000-person security lines might be very low. The reasoning on display in the above passage would invite us to simply crank up the length of the line, say, to 1 million people. I suspect that SIA proponents don’t show up at the airport expecting lines this long. Why? Because the prior on a million-person line is more than ten thousand times lower than the prior on a 100-person line, enough to outweigh the 10,000x boost SIA gives the longer line.
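To make that arithmetic explicit (a toy calculation with a made-up power-law prior, not anything from Scott’s post): suppose the prior on a line of length N falls off like p(N) ∝ N⁻². SIA weights each hypothesis by its observer count, so

posterior(N) ∝ N · p(N) ∝ N⁻¹

The million-person line gets a 10⁴ SIA boost over the 100-person line but pays a 10⁸ prior penalty, for a net factor of 10⁻⁴, so SIA still expects the short line. The conspiracy-style argument only goes through if the prior falls slower than 1/N.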
This also applies to some presentations of Pascal’s mugging.
Jacob Steinhardt on predicting emergent capabilities:
There’s two principles I find useful for reasoning about future emergent capabilities:
If a capability would help get lower training loss, it will likely emerge in the future, even if we don’t observe much of it now.
As ML models get larger and are trained on more and better data, simpler heuristics will tend to get replaced by more complex heuristics. . . This points to one general driver of emergence: when one heuristic starts to outcompete another. Usually, a simple heuristic (e.g. answering directly) works best for small models on less data, while more complex heuristics (e.g. chain-of-thought) work better for larger models trained on more data.
The nature of these things is that they’re hard to predict, but general reasoning satisfies both criteria, making it a prime candidate for a capability that will emerge with scale.
I think you could also push to make government liable as part of this proposal.
Re going along with lies—Yeah, I think the coverage of data center water usage has been an example of that at its worst :/
Re journalists sitting on scoops—I’m curious if you’re able to share any examples? I don’t doubt that it happens.