Here’s my attempt at a neutral look at Prop 50, which Californians vote on this Tuesday (Nov 4th). The bill seems like a case study in high-stakes game theory and in when to cooperate or defect.
The bill would allow the CA legislature to redraw the congressional district maps until 2030 (when district drawing would revert to the current system). Currently, the district maps are drawn by an independent body designed to be politically neutral. In essence, this would allow the CA legislature to gerrymander California. That would probably give Democrats an extra 3-5 seats in Congress. It seems like there’s a ~17% chance that it swings the House in the midterms.
Gerrymandering is generally agreed to be a bad thing, since it means elections are determined on the margin more by the map makers and less by the people. The proponents of this bill don’t seem to think otherwise. They argue the bill is in response to Texas passing a similar bill to redistrict in a way that is predicted to give Republicans 5 new House seats (not to mention similar bills in North Carolina and Missouri that would give Republicans an additional 2 seats).
Trump specifically urged Texas, North Carolina, and Missouri to pass their bills, and the rationale was straightforwardly to give Republicans a greater chance at winning the midterms. For example, Rep. Todd Hunter, the author of Texas’s redistricting bill, said “The underlying goal of this plan is straightforward, [to] improve Republican political performance”.
Notably some Republicans have also tried to argue that the Texas bill is in response to Democrats gerrymandering and obstructionism, but this doesn’t match how Trump seems to have described the rationale originally.[1]
The opponents of Prop 50 don’t seem to challenge the notion that the Republican redistricting was bad.[2] They just argue that gerrymandering is bad for all the standard reasons.
So, it’s an iterated prisoners’ dilemma! Gerrymandering is bad, but the Republicans did it, maybe the Democrats should do it to (1) preserve political balance and (2) punish/disincentivize Republicans’ uncooperative behavior.
Some questions you might have:
Will this actually disincentivize gerrymandering? Maybe the better way to disincentivize it is to set a good example.
Generally I’m skeptical of arguments like “the other guys defect in this prisoners’ dilemma and so you should too”. In practice, it’s often hard to tell why someone is defecting or for the counterparty to credibly signal that they would in fact switch to the cooperate-cooperate equilibrium if it was available. Real life is messy, it’s easy to defect and blame it on your counterparty defecting even when they didn’t, and being the kind of person who will legibly reliably cooperate when it counts is very valuable. For these reasons I tend to err towards being cooperative in practice.
In this case, if CA passes Prop 50, maybe Republican voters won’t see it as a consequence of Republican gerrymandering and will simply interpret it as “the Democrats gerrymander and do whatever uncooperative behavior gets them the most votes. We need to do whatever it takes to win” or “everyone gerrymanders, gerrymandering is normal and just part and parcel of how the sausage is made”.
On top of that, I’m wary of ending up in one of the defect-defect equilibria tit-for-tat is famous for. Tit-for-two-tats and forgiveness are sometimes helpful.
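As a toy illustration of that failure mode (a standard textbook dynamic, not a model of the actual political situation): between two tit-for-tat players, a single accidental defection triggers an endless cycle of retaliation, while tit-for-two-tats shrugs it off and cooperation resumes.

```python
# Two players repeatedly cooperate ("C") or defect ("D"). Player A defects once "by
# accident" in round 2; everything else follows each player's stated strategy.

def tit_for_tat(opponent_history):
    return opponent_history[-1] if opponent_history else "C"

def tit_for_two_tats(opponent_history):
    return "D" if opponent_history[-2:] == ["D", "D"] else "C"

def play(strategy_a, strategy_b, rounds=8, noise_round=2):
    history_a, history_b = [], []
    for t in range(rounds):
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        if t == noise_round:
            move_a = "D"          # the one-off accidental defection
        history_a.append(move_a)
        history_b.append(move_b)
    return "".join(history_a), "".join(history_b)

print(play(tit_for_tat, tit_for_tat))        # ('CCDCDCDC', 'CCCDCDCD') -- retaliation never ends
print(play(tit_for_tat, tit_for_two_tats))   # ('CCDCCCCC', 'CCCCCCCC') -- forgiveness recovers
```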
But I think Prop 50 handles these things fairly well. The bill only lasts until 2030 and has been framed explicitly and clearly as a direct response to redistricting in Texas. (In fact, Governor Newsom’s original proposal would have had Prop 50 preserve California’s current congressional maps if Texas or other states also kept their original maps. That provision was removed once Texas solidified its redistricting.) Fretting too much about whether Republicans will take even more aggressive actions because of this bill also incentivizes Republicans to be more aggressive in their responses and to pay less attention to Democrats’ rationales, which seems bad.
Moreover, if Democrats are benefiting similarly to Republicans from gerrymandering, perhaps this creates more bipartisan support for federal regulation banning gerrymandering. In general, where possible, I think it’s good to have laws preventing this kind of uncooperative behavior rather than relying on both parties managing to hit cooperate in a complicated prisoner’s dilemma.
Are the costs to society simply too large to be worth it?
In some ways, Prop 50 undoes some of the damage of redistricting in Texas: in Texas, Republicans gained 5 seats in a way that isn’t as representative as it should have been, so by undoing that and giving Democrats 3-5 extra seats, the system becomes more representative. But in some ways two wrongs don’t make a right here: at the end of the day both Texans and Californians end up less well represented. This matters, for instance, if you think it’s more important that Congress be made up of politicians who represent their own constituents well than that constituents’ views be represented in the overall federal balance.
Notably, even if you buy that argument, you might still support Prop 50 if you think its punishing/disincentivizing effects are valuable enough.
What’s the historical context? If this is a prisoner’s dilemma, how much has each side hit cooperate in the past?
Republicans have sometimes said their redistricting bills are a response to Democrats’ gerrymandering. If so, maybe they’re justified. Let’s look into it! You can read the history here or look at an interactive map here.
It seems like Republicans engaged in a major, unprovoked bout of gerrymandering in 2010 with REDMAP. Since then both parties have tried to gerrymander and occasionally succeeded. Overall, Republicans have gerrymandered somewhat more than Democrats, but Democrats have still engaged in blatant gerrymandering, for example in Illinois in 2021. In searching for more right-leaning narratives, I found that Brookings estimated in 2023 that neither party benefited substantially more from gerrymandering than the other at the time, regardless of how much each had engaged in it. I haven’t really found a great source for anyone claiming Democrats have overall benefited more from gerrymandering.
Democrats have also tried to propose a bill to ban gerrymandering federally, the Freedom to Vote Act. (This bill also included some other provisions apart from just banning gerrymandering, like expanding voter registration and making Election Day a federal holiday.) The Freedom to Vote Act was widely opposed by Republicans and I don’t know of any similar legislation they’ve proposed to ban gerrymandering.
So overall, it seems like Republicans have been engaging in more gerrymandering than Democrats and been doing less to fix the issue.
Republicans have also argued the new districts in Texas represent the Hispanic population better, though they tend to frame this more as a reason it’s good and less as the reason they pursued this redistricting in the first place.
Specifically, they say “While Newsom and CA Democrats say Prop 50 is a response to Trump and Texas redistricting, California shouldn’t retaliate and sacrifice its integrity by ending fair elections.”
One argument against the bill that I didn’t explore above (because I haven’t actually heard anyone make it) is that the only reason Democrats aren’t gerrymandering more is because gerrymandering seems more helpful to Republicans for demographic reasons. But Democrats try to do other things that are arguably designed to give them more votes. For example, loosening voter ID laws. So maybe each party should carefully respond to the ways the other party tries to sneakily get itself more votes, in very measured ways that properly disincentivize bad behavior. Or, more realistically, engage in a crazy ever-escalating no-holds-barred race to the bottom.
I think it’s good that the Republicans and Democrats have been somewhat specific that their attempts at gerrymandering are only retaliation against other gerrymandering, and not retaliation against things like this.
To elaborate on this, a model of voting demographics is that the most engaged voters vote no matter what hoops they need to jump through, so rules and laws that make voting easier increase the share of less engaged voters. This benefits whichever party is comparatively favored by these less engaged voters. Historically that was the Democrats, but due to education polarization Democrats have become the party of the college-educated, so nowadays less engaged voters lean Republican. This is also reflected in things like Trump winning the presidential popular vote in 2024. (Though as a counterpoint, this Matt Yglesias article from 2022 claims that voter ID laws “do not have a discernible impact on election results” but doesn’t elaborate.)
In addition, voter ID laws are net popular, so Democrats advocating against them hurts them both directly (advocating for an unpopular policy) and indirectly (insofar as it increases the pool of less engaged voters).
Seen in the light of Section 2 of the Voting Rights Act asymmetrically binding Republicans, what you’re calling an “unprovoked bout of gerrymandering” might be better understood as an attempt to reduce the unfair advantage Democrats have had nationally for decades.
If I am reading things correctly, section 2 of the Voting Rights Act says:
(a) No voting qualification or prerequisite to voting or standard, practice, or procedure shall be imposed or applied by any State or political subdivision in a manner which results in a denial or abridgement of the right of any citizen of the United States to vote on account of race or color, or in contravention of the guarantees set forth in section 10303(f)(2) of this title, as provided in subsection (b).
(and subsection (b) clarifies this in what seem like straightforward ways).
It seems to me that if this “asymmetrically binds Republicans” then the conclusion is “so much the worse for the Republicans” not “so much the worse for the Voting Rights Act”.
As for “the unfair advantage Democrats have had nationally for decades”:
Why different years (2022, 2020, 2020)? Because each of those was the first thing I found when searching for articles from at-least-somewhat-credible outlets about structural advantages for one or another party in presidential, Senate, and House races. I make no claim that those figures are representative of, say, the last 20 years, but I don’t think it’s credible to talk about “the unfair advantage Democrats have had nationally for decades” when all three of the major national institutions people in the US get to vote for have recently substantially favoured Republicans in the sense that to get equal results Democrats would need substantially more than equal numbers of votes.
The problem with gerrymandering is that it makes elections less representative. It seems to me that (section 2 of) the Voting Rights Act makes elections more representative, so that’s good. It seems reasonable to be mad at Republicans when they implement measures that benefit themselves by making elections less representative, but not to be mad at a law that makes elections more representative just because you’d prefer elections to stay less fair.
I don’t think this outcome was overdetermined; there’s no recent medical breakthrough behind this progress. It just took a herculean act of international coordination and logistics. It took distributing millions of water filters, establishing village-based surveillance systems in thousands of villages across multiple countries, and meticulously tracking every single case of Guinea worm in humans or livestock around the world. It took brokering a six-month ceasefire in Sudan (the longest humanitarian ceasefire in history!) to allow healthcare workers to access the region. I’ve only skimmed the history, and I’m generally skeptical of historical heroes getting all the credit, but I tentatively think it took Jimmy Carter for all of this to happen.
I’m compelled to caveat that top GiveWell charities are probably in the ballpark of $50/DALY, and the Carter Center has an annual budget of ~$150 million, so they “should” be able to buy roughly 3 million DALYs every single year by donating to more cost-effective charities. But c’mon, this worm is super squicky and nearly eradicating it is an amazing act of agency.
I don’t think you need that footnoted caveat, simply because there isn’t $150M/year worth of room for more funding in AMF, Malaria Consortium’s SMC program, HKI’s vitamin A supplementation program, and New Incentives’ cash incentives for routine vaccination program combined; these comprise the full list of GiveWell’s top charities.
Another point is that the benefits of eradication keep adding up long after you’ve stopped paying for the costs, because the counterfactual that people keep suffering and dying of the disease is no longer happening. That’s how smallpox eradication’s cost-effectiveness can plausibly be less than a dollar per DALY averted so far and dropping (Guesstimate model, analysis). Quoting that analysis:
3.10.) For how many years should you consider benefits?
It is not clear for how long we should continue to consider benefits, since the benefits of vaccines would potentially continue indefinitely for hundreds of years. Perhaps these benefits would eventually be offset by some other future technology, and we could try to model that. Or perhaps we should consider a discount rate into the future, though we don’t find that idea appealing.
Instead, we decided to cap at an arbitrary fixed amount of years set to 20 by default, though adjustable as a variable in our spreadsheet model (or by copying and modifying our Guesstimate models). We picked 20 because it felt like a significant enough amount of time for technology and other dynamics to shift.
It’s important to think through what cap makes the most sense, though, as it can have a large effect on the final model, as seen in this table where we explore the ramifications of smallpox eradication with different benefit thresholds:
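The table isn’t reproduced above, but to make the quoted point concrete, here’s a minimal sketch of the sensitivity it describes. The two inputs are placeholders I made up, not the figures from the linked Guesstimate model.

```python
# How the benefit-year cap drives the headline cost-effectiveness number.
# Both inputs below are placeholder assumptions, not the Guesstimate model's figures.

TOTAL_ERADICATION_COST = 300e6   # assumed total spend on eradication, in USD
DALYS_AVERTED_PER_YEAR = 3.5e6   # assumed DALYs averted each year the disease stays gone

for cap_years in (5, 10, 20, 50, 100):
    cost_per_daly = TOTAL_ERADICATION_COST / (DALYS_AVERTED_PER_YEAR * cap_years)
    print(f"{cap_years:>3}-year cap -> ${cost_per_daly:.2f} per DALY averted")
```

Since the cost is fixed and sunk, doubling the cap roughly halves the cost per DALY, which is why the choice of cap has such a large effect on the final model.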
I live in the Bay Area, but my cost of living is pretty low: roughly $30k/year. I think I live an extremely comfortable life. I try to be fairly frugal, both so I don’t end up dependent on jobs with high salaries and so that I can donate a lot of my income, but it doesn’t feel like much of a sacrifice. Often when I tell people how little I spend, they’re shocked. I think people conceive of the Bay as exorbitantly expensive, and it can be, but it doesn’t have to be.
Rent: I pay ~$850 a month for my room. It’s a small room in a fairly large group house I live in with nine friends. It’s a nice space with plenty of common areas and a big backyard. I know of a few other places like this (including in even pricier areas like Palo Alto). You just need to know where to look and to be willing to live with friends. On top of rent I pay ~$200/month (edit: I was missing one expense, it’s more like $300) for things like utilities, repairs on the house, and keeping the house tidy.
I pool the grocery bill with my housemates so we can optimize where we shop a little. We also often cook for each other (notably most of us, including myself, also get free meals on weekdays in the offices we work from, though I don’t think my cost of living was much higher when I was cooking for myself each day not that long ago). It works out to ~$200/month.
I don’t buy that much stuff. I thrift most of my clothes, but I buy myself nice items when it matters (for example comfy, somewhat-expensive socks really do make my day better when I wear them). I have a bunch of miscellaneous small expenses like my Claude subscription, toothpaste, etc, but they don’t add up to much.
I don’t have a car, a child, or a pet (but my housemate has a cat, which is almost the same thing).
I try to avoid meal delivery and Ubers, though I use them in a pinch. Public transportation costs aren’t nothing, but they’re quite manageable.
I actually have a PA who helps me with some personal accounting matters that I’m particularly bad at handling myself. He works remotely from Canada and charges $15/hour. I probably average a few hours of his time each week.
I shy away from super expensive hobbies or events, but I still partake when they seem really fulfilling. Most of the social events I’m invited to are free. I take a couple (domestic) non-work trips each year, usually to visit family.
I also have occasional surprise $500-$7,000 expenses, like buying a new laptop when mine breaks. Call that an extra $10k a year.
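If it helps to see how those line items land near $30k, here’s a rough tally of the figures above (I’m guessing ~3 hours/week for the PA and lumping the smaller categories into one line):

```python
# Rough annual tally of the expenses listed above (all figures approximate).
rent        = 850 * 12     # small room in a group house
house_costs = 300 * 12     # utilities, repairs, keeping the house tidy
groceries   = 200 * 12     # pooled with housemates
pa          = 15 * 3 * 52  # ~3 hrs/week at $15/hr (my guess at "a few hours")
surprises   = 10_000       # occasional laptop-sized expenses
misc        = 2_000        # clothes, subscriptions, transit, trips (lumped guess)

total = rent + house_costs + groceries + pa + surprises + misc
print(f"~${total:,} per year")  # lands around $30k
```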
In many ways, I’m very fortunate to be able to have this lifestyle.
I honestly feel a little bewildered by how much money people around me spend and how dependent some people seem on earning a very large salary. Many people around me also seem kind of anxious about their financial security, even though they earn a good amount of money. Because my lifestyle is pretty frugal, I feel very good about how much runway I have.
I realize that people’s time is often extremely valuable, and I absolutely believe you can turn money into more time. Sometimes people around me are aghast at how much time I waste walking to the office or sitting on the BART. But for me, I don’t think I would actually be much more productive if I spent 10x as much money on productivity, and it feels extremely freeing to know I could quit my (nonprofit) job any time and fairly easily scrape by. I recommend at least considering it, if you haven’t already.
Pay 800-ish a month in rent for one room in a shared house.
Pay a few hundred a month for a PA to help me with tasks like laundry and packaging supplements.
Walk to and from work, am happy to use ubers when I travel farther afield.
Eat almost exclusively at the office, and generally buy simple groceries that require minimal prep rather than eating out.
If I think something might make me more effective, and it costs less than ~$150, I buy it and try it out, and give it away if it doesn’t work out. (Things like “kneeling chair”, “lifting shoes”, “heart rate monitor”, “backpack that’s better for running in”, “shirt that might fit me”, “heavy duty mask and filters”, “textbooks”, “bluetooth headphones”, “extra chargers”.)
I currently save (and invest) something like 90% of my income, though my income has changed a lot in different years. When I’m working a lot less on paid projects and don’t have a salary, I make less money and only save like 20% to 40%.
However, I’m semi-infamously indifferent to fun (and to most forms of physical pleasure), and I spend almost all my time working or studying. So my situation probably doesn’t generalize to most people.
Note that most people either have or want children, which changes the calculus here: you need a larger place (often a whole house if you have many or want to live with extended family), and are more likely to benefit from paying a cleaner/domestic help (which is surprisingly expensive in the Bay and cannot be hired remotely). Furthermore, if you’re a meat-eater and want to buy ethically sourced meat or animal products, this increases the cost of food a lot.
I want to push back on the idea of needing a large[1] place if you have a family.
In the US a four person family will typically live in a 2,000-2,500 square foot place, but in Europe the same family will typically live in something like 1,000-1,400 square feet. In Asia it’s often less, and earlier in the US’s history it also was much less than what it is today.
If smaller homes work for others across time and space, I believe they are often sufficient for people in the US today.
Yeah that’s fair. But the lifestyle of ~$850 a month room in a group house isn’t that nice if you have many kids, and so it makes sense that people benefit from more money to afford a nicer place.
And like, sure, you can get by on less money than some people assume, but the original comment imo understates how much you and your family benefit from more money (e.g. the use of “bewildered”).
As the father of 2 kids (a 5 y/o and 2 y/o) in Palo Alto, I can confirm that childcare is a lot. $2k per kid per month at our subsidized academic-affiliation rate. At $48k, it’s almost the entirety of my wife’s PhD salary. Fortunately, I have a well-paying job and we are not strapped for money.
We also got along with just an e-bike for 6 years, saving something like $15k per year in car insurance and gas (save for 9 months when we had the luxury of borrowing a car from family) [Incorrect, see below]. We got a car recently due to a longer commute, but even then, I still use the e-bike almost every day because the car is not much faster and overlapping with exercise time is valuable (plus the 5 y/o told me he likes fresh air).
For clothes/toys/etc., we’ve used Facebook Marketplace, “Buy Nothing” groups, and our neighbors to source pretty much everything. The best toys have just been cardboard, masking tape, and scissors, which are very cheap.
[Edit: As comments below point out, the figure for no-car savings was incorrect. It’s closer to $8k, taking into account gas, insurance, maintenance, and repairs. Apologies for the embellishment—I think it was from a combination of factors including (i) being proud of previously not owning a car, (ii) making enough not to track it closely, and (iii) deferring to my spouse for most of our household payments/financial management (which is not great on my part—she is busy and household management is a real burden).
To shore up my credibility on child care, I pulled our receipts, and we’re currently at $2,478 per month for the toddler, and $1,400 per month for the kindergartener’s after-school program (though cheaper options were available for the after-school program).]
It can vary enormously based on risk factors, choice of car, and quantity of coverage, but that does still sound extremely high to me. I think even if you’re a 25-yo male with pretty generous coverage above minimum liability, you probably won’t be paying more than ~$300/mo unless you have recent accidents on your record. Gas costs obviously scale ~linearly with miles driven, but even if your daily commute is a 40 mile round-trip, that’s still only like $200/mo. (There are people with longer commutes than that, but not ones that you can easily substitute for with an e-bike; even 20 miles each way seems like a stretch.)
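For what it’s worth, the gas half of that checks out on the back of an envelope. The mileage and gas price below are my assumptions, not numbers from the comment:

```python
# Monthly gas cost for a 40-mile round-trip commute, under assumed fuel economy and price.
miles_per_day = 40     # round-trip commute from the comment
workdays      = 22     # commuting days per month
mpg           = 25     # assumed fuel economy
price_per_gal = 5.00   # assumed California gas price, USD

monthly_gas = miles_per_day * workdays / mpg * price_per_gal
print(f"~${monthly_gas:.0f}/month")  # ~$176, in line with the ~$200/mo figure above
```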
Thank you both for calling this out, because I was clearly incorrect. I was trying to recall my wife’s initial calculation, which I believe included maintenance, insurance, gas, and repairs.
I think this is one of those things where I was so proud of not owning a car that the amount saved morphed from $8k to $10k to $15k in the retelling. I need to stop doing that.
Also, I’m feeling some whiplash reading my reply because I totally sound like an LLM when called out for a mistake. Maybe similar neural pathways for embellishment were firing, haha.
My rent, also for a small room in a Bay Area group house, is around $1050. This is an interesting group house phenomenon: if rent is $1800 on average, the good rooms go for $2600 and the bad ones have to be $1000 to balance out total rent. The best rooms in a group house are a limited-supply good, and because people (or even couples) are often indifferent between a group house with a good social scene and a $4000 luxury 1bed, prices are roughly similar. There is lots of road noise, but I realized I could pay $1000 for extra-thick blackout curtains, smart lightbulbs, etc. to mitigate this, which has saved me thousands over the past couple of years.
As for everything else, my sense is it’s not for most people. To have expenses as low as OP’s you basically need to have only zero-cost or cost-saving hobbies like cooking and thrifting, and enjoy all aspects of them. I got into cooking at one point but didn’t like shopping and wanted to use moderately nice ingredients, so when cooking for my housemates the ingredients (from an expensive grocery store through Instacart) came out to $18/serving. A basic car is also super useful, bay area or not.
I am probably one of the people OP mentions, with a bunch of financial anxiety despite being able to save close to $100k/year, but this is largely due to a psychological block keeping me from investing most of my money.
This resonates with me. I’ve always been a fan of Mr. Money Mustache’s perspective that it doesn’t take much money at all to live a really awesome life, which I think is similar to the perspective you’re sharing.
Some thoughts:
Housing is huge. And living with friends is a huge help. But I think for a lot of people that isn’t a pragmatic option (tied to an area; friends unwilling or incompatible; need privacy), and then they get stuck paying a lot for housing.
Going car free helps a lot. Unfortunately, I think most places in North America make this somewhat difficult, and the places that don’t tend to have high housing costs.
Traveling is expensive. Flights, hotels, Ubers, food. I find myself in lots of situations where I feel socially obligated to travel, like for weddings and stuff, and so end up traveling maybe 4-6x/year, but this isn’t the hardest thing in the world to avoid. You could explain to people that you have a hard budget for two trips a year.
Spending $200/month or whatever on food means being strategic about ingredients. Which I very much think is doable, but yeah, it requires a fair amount of agency.
So… the biggest savings here, by far, is the rent. At a guess it’s bigger than everything else combined. If you don’t have enough friends, or your friends all live in full houses, guess you’re screwed here. Hope we’re OK with the tyranny of structurelessness?
There’s a cottage industry that thrives off of sneering, gawking, and maligning the AI safety community. This isn’t new, but it’s probably going to become more intense and pointed now that there are two giant super PACs that (allegedly[1]) see safety as a barrier to [innovation/profit, depending on your level of cynicism]. Brace for some nasty, uncharitable articles.
I think the largest cost of this targeted bad press will be the community’s overreaction, not the reputational effects outside the AI safety community. I’ve already seen people shy away from doing things like donating to politicians that support AI safety for fear of provoking the super PACs.
Historically, the safety community often freaked out in the face of this kind of bad press. People got really stressed out, pointed fingers about whose fault it was, and started to let the strong frames in the hit pieces get into their heads.[2] People disavowed AI safety and turned to more popular causes. And the collective consciousness decided that the actions and people who ushered in the mockery were obviously terrible and dumb, so much so that you’d get a strange look if you asked them to justify that argument. In reality I think many actions that were publicly ridiculed were still worth it ex-ante despite the bad press.
It seems bad press is often much, much more salient to the subjects of that press than it is to society at large, and it’s best to shrug it off and let it blow over. Some of the most PR-conscious people I know are weirdly calm during actual PR blowups and are sometimes more willing than the “weird” folks around me to take dramatic (but calculated) PR risks.
In the activist world, I hear this is a well-known phenomenon. You can get 10 people to protest a multi-billion-dollar company and a couple journalists to write articles, and the company will bend to your demands.[3] The rest of the world will have no idea who you are, but to the executives at the company, it will feel the world is watching them. These executives are probably making a mistake![4] Don’t be like them.
With all these (allegedly anti-safety[1]) super PACs, there will probably be a lot more bad press than usual. All else being equal, avoiding the bad press is good, but in order to fight back, people in the safety community will probably take some actions, and the super PACs will probably twist any actions into headlines about cringe doomer tech bros.
I do think people should take into account when deciding what to do that provoking the super PACs is risky, and should think carefully before doing it. But often I expect it will be the right choice and the blowback will be well worth it.
If people in the safety community refuse to stand up to them, then the super PACs will get what they want anyway and the safety community won’t even put up a fight.
Ultimately I think the AI safety community is an earnest, scrupulous group of people fighting for an extremely important cause. I hope we continue to hold ourselves to high standards for integrity and honor, and as long as we do, I will be proud to be part of this community no matter what the super PACs say.
They haven’t taken any anti-safety actions yet as far as I know (they’re still new). The picture they paint of themselves isn’t opposed to safety, and while I feel confident they will take actions I consider opposed to safety, I don’t like maligning people before they’ve actually taken actions worthy of condemnation.
I think it’s really healthy to ask yourself if you’re upholding your principles and acting ethically. But I find it a little suspicious how responsive some of these attitudes can be to bad press, where people often start tripping over themselves to distance themselves from whatever the journalist happened to dislike. If you’ve ever done this, consider asking yourself before you take any action how you’d feel if the fact that you took that action was on the front page of the papers. If you’d feel like you could hold your head up high, do it. Otherwise don’t. And then if you do end up on the front page of the papers, hold your head up high!
To a point. They won’t do things that would make them go out of business, but they might spend many millions of dollars on the practices you want them to adopt.
Tactically, that is. In many cases I’m glad the executives can be held responsible in this way and I think their changed behavior is better for the world.
I hope we continue to hold ourselves to high standards for integrity and honor, and as long as we do, I will be proud to be part of this community no matter what the super PACs say.
I don’t think the AI safety community has particularly much integrity or honor. I would like to make there be something in the space that has those attributes, but please don’t claim valor we/you don’t have!
For context, how would you rank the AI safety community w.r.t. integrity and honor, compared to the following groups:
1. AGI companies
2. Mainstream political parties (the organizations, not the voters, so e.g. the politicians and their staff)
3. Mainstream political movements, e.g. neoliberalism, wokism, china hawks, BLM
4. A typical university department
5. Elite opinion formers (e.g. the kind of people whose Substacks and op-eds are widely read and highly influential in DC, Silicon Valley, etc.)
6. A typical startup
7. A typical large bloated bureaucracy or corporation
8. A typical religion, e.g. Christianity, Islam, etc.
9. The US military
My current best guess is that you have a higher likelihood of being actively deceived/have someone actively plot to mislead you/have someone put in very substantial optimization pressure to get you to believe something false or self-serving, if you interface with the AI safety community than almost any of the above.
A lot of that is the result of agency, which is often good, but in this case a double-edged sword. Naive consequentialism and lots of intense group-beliefs make the appropriate level of paranoia when interfacing with the AI Safety community higher than with most of these places.
“Appropriate levels of paranoia when interfacing with you” is of course not the only measure of honor and integrity, though as I am hoping to write about sometime this week, it’s kind of close to the top.
On that dimension, I think the AI Safety community is below AGI companies and the US military, and above all the other ones on this list. For the AGI companies, it’s unclear to me how much of it is the same generator. Approximately 50% of the AI Safety community are employed by AI labs, and they have historically made up a non-trivial fraction of the leadership of those companies, so those datapoints are highly correlated.
My current best guess is that you have a higher likelihood of being actively deceived/have someone actively plot to mislead you/have someone put in very substantial optimization pressure to get you to believe something false or self-serving, if you interface with the AI safety community than almost any of the above.
This is a wild claim. Don’t religions sort of centrally try to get you to believe known-to-be-false claims? Don’t politicians famously lie all the time?
Are you saying that EAs are better at deceiving people than typical members of those groups?
Are you claiming that members of those groups may regularly spout false claims, but they’re actually not that invested in getting others to believe them?
Can you be more specific about the way in which you think AI Safety folk are worse?
Don’t religions sort of centrally try to get you to believe known-to-be-false claims?
I agree that institutionally they are set up to do a lot of that, but the force they bring to bear on any individual is actually quite small in my experience, compared to what I’ve seen in AI safety spaces. Definitely lots of heterogeneity here, but most of the optimization that religions do to actually keep you believing in their claims is pretty milquetoast.
Are you saying that EAs are better at deceiving people than typical members of those groups?
Definitely in expectation! I think SBF, Sam Altman, Dario, Geoff Anders plus a bunch of others are pretty big outliers on these dimensions. I think in practice there is a lot of variance between individuals, with a very high-level gloss being something like “the geeks are generally worse, unless they make it an explicit optimization target, but there are a bunch of very competent sociopaths around, in the Venkatesh Rao sense of the word, which seem a lot more competent and empowered than even the sociopaths in other communities”.
Are you claiming that members of those groups may regularly spout false claims, but they’re actually not that invested in getting others to believe them?
Yeah, that’s a good chunk of it. Like, members of those groups do not regularly sit down and make extensive plans about how to optimize other people’s beliefs in the same way as seems routine around here. Some of it is a competence side-effect. Paranoia becomes worse the more competent your adversary is. The AI Safety community is a particularly scary adversary in that respect (and one that due to relatively broad buy-in for something like naive-consequentialism can bring more of its competence to bear on the task of deceiving you).
Like, members of those groups do not regularly sit down and make extensive plans about how to optimize other people’s beliefs in the same way as seems routine around here.
I’ve been around the community for 10 years. I don’t think I’ve ever seen this?[1]
Am I just blind to this? Am I seeing it all the time, except I have lower standards for what should “count”? Am I just selected out of such conversations somehow?
I currently work for an org that is explicitly focused on communicating the AI situation to the world, and to policymakers in particular. We are definitely attempting to be strategic about that, and we put a hell of a lot of effort into doing it well (eg running many many test sessions, where we try to explain what’s up to volunteers, see what’s confusing, and adjust what we’re saying).
(Is this the kind of thing you mean?)
But, importantly, we’re clear about trying to frankly communicate our actual beliefs, including our uncertainties, and are strict about adhering to standards of local validity and precise honesty: I’m happy to talk with you about the confusing experimental results that weaken our high level claims (though admittedly, under normal time constraints, I’m not going to lead with that).
Pretty much every day, I check “If someone had made this argument against [social media], would that have made me think that it was imperative to shut it down?” about proffered anti-AI arguments.
I’ve been around the community for 10 years. I don’t think I’ve ever seen this?
Also, come on, this seems false. I am pretty sure you’ve seen Leverage employees do this, and my guess is you’ve seen transcripts of chats of this happening with quite a lot of agency at FTX with regards to various auditors and creditors.
(Some) Leverage people used to talk as if they were doing this kind of thing, though it’s not like they let me in on their “optimize other people” planning meetings. I’m not counting chat transcripts that I read of meetings that I wasn’t present for.
Ah, OK, if you meant “see” in the literal sense, then yeah, seems more plausible, but also kind of unclear what its evidential value is. Like, I think you know that it happened a bunch. I agree we don’t want to double count evidence, but I think your message implied that you thought it wasn’t happening, not that it was happening and you just hadn’t seen it.
Well, what I’ve seen personally bears on how frequently this happens.
I think FTX and Leverage are regarded to be particularly bad and outlier-y cases, along several dimensions, including deceptiveness and willingness to cause harm.
If our examples are limited to those two groups, I don’t think that alone justifies saying that it is “routine” in the EA community to “regularly sit down and make extensive plans about how to optimize other people’s beliefs”.
I think you’re making a broader claim that this is common even beyond those particularly extreme examples.
I currently work for an org that is explicitly focused on communicating the AI situation to the world, and to policymakers in particular. We are definitely attempting to be strategic about that, and we put a hell of a lot of effort into doing it well (eg running many many test sessions, where we try to explain what’s up to volunteers, see what’s confusing, and adjust what we’re saying).
Yeah, that does sound roughly like what I mean, and then I think most people just drop the second part:
But, importantly, we’re clear about trying to frankly communicate our actual beliefs, including our uncertainties, and are strict about adhering to standards of local validity and precise honesty: I’m happy to talk with you about the confusing experimental results that weaken our high level claims (though admittedly, under normal time constraints, I’m not going to lead with that).
I do not think that SBF was doing this part. He was doing the former though!
Am I just blind to this? Am I seeing it all the time, except I have lower standards for what should “count”? Am I just selected out of such conversations somehow?
My best guess is that you are doing a mixture of:
Indeed self-selecting yourself out of these environments
Having a too-narrow conception of the “AI Safety community” that forms a Motte where you conceptually exclude people who do this a lot (e.g. the labs themselves), but in a way that then makes posts like the OP we are commenting on misleading
Probably having somewhat different standards for this (indeed, a thing I’ve updated on over the years is that a lot of powerful optimization can happen here between people, where e.g. one party sets up a standard in good-faith, and then another party starts goodharting on that standard in largely good-faith, and the end-result is a lot of deception).
indeed, a thing I’ve updated on over the years is that a lot of powerful optimization can happen here between people, where e.g. one party sets up a standard in good-faith, and then another party starts goodharting on that standard in largely good-faith, and the end-result is a lot of deception
Do you have an example of this? (It sounds like you think that I might be participating in this dynamic on one side or the other.)
I think this is roughly what happened when FTX was spending a huge amount of money before it all collapsed and a lot of people started new projects under pretty dubious premises to look appealing to them. I also think this is still happening quite a lot around OpenPhil, with a lot of quite bad research being produced, and a lot of people digging themselves into holes (and also trying to enforce various norms that don’t really make sense, but where they think if they enforce it, they are more likely to get money, which does unfortunately work).
members of those groups do not regularly sit down and make extensive plans about how to optimize other people’s beliefs in the same way as seems routine around here
Is this not common in politics? I thought this was a lot of what politics was about. (Having never worked in politics.)
Is this not common in politics? I thought this was a lot of what politics was about.
I have been very surprised by how non-agentic politics is! Like, there certainly is a lot of signaling going on, but when reading stuff like Decidingtowin.org it becomes clear how little optimization actually goes into saying things that will get you voters and convince stakeholders.
I do think a lot of that is going on there, and in the ranking above I would probably put the current political right above AI safety and the current political left below AI safety. Just when I took the average it seemed to me like it would end up below, largely as a result of a severe lack of agency as documented in things like deciding-to-win.
Re corporate campaigns: I think those are really very milquetoast. Yes, you make cool ads, but the optimization pressure here seems relatively minor (barring some intense outliers, like Apple and Disney, which I do think are much more agentic here than others, and have caused pretty great harm in doing so, like Disney being responsible for copyright being far too long in the US because Disney was terribly afraid of anyone re-using their characters and so tainting Disney’s image).
“the geeks are generally worse, unless they make it an explicit optimization target, but there are a bunch of very competent sociopaths around, in the Venkatesh Rao sense of the word, which seem a lot more competent and empowered than even the sociopaths in other communities”
Are you combining Venkatesh Rao’s loser/clueless/sociopath taxonomy with David Chapman’s geek/mop/sociopath?
(ETA: I know this is not relevant to the discussion, but I confuse these sometimes.)
Very low, though trending a bit higher over time. The policy-focused playbook has to deal with a lot more trickiness here than AI-2027, and you have to deal more with policymakers and stuff, but currently y’all don’t do very much of the kind of thing I am talking about here.
I really appreciate your clear-headedness at recognizing these phenomena even in people “on the same team”, i.e. people very concerned about and interested in preventing AI X-Risk.
However, I suspect that you also underrate the amount of self-deception going on here. It’s much easier to convince others if you convince yourself first. I think people in the AI Safety community self-deceive in various ways, for example by choosing to not fully think through how their beliefs are justified (e.g. not acknowledging the extent to which they are based on deference—Tsvi writes about this in his recent post rather well).
There are of course people who explicitly, consciously, plan to deceive, thinking things like “it’s very important to convince people that AI Safety/policy X is important, and so we should use the most effective messaging techniques possible, even if they use false or misleading claims.” However, I think there’s a larger set of people who, as they realize claims A B C are useful for consequentialist reasons, internally start questioning A B C less, and become biased to believe A B C themselves.
Sure! I definitely agree that’s going on a lot as well. But I think that kind of deception is more common in the rest of the world, and the things that set this community apart from others is the ability to do something more intentional here (which then combined with plenty of self-deception can result in quite catastrophic outcomes, as FTX illustrates).
I do think it’s not good! But also, it’s an important issue and you have to interface with people who aren’t super principled all the time. I just don’t want people to think of the AI Safety community as some kind of community of saints. I think it’s pretty high variance, and you should have your guard up a good amount.
For those that rely on intelligence enhancement as a component of their AI safety strategy, it would be a good time to get your press lines straight. The association of AI safety with eugenics (whether you personally agree with that label or not) strikes me as a soft target and a simple way to keep AI safety as a marginal movement.
I think a good counter to this from the activism perspective is avoiding labels and producing objective, thoughtful, and well-reasoned content arguing your point. Anti-AI-safety content often focuses on attacking the people or the specific beliefs of the people in the AI safety/rationalist community. The epistemic effects of these attacks can be circumvented by avoiding association with that community as much as is reasonable, without being deceptive. A good example would be the YouTube channel AI in Context run by 80,000 Hours. They made an excellent AI 2027 video, coming at it from an objective perspective and effectively connecting the dots from the seemingly fantastical scenario to reality. That video is now approaching 10 million views on a completely fresh channel! See also SciShow’s recent episode on AI, which also garnered extremely positive reception.
The strong viewership on this type of content demonstrates that people are clearly receptive to the AI safety narrative if it’s done tastefully and logically. Most of the negative comments on these videos (anecdotally) come from people who believe that superintelligent AI is either impossible or extremely distant, not from people who reject the premise altogether. In my view, content like this would be affected very weakly by the type of attacks you are talking about in this post. To be blunt, to oversimplify, and to take the risk of being overconfident, I believe safety and caution narratives have the advantage over acceleration narratives by merit of being based in reality and logic! Imagine attempting to make a “counter” to the above videos trying to make the case that safety is no big deal. How would you even go about that? Would people believe you? Arguments are not won by truth alone, but it certainly helps.
The potential political impact seems more salient, but in my (extremely inexpert) opinion getting the public on your side will cause political figures to follow. The measures required to meaningfully impact AI outcomes require so much political will that extremely strong public opinion is required, and that extremely strong public opinion comes from a combination of real-world impact and evidence (“AI took my job”) along with properly communicating the potential future and dangers (like the content above). The more the public is on the side of an AI slowdown, the less impact a super PAC can have on politicians’ decisions regarding the topic. (Compare a world where 2 percent of voters say they support a pause on AI development to a world where 70 percent say they support it. In world 1 a politician would be easily swayed to avoid the issue by the threat of adversarial spending, but in world 2 the political risk of avoiding the issue is far stronger than the risk of invoking the wrath of the super PAC.) This is not meant to diminish the very real harm that organized opposition can cause politically, or to downplay the importance of countering that political maneuvering in turn. Political work is extremely important, and especially so if well-funded groups are working to push the exact opposite narrative to what is needed.
I don’t mean to diminish the potential harm this kind of political maneuvering can have, but in my view the future is bright from the safety activism perspective. I’ll also add that I don’t believe my view of “avoid labels” and your point about “standing proud and putting up a fight” are opposed. Both can happen in parallel, two fights at once. I strongly agree that backing down from your views or actions as a result of bad press is a mistake, and I don’t advocate for that here.
There’s a cottage industry that thrives off of sneering, gawking, and maligning the AI safety community. This isn’t new, but it’s probably going to become more intense and pointed now that there are two giant super PACs that (allegedly) see safety as a barrier to [innovation/profit, depending on your level of cynicism]. Brace for some nasty, uncharitable articles.
One such article came out yesterday; I think it’s a fairly representative example of the genre.
The advice and techniques from the rationality community seem to work well at avoiding a specific type of high-level mistake: they help you notice weird ideas that might otherwise get dismissed and take them seriously. Things like AI being on a trajectory to automate all intellectual labor and perhaps take over the world, animal suffering, longevity, cryonics. The list goes on.
This is a very valuable skill and causes people to do things like pivot their careers to areas that are ten times better. But once you’ve had your ~3-5 revelations, I think the value of these techniques can diminish a lot.[1]
Yet a lot of the rationality community’s techniques and culture seem oriented around this one idea, even on small scales: people pride themselves on being relentlessly truth-seeking and willing to consider possibilities they flinch away from.
On the margin, I think the rationality community should put more emphasis on skills like:
I think very few people in the community could put together an analysis like this one from Eric Neyman on the value of a particular donation opportunity (see the section “Comparison to non-AI safety opportunities”). I’m picking this example not because it’s the best analysis of its kind, but because it’s the sort of analysis I think people should be doing all the time and should be practiced at, and I think it’s very reasonable to produce things of this quality fairly regularly.
When people do practice this kind of analysis, I notice they focus on Fermi estimates where they get good at making extremely simple models and memorizing various numbers. (My friend’s Anki deck includes things like the density of typical continental crust, the dimensions of a city block next to his office, the glide ratio of a hang glider, the amount of time since the last glacial maximum, and the fraction of babies in the US that are twins).
I think being able to produce specific models over the course of a few hours (where you can look up the glide ratio of a hang glider if you need it) is more neglected but very useful (when it really counts, you can toss the back of the napkin and use a whiteboard).
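As a toy example of the kind of quick model I mean (the glide ratio and launch height below are my own rough assumptions, not anyone’s actual Anki card):

```python
# Toy Fermi model: how far could a hang glider travel from a mountain launch in still air?
glide_ratio   = 12     # assumed ~12:1 (typical hang gliders are very roughly 10-15:1)
launch_height = 1_000  # assumed meters above the landing zone

range_km = glide_ratio * launch_height / 1_000
print(f"~{range_km:.0f} km of still-air range")  # ~12 km, ignoring wind and thermals
```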
Simply noticing something might be a big deal is only the first step! You need to decide if it’s worth taking action (how big a deal is it exactly?) and what action to take (what are the costs and benefits of each option?). Sometimes it’s obvious, but often it isn’t, and these analyses are the best way I know of to improve at this, other than “have good judgement magically” or “gain life experience”.
Articulating all the assumptions underlying an argument
A lot of the reasoning I see on LessWrong feels “hand-wavy”: it makes many assumptions that it doesn’t spell out. That kind of reasoning can be valuable: often good arguments start as hazy intuitions. Plus many good ideas are never written up at all and I don’t want to make the standards impenetrably high. But I wish people recognized this shortcoming and tried to remedy it more often.
By “articulating assumptions” I mean outlining the core dynamics at play that seem important, the ways you think these dynamics work, and the many other complexities you’re ignoring in your simple model. I don’t mean trying to compress a bunch of Bayesian beliefs into propositional logic.
Contact with reality
It’s really really powerful to look at things directly (read data, talk to users, etc), design and run experiments, and do things in the world to gain experience.
Everyone already knows this, empiricism is literally a virtue of rationality. But I don’t see people employing it as much as they should be. If you’re worried about AI risk, talk to the models! Read raw transcripts!
Scholarship
Another virtue of rationality. It’s in the Sequences, just not as present in the culture as you might expect. Almost nobody I know reads enough. I started a journal club at my company, and after nearly every meeting folks tell me how useful it is. I often see work that would be much better if the authors engaged with the literature a little more. Of course YMMV depending on the field you’re in; some literature isn’t worth engaging with.
Being overall skilled and knowledgeable and able to execute on things in the real world
Maybe this doesn’t count as a rationality skill per-se, but I think the meta skill of sitting down and learning stuff and getting good at it is important. In practice the average person reading this short form would probably be more effective if they spent their energy developing whatever specific concrete skills and knowledge were most blocking them.
This list is far from complete.[2] I just wanted to gesture at the general dynamic.
They’re still useful. I could rattle off a half-dozen times this mindset let me notice something the people around me were missing and spring into action.
I especially think there’s some skill that separates people with great research taste from people with poor research taste that might be crucial, but I don’t really know what it is well enough to capture it here.
I think very few people in the community could put together an analysis like this one from Eric Neyman on the value of a particular donation opportunity (see the section “Comparison to non-AI safety opportunities”).
Huh, FWIW, I thought this analysis was a classic example of streetlighting. It succeeded at quantifying some things related to the donation opportunity at hand, but it failed to cover the ones I considered most important. This seems like the standard failure mode of this kind of estimate, and I was quite sad to see it here.
Like, the most important thing to estimate when evaluating a political candidate is their trustworthiness and integrity! It’s the thing that would flip the sign on whether supporting someone is good or bad for the world. The model is silent on this point, and weirdly, when I talked to many others about it, it indeed seemed to serve as a semantic stopsign against asking the much more important questions about the candidate.
Like, I am strongly in favor of making quick quantitative models, but I felt like this one missed the target. I mean, it’s fine, I don’t think it was a bad thing, but various aspects of how it was presented made me think that Eric and others believe this might come close to capturing the most important considerations, as opposed to a thing that puts some numbers on some second-order considerations that maybe become relevant once the more important questions are answered.
ETA: I think this comment is missing some important things and I endorse Habryka’s reply more than I endorse this comment
Like, the most important thing to estimate when evaluating a political candidate is their trustworthiness and integrity! It’s the thing that would flip the sign on whether supporting someone is good or bad for the world.
I agree that this is an important thing that deserved more consideration in Eric’s analysis (I wrote a note about it on Oct 22 but then I forgot to include it in my post yesterday). But I don’t think it’s too hard to put into a model (although it’s hard to find the right numbers to use). The model I wrote down in my note is
30% chance Bores would oppose an AI pause / strong AI regulations (b/c it’s too “anti-innovation” or something)
40% chance Bores would support strong regulations
30% chance he would vote for strong regulations but not advocate for them
90% chance Bores would support weak/moderate AI regulations
My guess is that 2⁄3 of the EV comes from strong regulations and 1⁄3 from weak regulations (which I just came up with a justification for earlier today but it’s too complicated to fit in this comment), so these considerations reduce the EV to 37% (i.e., roughly divide EV by 3).
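For what it’s worth, here is one reconstruction that lands at roughly 37%. The actual justification isn’t spelled out in the comment, so treat the weighting below (opposition to strong regulations cancelling an equal amount of support, the vote-but-don’t-advocate case ignored) as my guess:

```python
# One possible way the numbers above combine to ~37% (my reconstruction, not the
# commenter's actual model): weight strong vs. weak regulation EV 2:1, and let
# opposition to strong regulations cancel out an equal amount of support.
p_oppose_strong  = 0.30
p_support_strong = 0.40
p_vote_only      = 0.30  # votes for strong regulations but doesn't advocate (ignored here)
p_support_weak   = 0.90

w_strong, w_weak = 2 / 3, 1 / 3
ev_multiplier = w_strong * (p_support_strong - p_oppose_strong) + w_weak * p_support_weak
print(f"{ev_multiplier:.0%}")  # 37%
```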
FWIW I wouldn’t say “trustworthiness” is the most important thing, more like “can be trusted to take AI risk seriously”, and my model is more about the latter. (A trustworthy politician who is honest about the fact that they don’t care about AI safety will not be getting any donations from me.)
FWIW I wouldn’t say “trustworthiness” is the most important thing, more like “can be trusted to take AI risk seriously”, and my model is more about the latter.
No. Bad. Really not what I support. Strong disagree. Bad naive consequentialism.
Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, in my opinion this serves as a multiplier of their negative impact on the world. I do not want to create scheming and untrustworthy stakeholders that start doing sketchy stuff around AI risk. That’s how really a lot of bad stuff in the past has already happened.
I think political donations to trustworthy and reasonable politicians who are open to AI X-risk but don’t have an opinion on it are much better for the world (indeed, infinitely better due to the inverted sign) than donations to untrustworthy ones who do seem interested.
That said, I agree that you could put this in the model! I am not against quantitatively estimating integrity and trustworthiness, and think the model would be a bunch better for considering it.
Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, in my opinion this serves as a multiplier of their negative impact on the world. I do not want to create scheming and untrustworthy stakeholders that start doing sketchy stuff around AI risk. That’s how really a lot of bad stuff in the past has already happened.
No-true-Scotsman-ish counterargument: no-one who actually gets AI risk would engage in this kind of tomfoolery. This is the behavior of someone who almost got it, but then missed the last turn and stumbled into the den of the legendary Black Beast of Aaargh. In the abstract, I think “we should be willing to consider supporting literal Voldemort if we’re sure he has the correct model of AI X-risk” goes through.
The problem is that it just totally doesn’t work in practice, not even on pure consequentialist grounds:
You can never tell whether Voldemorts actually understand and believe your cause, or whether they’re just really good at picking the right things to say to get you to support them. No, not even if you’ve considered the possibility that they’re lying and you still feel sure they’re not. Your object-level evaluations just can’t be trusted. (At least, if they’re competent at their thing. And if they’re not just evil, but also bad at it, so bad you can tell when they’re being honest, why would you support them?)
Voldemorts and their plans are often more incompetent than they seem,[1] and when their evil-but-”effective” plan predictably blows up, you and your cause are going to suffer reputational damage and end up in a worse position than your starting one. (You’re not gonna find an Altman, you’ll find an SBF.)
Voldemorts are naturally predisposed to misunderstanding the AI risk in precisely the ways that later make them engage in sketchy stuff around it. They’re very tempted to view ASI as a giant pile of power they can grab. (They hallucinate the Ring when they look into the Black Beast’s den, if I’m to mix my analogies.)
In general, if you’re considering giving power to a really effective but untrustworthy person because they seem credibly aligned with your cause, despite their general untrustworthiness (they also don’t want to die to ASI!), you are almost certainly just getting exploited. These sorts of people should be avoided like wildfire. (Even in cases where you think you can keep them in check, you’re going to have to spend so much effort paranoidally looking over everything they do in search of gotchas that it almost certainly wouldn’t be worth it.)
Probably because of that thing where if a good person dramatically abandons their morals for the greater good, they feel that it’s a monumental enough sacrifice for the universe to take notice and make it worth it.
A lot of Paranoia: A Beginner’s Guide is actually trying to set up a bunch of the prerequisites for making this kind of argument more strongly. In particular, a feature of people who act in untrustworthy ways, and surround themselves with unprincipled people, is that they end up sacrificing most of their sanity on the altar of paranoia.
Like, the fictional HPMoR Voldemort happened to not have any adversaries who could disrupt his OODA loop, but that was purely a fiction. A world with two Voldemort-level competent players results in two people nuking their sanity as they try to get one over on each other, and at that point you can’t really rely on them having good takes, or sane stances on much of anything (or, if they are genuinely smart enough, on them making an actually binding alliance, which via things like unbreakable vows is surprisingly doable in the HPMoR universe, but which in reality runs into many more issues).
Tone note: I really don’t like people responding to other people’s claims with content like “No. Bad… Bad naive consequentialism” (I’m totally fine with “Really not what I support. Strong disagree.”). It reads quite strongly to me as trying to scold someone or socially punish them using social status for a claim that you disagree with; it feels continuous with some kind of frame that’s like “habryka is the arbiter of the Good”.
It sounds like scolding someone because it is! Like, IDK, sometimes that’s the thing you want to do?
I mean, I am not the “arbiter of the good”, but like, many things are distasteful and should be reacted to as such. I react similarly to people posting LLM slop on LW (usually more in the form of “wtf, come on man, please at least write a response yourself, don’t copy paste from an LLM”) and many other things I see as norm violations.
I definitely consider the thing I interpreted Michael to be saying a norm violation of LessWrong, and endorse lending my weight to norm enforcement of that (he then clarified in a way that I think largely defused the situation, but I think I was pretty justified in my initial reaction). Not all spaces I participate in are places where I feel fine participating in norm enforcement, but of course LessWrong is one such place!
Now, I think there are fine arguments to be made that norm enforcement should also happen at the explicit intellectual level and shouldn’t involve more expressive forms of speech. IDK, I am a bit sympathetic to that, but I feel reasonably good about my choices here, especially given that Michael’s comment started with “I agree”, therefore implying that the things he was saying were somehow reflective of my personal opinion. It seems eminently natural that when you approach someone and say “hey, I totally agree with you that <X>”, where X is something they vehemently disagree with (like, IDK, imagine someone coming to you and saying “hey, I totally agree with you that child pornography should be legal” when you absolutely do not believe this), they respond the kind of way I did.
Overall, feedback is still appreciated, but I think I would still write roughly the same comment in a similar situation!
Michael’s comment started with “I agree”, therefore implying that the things he was saying were somehow reflective of my personal opinion
Michael’s comment started with a specific point he agreed with you on.
I agree that this is an important thing that deserved more consideration in Eric’s analysis
He specifically phrased the part you were objecting to as his opinion, not as a shared point of view.
FWIW I wouldn’t say “trustworthiness” is the most important thing, more like “can be trusted to take AI risk seriously”, and my model is more about the latter.
I am pretty sure Michael thought he was largely agreeing with me. He wasn’t saying “I agree this thing is important, but here is this totally other thing that I actually think is more important”. He said (and meant to say) “I agree this thing is important, and here is a slightly different spin on it”. Feel free to ask him!
I claim you misread his original comment, as stated. Then you scolded him based on that misreading. I made the case you misread him via quotes, which you ignored, instead inviting me to ask him about his intentions. That’s your responsibility, not mine! I’d invite you to check in with him about his meaning yourself, and to consider doing that in the future before you scold.
I mean, I think his intention in communicating is the ground truth! I was suggesting his intentions as a way to operationalize the disagreement. Like, I am trying to check that you agree that if that was his intention, and I read it correctly, then you were wrong to say that I misread him. If that isn’t the case then we have a disagreement about the nature of communication on our hands, which, I mean, we can go into, but doesn’t sound super exciting.
I do happen to be chatting with Michael sometime in the next few days, so I can ask. Happy to bet about what he says about what he intended to communicate! Like, I am not overwhelmingly confident, but you seem to present overwhelming confidence, so presumably you would be up for offering me a bet at good odds.
I would generally agree, but a mitigating factor here is that MichaelDickens is presenting himself as agreeing with habryka. It seems more reasonable for habryka to strongly push back against statements that make claims about his own beliefs.
Yeah I pretty much agree with what you’re saying. But I think I misunderstood your comment before mine, and the thing you’re talking about was not captured by the model I wrote in my last comment; so I have some more thinking to do.
I didn’t mean “can be trusted to take AI risk seriously” as “indeterminate trustworthiness but cares about x-risk”, more like “the conjunction of trustworthy + cares about x-risk”.
Fair enough. This doesn’t seem central to my point so I don’t really want to go down a rabbit hole here. As I said originally, “I’m picking this example not because it’s the best analysis of its kind, but because it’s the sort of analysis I think people should be doing all the time and should be practiced at, and I think it’s very reasonable to produce things of this quality fairly regularly.” I know this particular analysis surfaced some useful considerations others hadn’t thought of, and I learned things from reading it.
I also suspect you dislike the original analysis for reasons that stem from deep-seated worldview disagreements with Eric, not because the methodology is flawed.
I also suspect you dislike the original analysis for reasons that stem from deep-seated worldview disagreements with Eric, not because the methodology is flawed.
I think the methodology of elevating cost-effectiveness estimates that thereby (usually, at least at a community level) produce lots of naive consequentialist choices is a large chunk of the deep-seated worldview disagreement!
I actually think I probably have it less with Eric than with other people, but I think the disagreement here is at least not uncorrelated with the worldview divergence.
I know this particular analysis surfaced some useful considerations others hadn’t thought of, and I learned things from reading it.
Agree! I am glad to have read it and wish more people produced things like it. It’s also not particularly high on my list of things to strongly incentivize, but it’s nice because it scales well, and lots of people doing more things like this seems like it just makes things a bit better.
My only sadness about it comes from the context in which it was produced. It seems eminently possible to me to have a culture of producing these kinds of estimates without failing to engage with the most important questions (or like, to include them in your estimates somehow), but I think it requires at least a bit of intentionality, and in the absence of that does seem like a bit of a trap.
Is there reason to think that Bores or Wiener are not trustworthy or lack integrity? Genuine question, asking because it could affect my donation choices. (I couldn’t tell from your post if there were, e.g., rumors floating around about them, or if you were just using this as an example of a key question that you thought was missed in Neyman’s analysis.)
I mean, I think there are substantial priors that trustworthiness and integrity differ quite a lot between different politicians.
That said, I overall had reasonably positive impressions after talking to Bores in-person. I… did feel a bit worried he was a bit too naive consequentialist, but various other things he said made me overall think he is a good person to donate to. But I am glad I talked to him since I was pretty uncertain before I did.
For “Performing simple cost-effectiveness estimates accurately”, I would like to be better at this but I feel like I’m weak on some intermediate skills. I’d appreciate a post laying out more of the pieces.
(A thing I find hard is somewhat related to the thing habryka is saying, where the real crux is often a murky thing that’s particularly hard to operationalize. Although in the case of the Eric Neyman thing, I think I separately asked those questions, and found Eric’s BOTEC useful for the thing it was trying to do)
When I was first trying to learn ML for AI safety research, people told me to learn linear algebra. And today lots of people I talk to who are trying to learn ML[1] seem under the impression they need to master linear algebra before they start fiddling with transformers. I find in practice I almost never use 90% of the linear algebra I’ve learned. I use other kinds of math much more, and overall being good at empiricism and implementation seems more valuable than knowing most math beyond the level of AP calculus.
The one part of linear algebra you do absolutely need is a really, really good intuition for what a dot product is, the fact that you can do them in batches, and the fact that matrix multiplication is associative. Someone smart who can’t so much as multiply matrices can learn the basics in an hour or two with a good tutor (I’ve taken people through it in that amount of time). The introductory linear algebra courses I’ve seen[2] wouldn’t drill this intuition nearly as well as the tutor even if you took them.
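To gesture at how small that chunk is, here’s a minimal NumPy sketch of exactly those three things (the shapes and numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A dot product: multiply elementwise, sum up. One similarity score.
q = rng.standard_normal(64)
k = rng.standard_normal(64)
print(q @ k)

# Doing dot products in batches: every query against every key in one matmul.
Q = rng.standard_normal((10, 64))   # 10 queries
K = rng.standard_normal((20, 64))   # 20 keys
scores = Q @ K.T                    # shape [10, 20]
print(scores.shape)

# Matmul is associative: (AB)C == A(BC), so you can regroup and fuse matmuls freely.
A = rng.standard_normal((10, 64))
B = rng.standard_normal((64, 32))
C = rng.standard_normal((32, 8))
print(np.allclose((A @ B) @ C, A @ (B @ C)))
```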
In my experience it’s not that useful to have good intuitions for things like eigenvectors/eigenvalues or determinants (unless you’re doing something like SLT). Understanding bases and change-of-basis is somewhat useful for improving your intuitions, and especially useful for some kinds of interp, I guess? Matrix decompositions are useful if you want to improve cuBLAS. Sparsity sometimes comes up, especially in interp (it’s also a very very simple concept).
The same goes for much of vector calculus. (You need to know you can take your derivatives in batches and that this means you write your d/dx as ∂/∂x or an upside-down triangle, ∇. You don’t need curl or divergence.)
I find it’s pretty easy to pick things like this up on the fly if you ever happen to need them.
Inasmuch as I do use math, I find I most often use basic statistics (so I can understand my empirical results!), basic probability theory (variance, expectations, estimators), having good intuitions for high-dimensional probability (which is the only part of math that seems underrated for ML), basic calculus (the chain rule), basic information theory (“what is KL-divergence?”), arithmetic, a bunch of random tidbits like “the log derivative trick”, and the ability to look at equations with lots of symbols and digest them.
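For concreteness, here’s a tiny sketch of two of those, the KL divergence between discrete distributions and the log derivative trick, with made-up numbers:

```python
import numpy as np

# KL(p || q) for two discrete distributions: the expected log-likelihood ratio under p.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(np.sum(p * np.log(p / q)))

# Log derivative trick: grad_theta E_{x~p_theta}[f(x)] = E[f(x) * grad_theta log p_theta(x)].
# Here x ~ Bernoulli(theta) and f(x) = x, so the true gradient is 1.
rng = np.random.default_rng(0)
theta = 0.3
x = rng.binomial(1, theta, size=100_000)
grad_log_p = np.where(x == 1, 1 / theta, -1 / (1 - theta))
print(np.mean(x * grad_log_p))  # ~1.0
```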
In general most work and innovation[3] in machine learning these days (and in many domains of AI safety[4]) is not based in formal mathematical theory, it’s based on empiricism, fussing with lots of GPUs, and stacking small optimizations. As such, being good at math doesn’t seem that useful for doing most ML research. There are notable exceptions: some people do theory-based research. But outside these niches, being good at implementation and empiricism seems much more important; inasmuch as math gives you better intuitions in ML, I think reading more empirical papers or running more experiments or just talking to different models will give you far better intuitions per hour.
It’s pretty plausible to me that I’ve only been exposed to particularly mediocre math courses. My sample-size is small, and it seems like course quality and content varies a lot.
The standard counterargument here is that these parts of AI safety are ignoring what’s actually hard about ML and that empiricism won’t work. For example, we need to develop techniques that work on the first model we build that can self-improve. I don’t want to get into that debate.
Here’s some stuff that isn’t in your list that I think comes up often enough that aspiring ML researchers should eventually know it (and most of this is indeed universally known). Everything in this comment is something that I’ve used multiple times in the last month.
Linear algebra tidbits
Vector-matrix-vector products
Probably einsums more generally
And the derivative of an einsum wrt any input
Matrix multiplication of matrices of shape [A,B] and [B,C] takes 2ABC flops.
This stuff comes up when doing basic math about the FLOPs of a neural net architecture.
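A minimal sketch of that kind of arithmetic (plus a vector-matrix-vector product written as an einsum), with made-up layer sizes:

```python
import numpy as np

# Multiplying an [A, B] matrix by a [B, C] matrix takes 2*A*B*C FLOPs
# (one multiply and one add per inner-dimension element).
def matmul_flops(a: int, b: int, c: int) -> int:
    return 2 * a * b * c

batch, d_model, d_ff = 1024, 4096, 16384   # hypothetical sizes
mlp_flops = matmul_flops(batch, d_model, d_ff) + matmul_flops(batch, d_ff, d_model)
print(f"toy MLP block: {mlp_flops:.3e} FLOPs")  # ~2.7e11

# A vector-matrix-vector product as an einsum.
rng = np.random.default_rng(0)
v = rng.standard_normal(8)
M = rng.standard_normal((8, 8))
print(np.einsum("i,ij,j->", v, M, v))   # same as v @ M @ v
```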
Stuff that I use as concrete simple examples when thinking about ML
A deep understanding of linear regression, covariance, correlation. (This is useful because it is a simple analogy for fitting a probabilistic model, and it lets you remember a bunch of important facts.)
Basic facts about (multivariate) Gaussians; Bayesian updates on Gaussians
Variance reduction, importance sampling. Lots of ML algorithms, e.g. value baselining, are basically just variance reduction tricks. Maybe consider the difference between paired and unpaired t-tests as a simple example.
This is relevant for understanding ML algorithms, for doing basic statistics to understand empirical results, and for designing sample-efficient experiments and algorithms.
Errors go as 1/sqrt(n) so sample sizes need to grow 4x if you want your error bars to shrink 2x
AUROC is the probability that a sample from distribution A will be greater than a sample from distribution B; this is the obvious natural way of comparing distributions over a totally ordered set (see the sketch after this list)
Maximum likelihood estimation, MAP estimation, full Bayes
The Boltzmann distribution (aka softmax)
And some stuff I’m personally very glad to know:
The Price equation/the breeder’s equation—we’re constantly thinking about how neural net properties change as you train them, it is IMO helpful to have the quantitative form of natural selection in your head as an example
SGD is not parameterization invariant; natural gradients
(barely counts) Conversions between different units of time (e.g. “there are 30M seconds in a year, there are 3k seconds in an hour, there are 1e5 seconds in a day”)
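A minimal sketch of a few of the tidbits above (AUROC as a probability, softmax as the Boltzmann distribution, and the 1/sqrt(n) error scaling), with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# AUROC is literally P(sample from A > sample from B): estimate it by comparing samples.
a = rng.normal(1.0, 1.0, size=5_000)
b = rng.normal(0.0, 1.0, size=5_000)
print(np.mean(a[:, None] > b[None, :]))   # ~0.76 for unit Gaussians one sigma apart

# Softmax is the Boltzmann distribution over logits (at temperature 1).
logits = np.array([2.0, 1.0, 0.1])
print(np.exp(logits) / np.exp(logits).sum())

# Errors go as 1/sqrt(n): 4x the samples roughly halves the standard error of the mean.
for n in (1_000, 4_000):
    xs = rng.normal(size=n)
    print(n, xs.std(ddof=1) / np.sqrt(n))
```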
In general most work and innovation[3] in machine learning these days (and in many domains of AI safety[4]) is not based in formal mathematical theory, it’s based on empiricism, fussing with lots of GPUs, and stacking small optimizations. As such, being good at math doesn’t seem that useful for doing most ML research.
I think I somewhat disagree here: I think that often even good empirics-focused researchers have background informal and not-so-respectable models informed by mathematical intuition. (Source is probably some Dwarkesh Patel interview, but I’m not sure which.)
This feels intuitively true to me, but I’m also very biased—I’ve basically shovelled all of my skill points into engineering and research intuition, and have only a passable understanding of math, and this generally has not been a huge bottleneck for me. But maybe if I knew more math I’d know what I’m missing out on.
I think this is largely right point by point, except that I’d flag that if you are rarely using eigendecomposition (mostly at the whiteboard, less so in code), you are possibly bottlenecked by a poor grasp of eigenvectors and eigenvalues.
Also, a fancy linear algebra education will tell you exactly how the matrix log and matrix exponential work, but all you need is that 99% of the time any manipulation you can do with regular logs and exponents will work completely unmodified with square matrices and matrix logs and exponentials. If you don’t know about matrix logs at all, though, this will be a glaring hole: I use these constantly in actual code. (Actually, “99%” is definitely sampling bias. For example, given matrices A and B, log(AB) only equals log(A) + log(B) if A and B share an eigenbasis (e.g., because they commute), and getting the two sides to be numerically equal may require being careful about which branch of the log to pick. You might object “well, of course”, but in practice you’d only think to try it when they do share an eigenbasis and you’re doing an operation later that kills branch differences, so when you try it, it works.)
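A minimal SciPy sketch of this, using matrices near the identity so the principal log branch is well-behaved:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # near the identity, so logm is tame
B = np.eye(3) + 0.1 * rng.standard_normal((3, 3))

print(np.allclose(expm(logm(A)), A))                 # exp and log invert each other
print(np.allclose(logm(A @ A), 2 * logm(A)))         # A commutes with itself, so this works
print(np.allclose(logm(A @ B), logm(A) + logm(B)))   # generally False: A and B don't commute
```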
LessWrong feature request: make it easy for authors to opt out of having their posts in the training data.
If most smart people were put in the position of a misaligned AI and tried to take over the world, I think they’d be caught and fail.[1] If I were a misaligned AI, I think I’d have a much better shot at succeeding, largely because I’ve read lots of text about how people evaluate and monitor models, strategies schemers can use to undermine evals and take malicious actions without being detected, and creative paths to taking over the world as an AI.
A lot of that information is from LessWrong.[2] It’s unfortunate that this information will probably wind up in the pre-training corpus of new models (though it’s often still worth it overall to share most of this information[3]).
LessWrong could easily change this for specific posts! They could add something to their robots.txt to ask crawlers looking to scrape training data to ignore the pages. They could add canary strings to the page invisibly. (They could even go a step further and add something like copyrighted song lyrics to the page invisibly.) If they really wanted, they could put the content of a post behind a captcha for users who aren’t logged in. This system wouldn’t be perfect (edit: please don’t rely on these methods. They’re harm reduction for information you would otherwise have posted without any protections), but I think even reducing the odds or the quantity of this data in the pre-training corpus could help.
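To illustrate, the robots.txt part could look something like the sketch below. The user-agent strings are the published names of a few training-data crawlers, the post path is a placeholder, and compliance is entirely voluntary:

```
# Hypothetical additions to lesswrong.com/robots.txt for an opted-out post.
# Honoring these directives is up to the crawler.
User-agent: GPTBot
Disallow: /posts/EXAMPLE-POST-ID/

User-agent: CCBot
Disallow: /posts/EXAMPLE-POST-ID/

User-agent: Google-Extended
Disallow: /posts/EXAMPLE-POST-ID/
```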
I would love to have this as a feature at the bottom of drafts. I imagine a box I could tick in the editor that would enable this feature (and maybe let me decide if I want the captcha part or not). Ideally the LessWrong team could prompt an LLM to read users’ posts before they hit publish. If it seems like the post might be something the user wouldn’t want models trained on, the site could proactively ask the user if they want to have their post removed from the training corpus.
As far as I know, no other social media platform has an easy way to try to avoid having its data end up in the training corpus (and many actively sell it for this purpose). So LessWrong would be providing a valuable service.
The actual decisions around what should or shouldn’t be part of the pre-training corpus seem nuanced: if we want to use LLMs to help with AI safety, it might help if those LLMs have some information about AI safety in their pre-training corpus (though adding that information back in during post-training might work almost as well). But I want to at least give users the option to opt out of the current default.
That’s not to say all misaligned AIs would fail; I think there will be a period where AIs are roughly as smart as me and thus could at least bide their time and hide their misalignment without being caught if they’d read LessWrong, but might fail to do so and get caught if they hadn’t. But you can imagine we’re purchasing dignity points or micro-dooms depending on your worldview. In either case I think this intervention is relatively cheap and worthwhile.
Of course much of it is reproduced outside LessWrong as well. But I think (1) so much of it is still on LessWrong and nowhere else that it’s worth it, and (2) the more times this information is repeated in the pre-training data, the more likely the model is to memorize it or have the information be salient to it.
And the information for which the costs of sharing it aren’t worth it probably still shouldn’t be posted even if the proposal I outline here is implemented, since there’s still a good chance it might leak out.
I worry that canary strings and robots.txt are ~basically ignored by labs and that this could cause people to share things that on the margin they wouldn’t if there were no such option[1]. More reliable methods exist, but they come with a lot of overhead and I expect most users wouldn’t want to deal with it.
Especially since, as the post says, canaries often don’t even serve the purpose of detection, with publicly accessible models claiming ignorance of them.
Probably I should have included a footnote about this. I’m well aware that this is not a foolproof mechanism, but it still seems better than nothing and I think it’s very easy to have a disclaimer that makes this clear. As I said in the post, I think that people should only do this for information they would have posted on LessWrong anyway.
I disagree that these things are basically ignored by labs. My guess is many labs put some effort into filtering out data with the canary string, but that this is slightly harder than you might think and so they end up messing it up sometimes. (They might also sometimes ignore it on purpose, I’m not sure.)
Even if labs ignore the canary string now having the canary string in there would make it much easier to filter these things out if labs ever wanted to do that in the future.
I also suggest using better methods like captchas for non-logged-in users. I expect something like this to work somewhat well (though it still wouldn’t be foolproof).
I disagree that these things are basically ignored by labs. My guess is many labs put some effort into filtering out data with the canary string, but that this is slightly harder than you might think and so they end up messing it up sometimes. (They might also sometimes ignore it on purpose, I’m not sure.)
Our infrastructure has been under attack since August 2024. Large Language Model (LLM) web crawlers have been a significant source of the attacks, and as for the rest, we don’t expect to ever know what kind of entity is targeting our sites or why.
This makes the big deployments that I know about include:
The Linux Kernel Mailing List archives
FreeBSD’s SVN (and soon git)
SourceHut
FFmpeg
Wine
UNESCO
The Science Olympiad Student Center
Enlightenment (the desktop environment)
GNOME’s GitLab
The first notable one is the GNOME GitLab. From what I’ve learned talking with the CIS admin team, it was a hail mary: nothing else worked, so what did they have to lose? The GitLab pods instantly scaled down from six to three.
I think having copyrighted content in between might work, but it depends on the labs and how they’re processing it. Still, the claim that it’s really difficult to prevent AI scraping seems largely accurate.
I think the canary string is slightly more likely to work than the robots.txt. And the things you link don’t say which labs. I totally believe some labs scrape aggressively and ignore all robots.txt, but it wouldn’t surprise me if others don’t! In my opinion, if even one frontier lab doesn’t train on this data, that’s still quite valuable. And it seems so cheap to just try this as a bit of a Hail Mary.
I think the success of Anubis (as shown above) does tell us that a lot of them are not scraping in good faith. The fact that performing cryptographic computation is needed to stop them is ridiculous. I am not sure whether I have a list, but there’s even a case of:
Report: huawei bound scrapers are bypassing Anubis #978
Where they started performing the computations to scrape the data.
After further investigation and communication. This is not a bug. The threat actor group in question installed headless chrome and simply computed the proof of work. I’m just going to submit a default rule that blocks huawei.
We received complaints from customers who had both disallowed Perplexity crawling activity in their robots.txt files and also created WAF rules to specifically block both of Perplexity’s declared crawlers: PerplexityBot and Perplexity-User. These customers told us that Perplexity was still able to access their content even when they saw its bots successfully blocked. We confirmed that Perplexity’s crawlers were in fact being blocked on the specific pages in question, and then performed several targeted tests to confirm what exact behavior we could observe.
There have been multiple reports that OpenAI also ignores robots.txt, but it’s much less aggressive.
Even if various pages are not used in pretraining, they can still be read by the models in deployment, right? When Claude or GPT-5 go out and find up-to-date information on the internet, they’re not checking the robots.txt files?
Absolutely! But having models that don’t know how to effectively scheme without access to external resources is still very useful. As just one example, we could interrogate the models and give them honeypots without giving them access to the internet.
Labs could also ban some of their models from reading certain webpages or ingesting certain information (as in filter at inference time all inputs to the model) in deployment (especially for models performing especially high-stakes tasks where sabotage would be easier).
I also suspect digesting a LessWrong post the model reads in deployment is more likely to involve explicit reasoning about the post in the CoT (or probes etc.) and thus to be caught. By contrast, if a model has seen lots of transcripts of misalignment in pre-training, the misaligned actions might be intuitive and second-nature, and thus harder to detect.
TurboTax and H&R Block famously lobby the US government to make taxes more annoying to file to drum up demand for their products.[1] But as far as I can tell, they each only spend ~$3-4 million a year on lobbying. That’s… not very much money (contrast it with the $60 billion the government gave the IRS to modernize its systems, or the $4.9 billion in revenue Intuit made last fiscal year from TurboTax, or the hundreds of millions of hours[2] that a return-free tax filing system could save).
Perhaps it would “just” take a multimillionaire and a few savvy policy folks to make the US tax system wildly better? Maybe TurboTax and H&R Block would simply up their lobbying budget if they stopped getting their way, but maybe they wouldn’t. Even if they do, I think it’s not crazy to imagine a fairly modest lobbying effort could beat them, since simpler tax filing seems popular across party lines/is rather obviously a good idea, and therefore may have an easier time making its case. Plus I wonder if pouring more money into lobbying hits diminishing returns at some point such that even a small amount of funding against TurboTax could go a long way.
Nobody seems to be trying to fight this. The closest things are an internal department of the IRS and some sporadic actions from broad consumer protection groups that don’t particularly focus on this issue (for example, ProPublica wrote an amazing piece of investigative journalism in 2019 that includes gems like an internal Intuit slide shown in the article).
In the meantime, the IRS just killed its pilot direct file program. While the program was far from perfect, it seemed to me like the best bet out there for eventually bringing the US to a simple return-free filing system, like the UK, Japan, and Germany use. It seems like a tragedy that the IRS sunset this program.[3]
In general, the amount of money companies spend on lobbying is often very low, and the harm to society that lobbying causes seems large. If anyone has examples of times folks tried standing up to corporate lobbying like this that didn’t seem to involve much money, I’d love to know more about how that’s turned out.
I haven’t deeply investigated how true this narrative is. It seems clear TurboTax/Intuit lobbies actively with this goal in mind, but it seems possible that policymakers are ignoring them and that filing taxes is hard for some other reason. That would at least explain why TurboTax and H&R Block spend so little here.
I don’t trust most sources that quote numbers like this. This number comes from this Brookings article from 2006, which makes up numbers just like everyone else but at least these numbers are made up by a respectable institution that doesn’t have an obvious COI.
In general, I love when the government lets the private sector compete and make products! I want TurboTax to keep existing, but it’s telling that they literally made the government promise not to build a competitor. That seems like the opposite of open competition.
Joe Bankman decided to make easy tax filing his personal mission, and he spent $30,000 to hire a lobbyist to counter lobbying by Intuit, the maker of TurboTax software.
“I can’t cure cancer,” says Bankman. “But I can help simplify tax filing.”
I had thought that Patrick McKenzie claims here that lobbying by Intuit is not the reason why US tax filing is so complicated, and that it’s actually because of a Republican advocacy group that doesn’t want to simplify tax filing, because that would be a stealth tax hike.
But rereading the relevant section, I’m confused. It sounds like the relevant advocacy group is in favor of simplifying the tax system, and in particular, removing withholding?
It is widely believed in the tech industry that the reason the United States requires taxpayers to calculate their own tax returns, which is not required in many peer nations, is because Intuit (who make Turbotax, the most popular software for doing one’s taxes) spends money lobbying policymakers to oppose the IRS creating a competing product. People who believe this have a poorly calibrated understanding about the political economy of taxation in the American context.
I will have to take notice about uncontroversial but politically inflected facts about the world we live in to describe why you must use the software you use. If you’d prefer to not get politics mixed in with your finances and software, mea maxima culpa. That said, a democratically accountable government which deputizes the private sector to achieve state aims is invariably subject to the political process, and this is on net a good thing. To the extent one has a complaint about the outcome, one’s complaint is not with some unaccountable or corrupt actor in a smoky backroom somewhere. It is with one’s countrymen.
In particular, the tech industry zeitgeist that blames Intuit for us needing tax preparation software fails to understand the preferences of Congressional representatives of the Republican Party. Any fairminded observer of U.S. politics understands the Republicans to be institutionally extremely interested in tax policy (and tax rates in particular), in the sense that doctors are interested in heart attacks. Their most recent platform includes the quote “Republicans consider the establishment of a pro-growth tax code a moral imperative. More than any other public policy, the way the government raises revenue—how much, at what rates, under what circumstances, from whom, and for whom—has the greatest impact on our economy’s performance.” This is far from the only flag proudly planted by the elected representatives who enjoy the enthusiastic support of about half of Americans for their views on tax administration.
The specific policy implications of those shared values are frequently outsourced, in a fashion extremely common in Washington and critical to your understanding of U.S. politics. Washington has an unofficial ecosystem of organizations and public intellectuals who, by longstanding practice, have substantial influence on policy. When a Republican candidate promises to voters that they are anti-tax, as their voters (particularly in primaries) demand they must be, the thing they will offer in support of that is “Grover Norquist gave me a passing grade.”
Norquist runs Americans for Tax Reform, a non-profit political advocacy group which opposes all tax increases. ATR is institutionally skeptical of withholding, because they believe that withholding allows one to increase taxes by stealth. I don’t think it is excessively partisan to say that, if one phrases that claim a bit more neutrally as “withholding increases tax compliance by decoupling public sentiment and policy changes,” the people who designed the withholding system would say “I’m glad the National Archives makes our design documents so accessible. We wrote them to be read!”
And, relevant to the question of whether Intuit controls U.S. tax policy: it can’t, because that would imply they have wrested control from Norquist. Norquist considers a public filing option a tax increase by stealth and opposes it automatically. (I offer in substantiation ATR’s take on a specific policy, which was bolded for emphasis in the original: “Americans for Tax Reform rejects the use of unauthorized taxpayer dollars being used to expand the IRS into the tax preparation business and urges states to reject participation in the program.” You can find much more in the same vein.)
Interesting! How did Norquist/Americans for Tax Reform get so much influence? They seem to spend even less money than Intuit on lobbying, but maybe I’m not looking at the right sources or they have influence via means other than money?
I’m also somewhat skeptical of the claims. The agreement between the IRS and the Free File Alliance feels too favorable to the Free File Alliance for them to have had no hand in it.
As to your confusion, I can see why an advocacy group that wants to lower taxes might want the process of filing taxes to be painful. I’m just speculating, but I bet the fact that taxes are annoying to file and require you to directly confront the sizable sum you may owe the government makes people favor lower taxes and simpler tax codes.
As to your confusion, I can see why an advocacy group that wants to lower taxes might want the process of filing taxes to be painful. I’m just speculating, but I bet the fact that taxes are annoying to file and require you to directly confront the sizable sum you may owe the government makes people favor lower taxes and simpler tax codes.
This is what I remembered the piece as saying, but unless I’m misreading it now, that’s not actually in the text.
The world seems bottlenecked on people knowing and trusting each other. If you’re a trustworthy person who wants good things for the world, one of the best ways to demonstrate your trustworthiness is by interacting with people a lot, so that they can see how you behave in a variety of situations and they can establish how reasonable, smart, and capable you are. You can produce a lot of value for everyone involved by just interacting with people more.
I’m an introvert. My social skills aren’t amazing, and my social stamina is even less so. Yet I drag myself to parties and happy hours and one-on-one chats because they pay off.
It’s fairly common for me to go to a party and get someone to put hundreds of thousands of dollars towards causes I think are impactful, or to pivot their career, or to tell me a very useful, relevant piece of information I can act on. I think each of those things individually happens more than 15% of the time that I go to a party.
(Though this is only because I know of unusually good cause areas and career opportunities. I don’t think I could get people to put money or time towards random opportunities. This is a positive-sum interaction where I’m sharing information!)
Even if talking to someone isn’t valuable in the moment, knowing lots of people comes in really handy. Being able to directly communicate with lots of people in a high-bandwidth way lets you quickly orient to situations and get things done.
I try to go to every party I’m invited to that’s liable to have new people, and I very rarely turn down an opportunity to chat with a new person. I give my calendar link out like candy. Consider doing the same!
Talking to people is hits-based
Often, people go to an event and try to talk to people but it isn’t very useful, and they give up on the activity forever. Most of the time you go to an event it will not be that useful. But when it is useful, it’s extremely useful. With a little bit of skill, you can start to guess what kinds of conversations and events will be most useful (it is often not the ones that are most flashy and high-status).
Building up trust takes time
Often when I get good results from talking to people, it’s because I’ve already talked to them a few times at parties and I’ve established myself as a trustworthy person that they know.
Talking to people isn’t zero-sum
When I meet new people, I try to find ways I can be useful to them. (Knowing lots of people makes it easier to help other folks because often you can produce value by connecting people to each other.) And when I help the people I’m talking to, I’m also helping myself because I am on the same team as them. I want things that are good for the world, and so do most other people. I’m not sure the strategy in this short form would work at all if I were trying to trick investors into overvaluing my startup or convince people to work for me when that wasn’t in their best interest.
I think this is the main way that “talking to people”, as I’m using the term here, differs from “networking”.
Be genuine
When I talk to people, I try to be very blunt and earnest. I happen to like hanging out with people who are talented and capable, so I typically just try to find good conversations I enjoy. I build up friendships and genuine trust with people (by being a genuinely trustworthy person doing good things, not by trying to signal trust in complicated ways). I think I have good suggestions for things people should do with their money and time, and people are often very happy to hear these things.
Sometimes I do seek out specific people for specific reasons. If I’m only talking to someone because they have information/resources that are of interest to me, I try to directly (though tactfully) acknowledge that. Part of my vibe is that I’m weirdly goal-oriented/mission-driven, and I just wear that on my sleeve because I think the mission I drive towards is a good one.
I also try to talk to all kinds of folks and often purposefully avoid “high-status” people. In my experience, chasing them is usually a distraction anyway and the people in the interesting conversations are more worth talking to.
You can ask to be invited to more social events
When I encourage people to go to more social events, often they tell me that they’re not invited to more. In my experience, messaging the person you know who is most into going to social events and asking if they can invite you to stuff works pretty well most of the time. Once you’re attending a critical mass of social events, you’ll find yourself invited to more and more until your calendar explodes.
The other day I was speaking to one of the most productive people I’d ever met.[1] He was one of the top people in a very competitive field who was currently single-handedly performing the work of a team of brilliant programmers. He needed to find a spot to do some work, so I offered to help him find a desk with a monitor. But he said he generally liked working from his laptop on a couch, and he felt he was “only 10% slower” without a monitor anyway.
I was aghast. I’d been trying to optimize my productivity for years. A 10% productivity boost was a lot! Those things compound! How was this man, one of the most productive people I’d ever met, shrugging it off like it was nothing?
I think this nonchalant attitude towards productivity is fairly common in top researchers (though perhaps less so in top executives?). I have no idea why some people are so much more productive than others. It surprises me that so much variance is even possible.
This guy was smart, but I know plenty of people as smart as him who are far less productive. He was hardworking, but not insanely so. He wasn’t aggressively optimizing his productivity.[2] He wasn’t that old so it couldn’t just be experience. Probably part of it was luck, but he had enough different claims to fame that that couldn’t be the whole picture.
If I had to chalk it up to something, I guess I’d call it skill and “research taste”: he had a great ability to identify promising research directions and follow them (and he could just execute end-to-end on his ideas without getting lost or daunted, but I know how to train that).
I want to learn this skill, but I have no idea how to do it and I’m still not totally sure it’s real. Conducting research obviously helps, but that takes time and is clearly not sufficient. Maybe I should talk to a bunch of researchers and try to predict the results of their work?
Has anyone reading this ever successfully cultivated an uncanny ability to identify great research directions? How did you do it? What sub-skills does it require?
Am I missing some other secret sauce that lets some people produce wildly more valuable research than others?
Measured by more conventional means, not by positive impact on the long-term future; that’s dominated by other people. Making sure your work truly steers at solving the world’s biggest problems still seems like the best way to increase the value you produce, if you’re into that sort of thing. But I think this person’s abilities would multiply/complement any benefits from steering towards the most impactful problems.
Hmm, honestly, how do you know that he is one of the most productive people? Like, I have found these kinds of things surprisingly hard to evaluate, and a lot of successful research is luck, so maybe he just got lucky, but not like a “genetic lottery” kind of lucky, but more of a “happened to bet on the right research horse” kind of lucky in a way that I wouldn’t necessarily expect to generalize into the future.
I am partially saying this because I have personally observed a lot of the opposite. Somewhat reliably the most productive people I know have very strong opinions about how they work. And to be clear, most of them do actually not use external monitors a lot of the time (including me myself), so I don’t think this specific preference is that interesting, but they do tend to have strong opinions.
My other hypothesis is just that the conversation somehow caused them to not expose which aspects of their work habits they care a lot about, and the statement about “this merely makes me 10% slower”, was something they wouldn’t actually reflectively endorse. More likely they don’t think of their work as something that has that much of a local “efficiency” attribute to it, and so when they thought through the monitor question, they substituted the productivity question for one that’s more like “how many more to-do list items would I get through if I had a monitor”. If you forced them to consider a more holistic view of their productivity, my guess is some answer like “oh, but by working on a couch I am much more open to get up and talk to other people or start pacing around, and that actually makes up for the loss here”.
Ways training incentivizes and disincentivizes introspection in LLMs.
Recent work has shown some LLMs have some ability to introspect. Many people were surprised to learn LLMs had this capability at all. But I found the results somewhat surprising for another reason: models are trained to mimic text, both in pre-training and fine-tuning. Almost every time a model is prompted in training to generate text related to introspection, the answer it’s trained to give is whatever answer the LLMs in the training corpus would say, not what the model being trained actually observes from its own introspection. So I worry that even if models could introspect, they might learn to never introspect in response to prompting.
We do see models act consistently with this hypothesis sometimes: if you ask a model how many tokens it sees in a sentence or instruct it to write a sentence that has a specific number of tokens in it, it won’t answer correctly.[1] But the model probably “knows” how many tokens there are; it’s an extremely salient property of the input, and the space of possible tokens is a very useful thing for a model to know since it determines what it can output. At the very least, models can be trained to semi-accurately count tokens and conform their outputs to short token limits.
I presume the main reason models answer questions about themselves correctly at all is because AI developers very deliberately train them to do so. I bet that training doesn’t directly involve introspection/strongly noting the relationship between the model’s internal activations and the wider world.
So what could be going on? Maybe the way models learn to answer any questions about themselves generalizes? Or maybe introspection is specifically useful for answering those questions and instead of memorizing some facts about themselves, models learn to introspect (this could especially explain why they can articulate what they’ve been trained to do via self-awareness alone).
But I think the most likely dynamic is that in RL settings[2] introspection that affects the model’s output is sometimes useful. Thus it is reinforced. For example, if you ask a reasoning model a question that’s too hard for it to know the answer to, it could introspect to realize it doesn’t know the answer (which might be more efficient than simply memorizing every question it does or doesn’t know the answer to). Then it could articulate in the CoT that it doesn’t know the answer, which would help it avoid hallucinating and ultimately produce the best output it could given the constraints.
One other possibility is the models are just that smart/self-aware and aligned towards being honest and helpful. They might have an extremely nuanced world-model, and since they’re trained to honestly answer questions,[3] they could just put the pieces together and introspect (possibly in a hack-y or shallow way).
Overall these dynamics make introspection a very thorny thing to study. I worry it could go undetected in some models or it could seem like a model can introspect in a meaningful way when it only has shallow abilities reinforced directly by processes like the above (for example knowing when they don’t know something [because that might have been learned during training], but not knowing in general how to query their internal knowledge on topics in other related ways).
Technically this could apply to fine-tuning settings too, for example if the model uses a CoT to improve its final answers enough to justify the CoT not being maximally likely tokens.
In theory at least. In reality I think this training does occur but I don’t know how well it can pinpoint honesty vs several things that are correlated with it (and for things like self-awareness those subtle correlates with truth in training data seem particularly pernicious).
But the model probably “knows” how many tokens there are; it’s an extremely salient property of the input
This doesn’t seem that clear to me; what part of training would incentivize the model to develop circuits for exact token-counting? Training a model to adhere to a particular token budget would do some of this, but it seems like it would have relatively light pressure on getting exact estimates right vs guessing things to the nearest few hundred tokens.
One way to test this would be to see if there are SAE features centrally about token counts; my guess would be that these show up in some early layers but are mostly absent in places where the model is doing more sophisticated semantic reasoning about things like introspection prompts. Ofc this might fail to capture the relevant sense of “knowing” etc, but I’d still take it as fairly strong evidence either way.
Ideas for how to spend very large amounts of money to improve AI safety:
If AI companies’ valuations continue to skyrocket (or if new very wealthy actors start to become worried about AI risk), there might be a large influx of funding into the AI safety space. Unfortunately, it’s not straightforward to magically turn money into valuable AI safety work. Many things in the AI safety ecosystem are more bottlenecked on having a good founder with the right talent and context, or having good researchers.
Here’s a random incomplete grab-bag of ideas for ways you could turn money into reductions in AI risk at large scales. I think right now there are much better donation opportunities available. This is not a list of donation recommendations right now; it’s just suggestions for once all the low-hanging funding fruit has been plucked. Probably if people thought more they could come up with even better scalable opportunities. There are also probably existing great ideas I neglected to list. But these at least give us a baseline and a rough sense of what dumping a bunch of money into AI safety could look like. I’m also erring towards listing more things rather than fewer. Some of these things might actually be bad ideas.
Bounties to reward AIs for reporting misaligned behavior in themselves or other agents.
Folks have run a couple of small experiments on this already. It seems straightforward to execute and like it could absorb almost unbounded amounts of capital.
Paying high enough salaries to entice non-altruistically-motivated AI company employees to work on safety.
This isn’t only bottlenecked on funding. Many people are very loyal to the AI companies they work for, and the very best employees aren’t very sensitive to money since they already have plenty of money. It seems absurdly expensive for Meta to try hiring away people at other AI companies, and they didn’t seem to get that much top talent from it. On the one hand, working on safety is a much more compelling case than working at Meta, but on the other hand, maybe people who aren’t already doing safety research find AI capabilities research more intrinsically fun and interesting or rewarding than safety research. I am also concerned that people who do capabilities research might not be great at safety research because they might not feel as passionate or inspired by it, and because it is a somewhat different skillset.
In the most extremely optimistic world, you could probably hire 50 extremely talented people by offering them $100M/year each (matching what Meta offered). You could probably also hire ~200 more junior people at $10M/year (the bottleneck on hiring more would be management capacity). So in total you could spend $7B/year.
Over time, I expect this to get more expensive since AI companies’ valuations will increase, and therefore, so will employee compensation.
Compute for AI safety research.
Day-to-day, the AI safety researchers I know outside of AI labs don’t seem to think they’re very bottlenecked on compute. However, the AI safety researchers I know inside AI labs claim they get a lot of value from having gobs and gobs of compute everywhere. Probably, AI safety researchers outside labs are just not being imaginative enough about what they could do with tons of compute. This also isn’t entirely money-bottlenecked. Probably part of it is having the infrastructure in place and the deals with the compute providers, etc. And running experiments on lots of compute can be more fiddly and time-consuming. Even so I bet with a lot more money for compute, people would be able to do much better safety research.
Very roughly, I guess this could absorb ~$100 million a year.
Compute for running AI agents to automate AI safety research.
This doesn’t work today since AIs can’t automate AI safety research. But maybe in the future they will be able to, and you’ll be able to just dump money into this almost indefinitely.
Pay AI companies to do marginal cheap safety interventions.
Maybe you can just pay AI companies to implement safety interventions that are only very slightly costly for them. For example, you could subsidize having really good physical security in their data centers. I think a lot of things AI companies could do to improve safety will be costly enough for the companies that it will be very hard to pay them enough to make up for that cost, especially in worlds where AI companies’ valuations have increased a lot from where they are today. But there’s probably still some opportunities here.
Raising awareness of AI safety.
There’s lots of proven ways to spend money to raise awareness of things (sponsor youtube channels, patronize movies about AI risk, etc). Maybe raising awareness of safety is good because it gets more people to work on safety or gets the government to do more sensible things about AI risk or lets consumers encourage companies to implement more safety interventions.
I couldn’t easily find an American public awareness campaign that cost more than ~$80M/year (for anti-smoking). Coca-Cola spends ~$4 billion a year on advertising, but I think that if AI safety were spending as much money as Coca-Cola, it would backfire. I think maybe $500M/year is a reasonable cap on what could be spent?
Biodefense. Buy everyone in the US PPE.
One way that an AI could cause a catastrophe is via designing a bioweapon. One way to reduce the odds that a bioweapon causes a civilization-ending catastrophe is to make sure that everyone has enough PPE that they won’t die. Andrew Snyder-Beattie has elaborated on this idea here. I think this could absorb ~$3B ($3/mask * 350M Americans * 3 masks/person).
Buy foreign AI safety researchers gold cards.
Many great AI safety researchers are on visas. It would be convenient if they had green cards. You can now effectively buy one, via a “gold card”, for $1M each. Let’s say there are a hundred such people, so this opportunity could absorb $100M.
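Putting very rough totals on this list (taking each figure at face value, and separating recurring from one-off costs), a quick sketch:

```python
# Rough tally of the opportunities above, using the figures as stated in this list.
annual = {
    "salaries to pull researchers onto safety": 7_000_000_000,  # ~$7B/year in the most optimistic case
    "compute for safety research": 100_000_000,                 # ~$100M/year
    "awareness campaigns": 500_000_000,                         # capped around ~$500M/year
}
one_off = {
    "PPE for everyone in the US": 3 * 350_000_000 * 3,          # $3/mask * 350M people * 3 masks each
    "gold cards for researchers on visas": 100 * 1_000_000,     # ~100 people at $1M each
}
# Paying companies for cheap safety interventions and compute for AI agents
# aren't quantified above, so they're left out of this tally.
recurring = sum(annual.values()) / 1e9
once = sum(one_off.values()) / 1e9
print(f"recurring: ~${recurring:.1f}B/year, one-off: ~${once:.1f}B")
# recurring: ~$7.6B/year, one-off: ~$3.2B
```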
Overall, these are not amazing opportunities. But they give a lower bound and illustrate how it’s possible to turn money into reduced risk from AI at scale, even if you don’t have more entrepreneurs building new organizations. In practice, I think that if money slowly ramps up into the space over time, there will be much better opportunities than these, and you will simply see AI safety organizations grow into major research institutions producing wonderful research. This is just a floor.
A lot of these ideas came from other people and have generally been floating around for a while. Thanks to everybody I talk to about this.
I don’t know that the high-salaries idea is fundamentally good, but at least it scales somewhat with the equity of the safety-sympathetic people at labs?
If Democrats are benefiting from gerrymandering similarly to Republicans, perhaps that creates more bipartisan support for federal legislation banning gerrymandering. In general, where possible, I think it’s better to have laws preventing this kind of uncooperative behavior than to rely on both parties managing to hit cooperate in a complicated prisoners’ dilemma.
Are the costs to society simply too large to be worth it?
In some ways, Prop 50 undoes some of the damage of the redistricting in Texas: Republicans gained 5 Texas seats in a way that made the map less representative, so undoing that by giving Democrats 3-5 extra seats makes the overall balance of Congress more representative. But in some ways two wrongs don’t make a right here: at the end of the day, both Texans and Californians end up with less representative districts. That matters more if you think it’s important for Congress to be made up of politicians who represent their constituents well, and less important that constituents’ views are proportionally represented at the federal level.
Notably, even if you buy that argument, you might still think Prop 50 is worth it if you think the punishment/deterrence effects are valuable enough.
What’s the historical context? If this is a prisoner’s dilemma, how much has each side hit cooperate in the past?
Republicans have sometimes said their redistricting bills are a response to Democrats’ gerrymandering. If so, maybe they’re justified. Let’s look into it! You can read the history here or look at an interactive map here.
It seems like Republicans engaged in a major, unprovoked bout of gerrymandering in 2010 with REDMAP. Since then both parties have tried to gerrymander and occasionally succeeded. Overall, Republicans have gerrymandered somewhat more than Democrats, but Democrats have still engaged in blatant gerrymandering, for example in Illinois in 2021. In searching for more right-leaning narratives, I found that Brookings estimated in 2023 that neither party was benefiting from gerrymandering substantially more than the other at the time, regardless of how much each had engaged in it. I haven’t really found a great source for anyone claiming Democrats have overall benefited more from gerrymandering.
Democrats have also tried to propose a bill to ban gerrymandering federally, the Freedom to Vote Act. (This bill also included some other provisions apart from just banning gerrymandering, like expanding voter registration and making Election Day a federal holiday.) The Freedom to Vote Act was widely opposed by Republicans and I don’t know of any similar legislation they’ve proposed to ban gerrymandering.
So overall, it seems like Republicans have been engaging in more gerrymandering than Democrats and been doing less to fix the issue.
Republicans have also argued the new districts in Texas represent the Hispanic population better, though they tend to frame this more as a reason it’s good and less as the reason they pursued this redistricting in the first place.
Specifically, they say “While Newsom and CA Democrats say Prop 50 is a response to Trump and Texas redistricting, California shouldn’t retaliate and sacrifice its integrity by ending fair elections.”
One argument against the bill that I didn’t explore above (because I haven’t actually heard anyone make it) is that the only reason Democrats aren’t gerrymandering more is that gerrymandering seems more helpful to Republicans for demographic reasons. But Democrats try to do other things that are arguably designed to give them more votes, for example, loosening voter ID laws. So maybe each party should carefully respond to the ways the other party tries to sneakily get itself more votes, in very measured ways that properly disincentivize bad behavior. Or, more realistically, engage in a crazy, ever-escalating, no-holds-barred race to the bottom. I think it’s good that the Republicans and Democrats have been somewhat specific that their attempts at gerrymandering are only retaliation against other gerrymandering, and not retaliation against things like this.
My understanding is that voter ID laws are probably net helpful for Democrats at this point.
To elaborate on this, a model of voting demographics is that the most engaged voters vote no matter what hoops they need to jump through, so rules and laws that make voting easier increase the share of less engaged voters. This benefits whichever party is comparatively favored by these less engaged voters. Historically this used to be the Democrats, but due to education polarization they’ve become the party of the college-educated nowadays. This is also reflected in things like Trump winning the Presidential popular vote in 2024. (Though as a counterpoint, this Matt Yglesias article from 2022 claims that voter ID laws “do not have a discernible impact on election results” but doesn’t elaborate.)
In addition, voter ID laws are net popular, so Democrats advocating against them hurts them both directly (advocating for an unpopular policy) and indirectly (insofar as it increases the pool of less engaged voters).
Seen in the light of Section 2 of the Voting Rights Act asymmetrically binding Republicans, what you’re calling an “unprovoked bout of gerrymandering” might be better understood as an attempt to reduce the unfair advantage Democrats have had nationally for decades.
If I am reading things correctly, section 2 of the Voting Rights Act prohibits states from imposing any voting practice or procedure that results in a denial or abridgement of the right to vote on account of race or color (and subsection (b) clarifies this in what seem like straightforward ways).
It seems to me that if this “asymmetrically binds Republicans” then the conclusion is “so much the worse for the Republicans” not “so much the worse for the Voting Rights Act”.
As for “the unfair advantage Democrats have had nationally for decades”:
https://www.cookpolitical.com/cook-pvi/2022-partisan-voter-index/republican-electoral-college-advantage says that the Electoral College gives Republicans a ~2% advantage in presidential elections
https://fivethirtyeight.com/features/the-senates-rural-skew-makes-it-very-hard-for-democrats-to-win-the-supreme-court/ says that “the Senate is effectively 6 to 7 percentage points redder than the country as a whole”
https://fivethirtyeight.com/features/advantage-gop/ says that “The Electoral College’s Republican bias in 2020 thus averaged out to 3.5 points”.
Why different years (2022, 2020, 2020)? Because each of those was the first thing I found when searching for articles from at-least-somewhat-credible outlets about structural advantages for one or another party in presidential, Senate, and House races. I make no claim that those figures are representative of, say, the last 20 years. But I don’t think it’s credible to talk about “the unfair advantage Democrats have had nationally for decades” when all three of the major national institutions people in the US get to vote for have recently substantially favored Republicans, in the sense that to get equal results Democrats would need substantially more than equal numbers of votes.
The problem with gerrymandering is that it makes elections less representative. It seems to me that (section 2 of) the Voting Rights Act makes elections more representative, so that’s good. It seems reasonable to be mad at Republicans when they implement measures that benefit themselves by making elections less representative, but not to be mad at a law just because you would prefer elections to stay less fair.
Humanity has only ever eradicated two diseases (and one of those, rinderpest, is only in cattle not humans). The next disease on the list is probably Guinea worm (though polio is also tantalizingly close).
At its peak, Guinea worm infected ~900k people a year. So far in 2024 we only know of 7 cases. The disease isn’t deadly, but it causes significant pain for 1-3 weeks (as a worm burrows out of your skin!) and in ~30% of cases that pain persists for about a year afterwards. In ~0.5% of cases the worm burrows through important ligaments and leaves you permanently disabled. Eradication efforts have already saved about 2 million DALYs.[1]
I don’t think this outcome was overdetermined; there’s no recent medical breakthrough behind this progress. It just took a herculean act of international coordination and logistics. It took distributing millions of water filters, establishing village-based surveillance systems in thousands of villages across multiple countries, and meticulously tracking every single case of Guinea worm in humans or livestock around the world. It took brokering a six-month ceasefire in Sudan (the longest humanitarian ceasefire in history!) to allow healthcare workers to access the region. I’ve only skimmed the history, and I’m generally skeptical of historical heroes getting all the credit, but I tentatively think it took Jimmy Carter for all of this to happen.
Rest in peace, Jimmy Carter.
I’m compelled to caveat that top GiveWell charities are probably in the ballpark of $50/DALY, and the Carter Center has an annual budget of ~$150 million a year, so they “should” be able to buy 2 million DALYs every single year by donating to more cost-effective charities. But c’mon this worm is super squicky and nearly eradicating it is an amazing act of agency.
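For what it’s worth, the rough arithmetic behind that comparison, taking the ~$50/DALY and ~$150M/year figures at face value, looks like this:

```python
# Rough arithmetic behind the caveat above; both inputs are ballpark figures.
carter_center_budget = 150_000_000   # ~$150M/year annual budget
cost_per_daly = 50                   # ~$50/DALY for top GiveWell-style opportunities

dalys_per_year = carter_center_budget / cost_per_daly
print(f"~{dalys_per_year / 1e6:.0f} million DALYs per year")
# ~3 million DALYs/year with these round numbers, i.e. at least the ~2 million DALYs
# the entire eradication effort is estimated to have saved so far.
```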
I don’t think you need that footnoted caveat, simply because there isn’t $150M/year worth of room for more funding in all of AMF, Malaria Consortium’s SMC program, HKI’s vitamin A supplementation program, and New Incentives’ cash incentives for routine vaccination program all combined; these comprise the full list of GiveWell’s top charities.
Another point is that the benefits of eradication keep adding up long after you’ve stopped paying the costs, because the counterfactual in which people keep suffering and dying of the disease is no longer happening. That’s how smallpox eradication’s cost-effectiveness can plausibly be less than a dollar per DALY averted so far, and dropping (Guesstimate model, analysis); the analysis makes this point in more detail.
Notes on living semi-frugally in the Bay Area.
I live in the Bay Area, but my cost of living is pretty low: roughly $30k/year. I think I live an extremely comfortable life. I try to be fairly frugal, both so I don’t end up dependent on jobs with high salaries and so that I can donate a lot of my income, but it doesn’t feel like much of a sacrifice. Often when I tell people how little I spend, they’re shocked. I think people conceive of the Bay as exorbitantly expensive, and it can be, but it doesn’t have to be.
Rent: I pay ~$850 a month for my room. It’s a small room in a fairly large group house I live in with nine friends. It’s a nice space with plenty of common areas and a big backyard. I know of a few other places like this (including in even pricier areas like Palo Alto). You just need to know where to look and to be willing to live with friends. On top of rent I pay ~$200/month (edit: I was missing one expense, it’s more like $300) for things like utilities, repairs on the house, and keeping the house tidy.
I pool the grocery bill with my housemates so we can optimize where we shop a little. We also often cook for each other (notably most of us, including myself, also get free meals on weekdays in the offices we work from, though I don’t think my cost of living was much higher when I was cooking for myself each day not that long ago). It works out to ~$200/month.
I don’t buy that much stuff. I thrift most of my clothes, but I buy myself nice items when it matters (for example comfy, somewhat-expensive socks really do make my day better when I wear them). I have a bunch of miscellaneous small expenses like my Claude subscription, toothpaste, etc, but they don’t add up to much.
I don’t have a car, a child, or a pet (but my housemate has a cat, which is almost the same thing).
I try to avoid meal delivery and Ubers, though I use them in a pinch. Public transportation costs aren’t nothing, but they’re quite manageable.
I actually have a PA who helps me with some personal accounting matters that I’m particularly bad at handling myself. He works remotely from Canada and charges $15/hour. I probably average a few hours of his time each week.
I shy away from super expensive hobbies or events, but I still partake when they seem really fulfilling. Most of the social events I’m invited to are free. I take a couple (domestic) non-work trips each year, usually to visit family.
I also have occasional surprise $500-$7,000 expenses, like buying a new laptop when mine breaks. Call that an extra $10k a year.
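For what it’s worth, a rough tally of the figures above lands right around that number (the PA hours and the miscellaneous line are ballpark guesses rather than exact records):

```python
# Back-of-the-envelope annual tally of the expenses described above.
rent = 850 * 12              # room in the group house
house_costs = 300 * 12       # utilities, repairs, keeping the house tidy (post-edit figure)
groceries = 200 * 12         # pooled grocery bill
pa = 15 * 3 * 52             # PA at $15/hour, assuming roughly 3 hours/week
misc = 2_000                 # clothes, subscriptions, public transit, occasional Ubers (guess)
surprises = 10_000           # laptops and other occasional surprise expenses

total = rent + house_costs + groceries + pa + misc + surprises
print(f"~${total:,.0f}/year")   # ≈ $30,540/year, in line with the ~$30k figure
```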
In many ways, I’m very fortunate to be able to have this lifestyle.
I honestly feel a little bewildered by how much money people around me spend and how dependent some people seem on earning a very large salary. Many people around me also seem kind of anxious about their financial security, even though they earn a good amount of money. Because my lifestyle is pretty frugal, I feel very good about how much runway I have.
I realize that people’s time is often extremely valuable, and I absolutely believe you can turn money into more time. Sometimes people around me are aghast at how much time I waste walking to the office or sitting on the BART. But for me, I don’t think I would actually be much more productive if I spent 10x as much money on productivity, and it feels extremely freeing to know I could quit my (nonprofit) job any time and fairly easily scrape by. I recommend at least considering it, if you haven’t already.
I also live in the Bay Area, and live similarly.
Pay 800-ish a month in rent for one room in a shared house.
Pay a few hundred a month for a PA to help me with tasks like laundry and packaging supplements.
Walk to and from work, am happy to use Ubers when I travel farther afield.
Eat almost exclusively at the office, and generally buy simple groceries that require minimal prep rather than eating out.
If I think something might make me more effective, and it costs less than ~150, I buy it and try it out, and give it away if it doesn’t work out. (Things like “kneeling chair”, “lifting shoes”, “heart rate monitor”, “backpack that’s better for running in”, “shirt that might fit me”, “heavy duty mask and filters”, “textbooks”, “bluetooth headphones”, “extra chargers”.)
I currently save (and invest) something like 90% of my income, though my income has changed a lot in different years. When I’m working a lot less on paid projects and don’t have a salary, I make less money, and only save like 20% to 40%.
However, I’m semi-infamously indifferent to fun (and to most forms of physical pleasure), and I spend almost all my time working or studying. So my situation probably doesn’t generalize to most people.
Note that most people either have or want children, which changes the calculus here: you need a larger place (often a whole house if you have many or want to live with extended family), and are more likely to benefit from paying a cleaner/domestic help (which is surprisingly expensive in the Bay and cannot be hired remotely). Furthermore, if you’re a meat-eater and want to buy ethically sourced meat or animal products, this increases the cost of food a lot.
I want to push back on the idea of needing a large[1] place if you have a family.
In the US a four person family will typically live in a 2,000-2,500 square foot place, but in Europe the same family will typically live in something like 1,000-1,400 square feet. In Asia it’s often less, and earlier in the US’s history it also was much less than what it is today.
If smaller homes work for others across time and space, I believe they are often sufficient for people in the US today.
Well, you just said “larger”.
Yeah, that’s fair. But the lifestyle of an ~$850-a-month room in a group house isn’t that nice if you have many kids, so it makes sense that people benefit from more money to afford a nicer place.
And like, sure, you can get by on less money than some people assume, but the original comment imo understates how much you and your family benefit from more money (e.g. the use of “bewildered”).
As the father of 2 kids (a 5 y/o and 2 y/o) in Palo Alto, I can confirm that childcare is a lot. $2k per kid per month at our subsidized academic-affiliation rate. At $48k, it’s almost the entirety of my wife’s PhD salary. Fortunately, I have a well-paying job and we are not strapped for money.
We also got along with just an e-bike for 6 years, saving something like $15k per year in car insurance and gas (save for 9 months when we had the luxury of borrowing a car from family) [Incorrect, see below]. We got a car recently due to a longer commute, but even then, I still use the e-bike almost every day because the car is not much faster and overlapping with exercise time is valuable (plus the 5 y/o told me he likes fresh air).
For clothes/toys/etc., we’ve used Facebook Marketplace, “Buy Nothing” groups, and our neighbors to source pretty much everything. The best toys have just been cardboard, masking tape, and scissors, which are very cheap.
[Edit: As comments below point out, the figure for no-car savings was incorrect. It’s closer to $8k, taking into account gas, insurance, maintenance, and repairs. Apologies for the embellishment—I think it was from a combination of factors including (i) being proud of previously not owning a car, (ii) making enough not to track it closely, and (iii) deferring to my spouse for most of our household payments/financial management (which is not great on my part—she is busy and household management is a real burden).
To shore up my credibility on child care, I pulled our receipts, and we’re currently at $2,478 per month for the toddler, and $1,400 per month for the kindergartener’s after-school program (though cheaper options were available for the after-school program).]
Is $15k a year typical for car insurance? In the UK it’s a few hundred dollars a year at most unless you’re a very young or very risky driver.
It can vary enormously based on risk factors, choice of car, and quantity of coverage, but that does still sound extremely high to me. I think even if you’re a 25-yo male with pretty generous coverage above minimum liability, you probably won’t be paying more than ~$300/mo unless you have recent accidents on your record. Gas costs obviously scale ~linearly with miles driven, but even if your daily commute is a 40 mile round-trip, that’s still only like $200/mo. (There are people with longer commutes than that, but not ones that you can easily substitute for with an e-bike; even 20 miles each way seems like a stretch.)
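For a rough sense of scale (the fuel economy and gas price below are assumptions of mine, not anyone’s actual numbers):

```python
# Rough sanity check of the car costs discussed above (fuel economy and gas price are assumptions).
monthly_insurance = 300            # upper-bound estimate from the comment, $/month
commute_miles_per_day = 40         # round trip, as in the comment
work_days_per_month = 22
mpg = 25                           # assumed fuel economy
gas_price = 5.00                   # assumed $/gallon

monthly_gas = commute_miles_per_day * work_days_per_month / mpg * gas_price
annual = 12 * (monthly_insurance + monthly_gas)
print(f"gas ≈ ${monthly_gas:.0f}/mo, insurance + gas ≈ ${annual:,.0f}/yr")
# ≈ $176/mo on gas and ≈ $5,700/yr before maintenance and repairs,
# which is consistent with the corrected ~$8k/yr figure rather than $15k.
```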
Thank you both for calling this out, because I was clearly incorrect. I was trying to recall my wife’s initial calculation, which I believe included maintenance, insurance, gas, and repairs.
I think this is one of those things where I was so proud of not owning a car that the amount saved morphed from $8k to $10k to $15k in the retelling. I need to stop doing that.
Also, I’m feeling some whiplash reading my reply because I totally sound like an LLM when called out for a mistake. Maybe similar neural pathways for embellishment were firing, haha.
What group of people is this claim supposed to refer to, LessWrong readers? The world population?
I was thinking about US adults, but I’d guess it applies to LW readers and world adult population also.
69% of US adults say they have children, 15% do not but still want to (source)
My rent, also for a small room in a Bay Area group house, is around $1050. This is an interesting group house phenomenon: if rent averages $1800, the good rooms go for $2600 and the bad ones have to be $1000 to balance out the total rent. The best rooms in a group house are a limited-supply good, and because people (or even couples) are often indifferent between a group house with a good social scene and a $4000 luxury 1-bed, prices are roughly similar. There is lots of road noise, but I realized I could pay $1000 for extra-thick blackout curtains, smart lightbulbs, etc. to mitigate this, which has saved me thousands over the past couple of years.
As for everything else, my sense is it’s not for most people. To have expenses as low as OP’s you basically need to have only zero-cost or cost-saving hobbies like cooking and thrifting, and enjoy all aspects of them. I got into cooking at one point but didn’t like shopping and wanted to use moderately nice ingredients, so when cooking for my housemates the ingredients (from an expensive grocery store through Instacart) came out to $18/serving. A basic car is also super useful, Bay Area or not.
I am probably one of the people OP mentions, with a bunch of financial anxiety despite being able to save close to $100k/year, but this is largely due to a psychological block keeping me from investing most of my money.
This resonates with me. I’ve always been a fan of Mr. Money Mustache’s perspective that it doesn’t take much money at all to live a really awesome life, which I think is similar to the perspective you’re sharing.
Some thoughts:
Housing is huge. And living with friends is a huge help. But I think for a lot of people that isn’t a pragmatic option (tied to an area; friends unwilling or incompatible; need privacy), and then they get stuck paying a lot for housing.
Going car free helps a lot. Unfortunately, I think most places in North America make this somewhat difficult, and the places that don’t tend to have high housing costs.
Traveling is expensive. Flights, hotels, Ubers, food. I find myself in lots of situations where I feel socially obligated to travel, like for weddings and stuff, and so end up traveling maybe 4-6x/year, but this isn’t the hardest thing in the world to avoid. You could explain to people that you have a hard budget for two trips a year.
Spending $200/month or whatever on food means being strategic about ingredients. Which I very much think is doable, but yeah, it requires a fair amount of agency.
So… the biggest savings here, by far, is the rent. At a guess it’s bigger than everything else combined. If you don’t have enough friends, or your friends all live in full houses, guess you’re screwed here. Hope we’re OK with the tyranny of structurelessness?
There’s a cottage industry that thrives off of sneering at, gawking at, and maligning the AI safety community. This isn’t new, but it’s probably going to become more intense and pointed now that there are two giant super PACs that (allegedly[1]) see safety as a barrier to [innovation/profit, depending on your level of cynicism]. Brace for some nasty, uncharitable articles.
I think the largest cost of this targeted bad press will be the community’s overreaction, not the reputational effects outside the AI safety community. I’ve already seen people shy away from doing things like donating to politicians that support AI safety for fear of provoking the super PACs.
Historically, the safety community often freaked out in the face of this kind of bad press. People got really stressed out, pointed fingers about whose fault it was, and started to let the strong frames in the hit pieces get into their heads.[2] People disavowed AI safety and turned to more popular causes. And the collective consciousness decided that the actions and people who ushered in the mockery were obviously terrible and dumb, so much so that you’d get a strange look if you asked them to justify that argument. In reality I think many actions that were publicly ridiculed were still worth it ex-ante despite the bad press.
It seems bad press is often much, much more salient to the subjects of that press than it is to society at large, and it’s best to shrug it off and let it blow over. Some of the most PR-conscious people I know are weirdly calm during actual PR blowups and are sometimes more willing than the “weird” folks around me to take dramatic (but calculated) PR risks.
In the activist world, I hear this is a well-known phenomenon. You can get 10 people to protest a multi-billion-dollar company and a couple journalists to write articles, and the company will bend to your demands.[3] The rest of the world will have no idea who you are, but to the executives at the company, it will feel the world is watching them. These executives are probably making a mistake![4] Don’t be like them.
With all these (allegedly anti-safety[1]) super PACs, there will probably be a lot more bad press than usual. All else being equal, avoiding bad press is good, but fighting back requires people in the safety community to take some actions, and the super PACs will probably twist whatever those actions are into headlines about cringe doomer tech bros.
I do think that, when deciding what to do, people should take into account that provoking the super PACs is risky, and they should think carefully before doing it. But often I expect it will be the right choice and the blowback will be well worth it.
If people in the safety community refuse to stand up to them, then the super PACs will get what they want anyway and the safety community won’t even put up a fight.
Ultimately I think the AI safety community is an earnest, scrupulous group of people fighting for an extremely important cause. I hope we continue to hold ourselves to high standards for integrity and honor, and as long as we do, I will be proud to be part of this community no matter what the super PACs say.
They haven’t taken any anti-safety actions yet as far as I know (they’re still new). The picture they paint of themselves isn’t opposed to safety, and while I feel confident they will take actions I consider opposed to safety, I don’t like maligning people before they’ve actually taken actions worthy of condemnation.
I think it’s really healthy to ask yourself if you’re upholding your principles and acting ethically. But I find it a little suspicious how responsive some of these attitudes can be to bad press, where people often start tripping over themselves to distance themselves from whatever the journalist happened to dislike. If you’ve ever done this, consider asking yourself before you take any action how you’d feel if the fact that you took that action was on the front page of the papers. If you’d feel like you could hold your head up high, do it. Otherwise don’t. And then if you do end up on the front page of the papers, hold your head up high!
To a point. They won’t do things that would make them go out of business, but they might spend many millions of dollars on the practices you want them to adopt.
Tactically, that is. In many cases I’m glad the executives can be held responsible in this way and I think their changed behavior is better for the world.
I do wish this was the case, but as I have written many times in the past, I just don’t think this is an accurate characterization. See e.g.: https://www.lesswrong.com/posts/wn5jTrtKkhspshA4c/michaeldickens-s-shortform?commentId=zoBMvdMAwpjTEY4st
I don’t think the AI safety community has particularly much integrity or honor. I would like to make there be something in the space that has those attributes, but please don’t claim valor we/you don’t have!
For context, how would you rank the AI safety community w.r.t. integrity and honor, compared to the following groups:
1. AGI companies
2. Mainstream political parties (the organizations, not the voters, so e.g. the politicians and their staff)
3. Mainstream political movements, e.g. neoliberalism, wokism, China hawks, BLM
4. A typical university department
5. Elite opinion formers (e.g. the kind of people whose Substacks and op-eds are widely read and highly influential in DC, silicon valley, etc.)
6. A typical startup
7. A typical large bloated bureaucracy or corporation
8. A typical religion, e.g. Christianity, Islam, etc.
9. The US military
My current best guess is that you have a higher likelihood of being actively deceived, having someone actively plot to mislead you, or having someone put very substantial optimization pressure into getting you to believe something false or self-serving if you interface with the AI safety community than with almost any of the above.
A lot of that is the result of agency, which is often good, but in this case a double-edged sword. Naive consequentialism and lots of intense group-beliefs make the appropriate level of paranoia when interfacing with the AI Safety community higher than with most of these places.
“Appropriate levels of paranoia when interfacing with you” is of course not the only measure of honor and integrity, though as I am hoping to write about sometime this week, it’s kind of close to the top.
On that dimension, I think the AI Safety community is below AGI companies and the US military, and above all the other ones on this list. For the AGI companies, it’s unclear to me how much of it is the same generator. Approximately 50% of the AI Safety community are employed by AI labs, and they have historically made up a non-trivial fraction of the leadership of those companies, so those datapoints are highly correlated.
This is a wild claim. Don’t religions sort of centrally try to get you to believe known-to-be-false claims? Don’t politicians famously lie all the time?
Are you saying that EAs are better at deceiving people than typical members of those groups?
Are you claiming that members of those groups may regularly spout false claims, but they’re actually not that invested in getting others to believe them?
Can you be more specific about the way in which you think AI Safety folk are worse?
I agree that institutionally they are set up to do a lot of that, but the force they bring to bear on any individual is actually quite small in my experience, compared to what I’ve seen in AI safety spaces. Definitely lots of heterogeneity here, but most of the optimization that religions do to actually keep you believing their claims is pretty milquetoast.
Definitely in expectation! I think SBF, Sam Altman, Dario, Geoff Anders, plus a bunch of others are pretty big outliers on these dimensions. I think in practice there is a lot of variance between individuals, with a very high-level gloss being something like “the geeks are generally worse, unless they make it an explicit optimization target, but there are a bunch of very competent sociopaths around, in the Venkatesh Rao sense of the word, who seem a lot more competent and empowered than even the sociopaths in other communities”.
Yeah, that’s a good chunk of it. Like, members of those groups do not regularly sit down and make extensive plans about how to optimize other people’s beliefs in the same way as seems routine around here. Some of it is a competence side-effect. Paranoia becomes worse the more competent your adversary is. The AI Safety community is a particularly scary adversary in that respect (and one that due to relatively broad buy-in for something like naive-consequentialism can bring more of its competence to bear on the task of deceiving you).
I’ve been around the community for 10 years. I don’t think I’ve ever seen this?[1]
Am I just blind to this? Am I seeing it all the time, except I have lower standards what should “count”? Am I just selected out of such conversations somehow?
I currently work for an org that is explicitly focused on communicating the AI situation to the world, and to policymakers in particular. We are definitely attempting to be strategic about that, and we put a hell of a lot of effort into doing it well (eg running many many test sessions, where we try to explain what’s up to volunteers, see what’s confusing, and adjust what we’re saying).
(Is this the kind of thing you mean?)
But, importantly, we’re clear about trying to frankly communicate our actual beliefs, including our uncertainties, and are strict about adhering to standards of local validity and precise honesty: I’m happy to talk with you about the confusing experimental results that weaken our high level claims (though admittedly, under normal time constraints, I’m not going to lead with that).
Pretty much every day, I check proffered anti-AI arguments against the question “If someone had made this argument against [social media], would that have made me think it was imperative to shut it down?”
Also, come on, this seems false. I am pretty sure you’ve seen Leverage employees do this, and my guess is you’ve seen transcripts of chats of this happening with quite a lot of agency at FTX with regards to various auditors and creditors.
(Some) Leverage people used to talk as if they were doing this kind of thing, though it’s not like they let me in on their “optimize other people” planning meetings. I’m not counting chat transcripts that I read of meetings that I wasn’t present for.
Ah, OK, if you meant “see” in the literal sense, then yeah, seems more plausible, but also kind of unclear what its evidential value is. Like, I think you know that it happened a bunch. I agree we don’t want to double count evidence, but I think your message implied that you thought it wasn’t happening, not that it was happening and you just hadn’t seen it.
Well, what I’ve seen personally bears on the frequency with which this happens.
I think FTX and Leverage are regarded to be particularly bad and outlier-y cases, along several dimensions, including deceptiveness and willingness to cause harm.
If our examples are limited to those two groups, I don’t think that alone justifies saying that it is “routine” in the EA community to “regularly sit down and make extensive plans about how to optimize other people’s beliefs”.
I think you’re making a broader claim that this is common even beyond those particularly extreme examples.
Totally agree I haven’t established representativeness! I was just talking about what I think the natural implication of your comment was.
Yeah, that does sound roughly like what I mean, and then I think most people just drop the second part:
I do not think that SBF was doing this part. He was doing the former though!
My best guess is that you are doing a mixture of:
Indeed self-selecting out of these environments
Having a too-narrow conception of the “AI Safety community” that forms a motte where you conceptually exclude people who do this a lot (e.g. the labs themselves), but in a way that then makes posts like the OP we are commenting on misleading
Probably having somewhat different standards for this (indeed, a thing I’ve updated on over the years is that a lot of powerful optimization can happen between people here, where e.g. one party sets up a standard in good faith, another party starts Goodharting on that standard in largely good faith, and the end result is a lot of deception)
Do you have an example of this? (It sounds like you think that I might be participating in this dynamic on one side or the other.)
I think this is roughly what happened when FTX was spending a huge amount of money before it all collapsed and a lot of people started new projects under pretty dubious premises to look appealing to them. I also think this is still happening quite a lot around OpenPhil, with a lot of quite bad research being produced, and a lot of people digging themselves into holes (and also trying to enforce various norms that don’t really make sense, but where they think if they enforce it, they are more likely to get money, which does unfortunately work).
Is this not common in politics? I thought this was a lot of what politics was about. (Having never worked in politics.)
And corporate PR campaigns too for that matter.
I have been very surprised by how non-agentic politics is! Like, there certainly is a lot of signaling going on, but when reading stuff like Decidingtowin.org it becomes clear how little optimization actually goes into saying things that will get you voters and convince stakeholders.
I do think a lot of that is going on there, and in the ranking above I would probably put the current political right above AI safety and the current political left below AI safety. Just when I took the average it seemed to me like it would end up below, largely as a result of a severe lack of agency as documented in things like deciding-to-win.
Re corporate campaigns: I think those are really very milquetoast. Yes, you make cool ads, but the optimization pressure here seems relatively minor (barring some intense outliers, like Apple and Disney, which I do think are much more agentic here than others, and have caused pretty great harm in doing so, like Disney being responsible for copyright being far too long in the US because Disney was terribly afraid of anyone re-using their characters and so tainting Disney’s image).
Are you combining Venkatesh Rao’s loser/clueless/sociopath taxonomy with David Chapman’s geek/mop/sociopath?
(ETA: I know this is not relevant to the discussion, but I confuse these sometimes.)
Oh, oops, yep, I confused the two. I meant geek/mop/sociopath in the David Chapman sense. Thank you for the catch!
Huh. Seems super wrong to me fwiw. How would you rank AIFP on the list?
Very low, though trending a bit higher over time. The policy-focused playbook has to deal with a lot more trickiness here than AI-2027, and you have to deal more with policymakers and stuff, but currently y’all don’t do very much of the kind of thing I am talking about here.
I really appreciate your clear-headedness at recognizing these phenomena even in people “on the same team”, i.e. people very concerned about and interested in preventing AI X-Risk.
However, I suspect that you also underrate the amount of self-deception going on here. It’s much easier to convince others if you convince yourself first. I think people in the AI Safety community self-deceive in various ways, for example by choosing to not fully think through how their beliefs are justified (e.g. not acknowledging the extent to which they are based on deference—Tsvi writes about this in his recent post rather well).
There are of course people who explicitly, consciously, plan to deceive, thinking things like “it’s very important to convince people that AI Safety/policy X is important, and so we should use the most effective messaging techniques possible, even if they use false or misleading claims.” However, I think there’s a larger set of people who, as they realize claims A B C are useful for consequentialist reasons, internally start questioning A B C less, and become biased to believe A B C themselves.
Sure! I definitely agree that’s going on a lot as well. But I think that kind of deception is more common in the rest of the world, and the things that set this community apart from others is the ability to do something more intentional here (which then combined with plenty of self-deception can result in quite catastrophic outcomes, as FTX illustrates).
This is not good. Why should people run the risk of interacting with the AI safety community if this is true?
I do think it’s not good! But also, it’s an important issue and you have to interface with people who aren’t super principled all the time. I just don’t want people to think of the AI Safety community as some kind of community of saints. I think it’s pretty high variance, and you should have your guard up a good amount.
For those that rely on intelligence enhancement as a component of their AI safety strategy, it would be a good time to get your press lines straight. The association of AI safety with eugenics (whether you personally agree with that label or not) strikes me as a soft target and a simple way to keep AI safety as a marginal movement.
I think a good counter to this from the activism perspective is avoiding labels and producing objective, thoughtful, and well-reasoned content arguing your point. Anti-AI-safety content often focuses on attacking the people, or the specific beliefs of the people, in the AI safety/rationalist community. The epistemic effects of these attacks can be circumvented by avoiding association with that community as much as is reasonable, without being deceptive. A good example would be the YouTube channel AI in Context, run by 80,000 Hours. They made an excellent AI 2027 video, coming at it from an objective perspective and effectively connecting the dots from the seemingly fantastical scenario to reality. That video is now approaching 10 million views on a completely fresh channel! See also SciShow’s recent episode on AI, which also garnered an extremely positive reception.
The strong viewership on this type of content demonstrates that people are clearly receptive to the AI safety narrative if it’s done tastefully and logically. Most of the negative comments on these videos (anecdotally) come from people who believe that superintelligent AI is either impossible or extremely distant, not from people who reject the premise altogether. In my view, content like this would be affected very weakly by the type of attacks you are talking about in this post. To be blunt, to oversimplify, and to take the risk of being overconfident, I believe safety and caution narratives have the advantage over acceleration narratives by merit of being based in reality and logic! Imagine attempting to make a “counter” to the above videos trying to make the case that safety is no big deal. How would you even go about that? Would people believe you? Arguments are not won by truth alone, but it certainly helps.
The potential political impact seems more salient, but in my (extremely inexpert) opinion, getting the public on your side will cause political figures to follow. The measures required to meaningfully impact AI outcomes require so much political will that extremely strong public opinion is needed, and that extremely strong public opinion comes from a combination of real-world impact and evidence (“AI took my job”) along with properly communicating the potential future and dangers (like the content above). The more the public is on the side of an AI slowdown, the less impact a super PAC can have on politicians’ decisions regarding the topic. (Compare a world where 2 percent of voters say they support a pause on AI development to a world where 70 percent say they support it. In world 1 a politician would be easily swayed to avoid the issue by the threat of adversarial spending, but in world 2 the political risk of avoiding the issue is far stronger than the risk of invoking the wrath of the super PAC.) This is not meant to diminish the very real harm that organized opposition can cause politically, or to downplay the importance of countering that political maneuvering in turn. Political work is extremely important, and especially so if well-funded groups are working to push the exact opposite narrative to what is needed.
I don’t mean to diminish the potential harm this kind of political maneuvering can have, but in my view the future is bright from the safety activism perspective. I’ll also add that I don’t believe my view of “avoid labels” and your point about “standing proud and putting up a fight” are opposed. Both can happen in parallel, two fights at once. I strongly agree that backing down from your views or actions as a result of bad press is a mistake, and I don’t advocate for that here.
One such article came out yesterday; I think it’s a fairly representative example of the genre.
The advice and techniques from the rationality community seem to work well at avoiding a specific type of high-level mistake: they help you notice weird ideas that might otherwise get dismissed and take them seriously. Things like AI being on a trajectory to automate all intellectual labor and perhaps take over the world, animal suffering, longevity, cryonics. The list goes on.
This is a very valuable skill and causes people to do things like pivot their careers to areas that are ten times better. But once you’ve had your ~3-5 revelations, I think the value of these techniques can diminish a lot.[1]
Yet a lot of the rationality community’s techniques and culture seem oriented around this one idea, even on small scales: people pride themselves on being relentlessly truth-seeking and willing to consider possibilities they flinch away from.
On the margin, I think the rationality community should put more emphasis on skills like:
Performing simple cost-effectiveness estimates accurately
I think very few people in the community could put together an analysis like this one from Eric Neyman on the value of a particular donation opportunity (see the section “Comparison to non-AI safety opportunities”). I’m picking this example not because it’s the best analysis of its kind, but because it’s the sort of analysis I think people should be doing all the time and should be practiced at, and I think it’s very reasonable to produce things of this quality fairly regularly.
When people do practice this kind of analysis, I notice they focus on Fermi estimates where they get good at making extremely simple models and memorizing various numbers. (My friend’s Anki deck includes things like the density of typical continental crust, the dimensions of a city block next to his office, the glide ratio of a hang glider, the amount of time since the last glacial maximum, and the fraction of babies in the US that are twins).
I think being able to produce specific models over the course of a few hours (where you can look up the glide ratio of a hang glider if you need it) is more neglected but very useful (when it really counts, you can toss the back of the napkin and use a whiteboard).
Simply noticing something might be a big deal is only the first step! You need to decide if it’s worth taking action (how big a deal is it exactly?) and what action to take (what are the costs and benefits of each option?). Sometimes it’s obvious, but often it isn’t, and these analyses are the best way I know of to improve at this, other than “have good judgement magically” or “gain life experience”.
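To make the shape of the exercise concrete, here is a toy sketch with completely made-up placeholder numbers (it is not Eric’s model or anyone’s real estimate); the point is just the structure: explicit inputs, an explicit output, and a comparison against a benchmark.

```python
# Toy cost-effectiveness skeleton; every number below is an illustrative placeholder.
donation = 100_000                # dollars you're considering giving
p_counterfactual_win = 0.001      # placeholder: chance the donation flips the outcome you care about
value_of_win = 1_000_000          # placeholder: DALY-equivalents if that outcome flips

expected_value = p_counterfactual_win * value_of_win     # expected DALY-equivalents
cost_per_daly_equivalent = donation / expected_value

benchmark = 50                    # a GiveWell-style ~$50/DALY reference point
print(f"~${cost_per_daly_equivalent:.0f} per DALY-equivalent vs ~${benchmark} benchmark")
```

The value comes less from the final number and more from being forced to write down which inputs are doing the work.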
Articulating all the assumptions underlying an argument
A lot of the reasoning I see on LessWrong feels “hand-wavy”: it makes many assumptions that it doesn’t spell out. That kind of reasoning can be valuable: often good arguments start as hazy intuitions. Plus many good ideas are never written up at all and I don’t want to make the standards impenetrably high. But I wish people recognized this shortcoming and tried to remedy it more often.
By “articulating assumptions” I mean outlining the core dynamics at play that seem important, the ways you think these dynamics work, and the many other complexities you’re ignoring in your simple model. I don’t mean trying to compress a bunch of Bayesian beliefs into propositional logic.
Contact with reality
It’s really really powerful to look at things directly (read data, talk to users, etc), design and run experiments, and do things in the world to gain experience.
Everyone already knows this, empiricism is literally a virtue of rationality. But I don’t see people employing it as much as they should be. If you’re worried about AI risk, talk to the models! Read raw transcripts!
Scholarship
Another virtue of rationality. It’s in the sequences, just not as present in the culture as you might expect. Almost nobody I know reads enough. I started a journal club at my company, and after nearly every meeting folks tell me how useful it is. I so often see work that would be much better if the authors engaged with the literature a little more. Of course YMMV depending on the field you’re in; some literature isn’t worth engaging with.
Being overall skilled and knowledgeable and able to execute on things in the real world
Maybe this doesn’t count as a rationality skill per se, but I think the meta-skill of sitting down, learning stuff, and getting good at it is important. In practice, the average person reading this shortform would probably be more effective if they spent their energy developing whatever specific concrete skills and knowledge were most blocking them.
This list is far from complete.[2] I just wanted to gesture at the general dynamic.
They’re still useful. I could rattle off a half-dozen times this mindset let me notice something the people around me were missing and spring into action.
I especially think there’s some skill that separates people with great research taste from people with poor research taste that might be crucial, but I don’t really know what it is well enough to capture it here.
Huh, FWIW, I thought this analysis was a classic example of streetlighting. It succeeded at quantifying some things related to the donation opportunity at hand, but it failed to cover the ones I considered most important. This seems like the standard failure mode of this kind of estimate, and I was quite sad to see it here.
Like, the most important thing to estimate when evaluating a political candidate is their trustworthiness and integrity! It’s the thing that would flip the sign on whether supporting someone is good or bad for the world. The model is silent on this point, and weirdly, when I talked to many others about it, it indeed seemed to serve as a semantic stopsign against asking the much more important questions about the candidate.
Like, I am strongly in favor of making quick quantitative models, but I felt like this one missed the target. I mean, like, it’s fine, I don’t think it was a bad thing, but various aspects of how it was presented made me think that Eric and others believe it might come close to capturing the most important considerations, as opposed to being a thing that puts some numbers on some second-order considerations that maybe become relevant once the more important questions are answered.
ETA: I think this comment is missing some important things and I endorse Habryka’s reply more than I endorse this comment
I agree that this is an important thing that deserved more consideration in Eric’s analysis (I wrote a note about it on Oct 22 but then I forgot to include it in my post yesterday). But I don’t think it’s too hard to put into a model (although it’s hard to find the right numbers to use). The model I wrote down in my note is:
30% chance Bores would oppose an AI pause / strong AI regulations (b/c it’s too “anti-innovation” or something)
40% chance Bores would support strong regulations
30% chance he would vote for strong regulations but not advocate for them
90% chance Bores would support weak/moderate AI regulations
My guess is that 2⁄3 of the EV comes from strong regulations and 1⁄3 from weak regulations (which I just came up with a justification for earlier today but it’s too complicated to fit in this comment), so these considerations reduce the EV to 37% (i.e., roughly divide EV by 3).
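For what it’s worth, here is one way those numbers could combine to land near 37%. This is a guessed reconstruction under stated assumptions (opposition to strong regulations counts as fully negative, a quiet vote counts as zero, and the weak-regulation term is unaffected), not necessarily the justification alluded to above:

```python
# Hypothetical reconstruction of the ~37% figure; the scoring below is an assumption.
p_oppose_strong, p_support_strong, p_quiet_vote = 0.30, 0.40, 0.30
p_support_weak = 0.90

strong_share, weak_share = 2 / 3, 1 / 3   # stated split of the EV between strong and weak regulations

# Assumed scoring for the strong-regulation scenarios: support = +1, quiet vote = 0, oppose = -1.
strong_factor = p_support_strong - p_oppose_strong

ev_multiplier = strong_share * strong_factor + weak_share * p_support_weak
print(f"{ev_multiplier:.0%}")             # ~37%
```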
FWIW I wouldn’t say “trustworthiness” is the most important thing, more like “can be trusted to take AI risk seriously”, and my model is more about the latter. (A trustworthy politician who is honest about the fact that they don’t care about AI safety will not be getting any donations from me.)
No. Bad. Really not what I support. Strong disagree. Bad naive consequentialism.
Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, in my opinion that serves as a multiplier on their negative impact on the world. I do not want to create scheming and untrustworthy stakeholders who start doing sketchy stuff around AI risk. That's how a lot of bad stuff has already happened in the past.
I think political donations to trustworthy and reasonable politicians who are open to AI X-risk, but don’t have an opinion on it are much better for the world (indeed, infinitely better due to inverted sign), than untrustworthy ones that do seem interested.
That said, I agree that you could put this in the model! I am not against quantitatively estimating integrity and trustworthiness, and think the model would be a bunch better for considering it.
No-true-Scotsman-ish counterargument: no-one who actually gets AI risk would engage in this kind of tomfoolery. This is the behavior of someone who almost got it, but then missed the last turn and stumbled into the den of the legendary Black Beast of Aaargh. In the abstract, I think “we should be willing to consider supporting literal Voldemort if we’re sure he has the correct model of AI X-risk” goes through.
The problem is that it just totally doesn’t work in practice, not even on pure consequentialist grounds:
You can never tell whether Voldemorts actually understand and believe your cause, or whether they’re just really good at picking the right things to say to get you to support them. No, not even if you’ve considered the possibility that they’re lying and you still feel sure they’re not. Your object-level evaluations just can’t be trusted. (At least, if they’re competent at their thing. And if they’re not just evil, but also bad at it, so bad you can tell when they’re being honest, why would you support them?)
Voldemorts and their plans are often more incompetent than they seem,[1] and when their evil-but-”effective” plan predictably blows up, you and your cause are going to suffer reputational damage and end up in a worse position than your starting one. (You’re not gonna find an Altman, you’ll find an SBF.)
Voldemorts are naturally predisposed to misunderstanding the AI risk in precisely the ways that later make them engage in sketchy stuff around it. They’re very tempted to view ASI as a giant pile of power they can grab. (They hallucinate the Ring when they look into the Black Beast’s den, if I’m to mix my analogies.)
In general, if you’re considering giving power to a really effective but untrustworthy person because they seem credibly aligned with your cause, despite their general untrustworthiness (they also don’t want to die to ASI!), you are almost certainly just getting exploited. These sorts of people should be avoided like wildfire. (Even in cases where you think you can keep them in check, you’re going to have to spend so much effort paranoidally looking over everything they do in search of gotchas that it almost certainly wouldn’t be worth it.)
Probably because of that thing where if a good person dramatically abandons their morals for the greater good, they feel that it’s a monumental enough sacrifice for the universe to take notice and make it worth it.
A lot of Paranoia: A Beginner’s Guide is actually trying to set up a bunch of the prerequisites for making this kind of argument more strongly. In particular, a feature of people who act in untrustworthy ways, and surround themselves with unprincipled people, is that they end up sacrificing most of their sanity on the altar of paranoia.
Like, the fictional HPMoR Voldemort happened to not have any adversaries who could disrupt his OODA loop, but that was purely a fiction. A world with two Voldemort-level competent players results in two people nuking their sanity as they try to get one over on each other, and at that point you can't really rely on them having good takes or sane stances on much of anything (or, if they are genuinely smart enough, on them making an actually binding alliance, which via things like Unbreakable Vows is surprisingly doable in the HPMoR universe, but which in reality runs into many more issues).
Tone note: I really don't like people responding to other people's claims with content like "No. Bad… Bad naive consequentialism" (I'm totally fine with "Really not what I support. Strong disagree."). It reads quite strongly to me as trying to scold someone or socially punish them using social status for a claim that you disagree with; it feels continuous with some kind of frame that's like "habryka is the arbiter of the Good".
It sounds like scolding someone because it is! Like, IDK, sometimes that’s the thing you want to do?
I mean, I am not the “arbiter of the good”, but like, many things are distasteful and should be reacted to as such. I react similarly to people posting LLM slop on LW (usually more in the form of “wtf, come on man, please at least write a response yourself, don’t copy paste from an LLM”) and many other things I see as norm violations.
I definitely consider the thing I interpreted Michael to be saying a norm violation on LessWrong, and endorse lending my weight to norm enforcement of that (he then clarified in a way that I think largely defused the situation, but I think I was pretty justified in my initial reaction). Not all spaces I participate in are places where I feel fine participating in norm enforcement, but of course LessWrong is one such place!
Now, I think there are fine arguments to be made that norm enforcement should happen at the explicit intellectual level and shouldn't involve more expressive forms of speech. IDK, I am a bit sympathetic to that, but I feel reasonably good about my choices here, especially given that Michael's comment started with "I agree", thereby implying that the things he was saying were somehow reflective of my personal opinion. It seems eminently natural that when you approach someone and say "hey, I totally agree with you that <X>", where X is something they vehemently disagree with (like, IDK, imagine someone coming to you and saying "hey, I totally agree with you that child pornography should be legal" when you absolutely do not believe this), they respond the kind of way I did.
Overall, feedback is still appreciated, but I think I would still write roughly the same comment in a similar situation!
Michael’s comment started with a specific point he agreed with you on.
He specifically phrased the part you were objecting to as his opinion, not as a shared point of view.
I am pretty sure Michael thought he was largely agreeing with me. He wasn’t saying “I agree this thing is important, but here is this totally other thing that I actually think is more important”. He said (and meant to say) “I agree this thing is important, and here is a slightly different spin on it”. Feel free to ask him!
I claim you misread his original comment, as stated. Then you scolded him based on that misreading. I made the case you misread him via quotes, which you ignored, instead inviting me to ask him about his intentions. That’s your responsibility, not mine! I’d invite you to check in with him about his meaning yourself, and to consider doing that in the future before you scold.
I mean, I think his intention in communicating is the ground truth! I was suggesting his intentions as a way to operationalize the disagreement. Like, I am trying to check that you agree that if that was his intention, and I read it correctly, then you were wrong to say that I misread him. If that isn't the case, then we have a disagreement about the nature of communication on our hands, which, I mean, we can go into, but that doesn't sound super exciting.
I do happen to be chatting with Michael sometime in the next few days, so I can ask. Happy to bet about what he says about what he intended to communicate! Like, I am not overwhelmingly confident, but you seem to present overwhelming confidence, so presumably you would be up for offering me a bet at good odds.
FWIW I think Habryka was right to call out that some parts of my comment were bad, and the scolding got me to think more carefully about it.
I would generally agree, but a mitigating factor here is that MichaelDickens was presenting himself as agreeing with habryka. It seems more reasonable for habryka to strongly push back against statements that make claims about his own beliefs.
Yeah I pretty much agree with what you’re saying. But I think I misunderstood your comment before mine, and the thing you’re talking about was not captured by the model I wrote in my last comment; so I have some more thinking to do.
I didn’t mean “can be trusted to take AI risk seriously” as “indeterminate trustworthiness but cares about x-risk”, more like “the conjunction of trustworthy + cares about x-risk”.
Fair enough. This doesn't seem central to my point, so I don't really want to go down a rabbit hole here. As I said originally, "I'm picking this example not because it's the best analysis of its kind, but because it's the sort of analysis I think people should be doing all the time and should be practiced at, and I think it's very reasonable to produce things of this quality fairly regularly." I know this particular analysis surfaced some useful considerations others hadn't thought of, and I learned things from reading it.
I also suspect you dislike the original analysis for reasons that stem from deep-seated worldview disagreements with Eric, not because the methodology is flawed.
I think the methodology of elevating cost-effectiveness estimates in a way that (usually, at least at a community level) produces lots of naive consequentialist choices is a large chunk of the deep-seated worldview disagreement!
I actually think I probably have it less with Eric than with other people, but I think the disagreement here is at least not uncorrelated with the worldview divergence.
Agree! I am glad to have read it and wish more people produced things like it. It’s also not particularly high on my list of things to strongly incentivize, but it’s nice because it scales well, and lots of people doing more things like this seems like it just makes things a bit better.
My only sadness about it comes from the context in which it was produced. It seems eminently possible to me to have a culture of producing these kinds of estimates without failing to engage with the most important questions (or like, to include them in your estimates somehow), but I think it requires at least a bit of intentionality, and in the absence of that does seem like a bit of a trap.
Is there reason to think that Bores or Wiener are not trustworthy or lack integrity? Genuine question, asking because it could affect my donation choices. (I couldn’t tell from your post if there were, e.g., rumors floating around about them, or if you were just using this as an example of a key question that you thought was missed in Neyman’s analysis.)
I mean, I think there's a substantial prior that trustworthiness and integrity differ quite a lot between different politicians.
That said, I overall had reasonably positive impressions after talking to Bores in person. I… did feel a bit worried he was a bit too naively consequentialist, but various other things he said made me overall think he is a good person to donate to. But I am glad I talked to him, since I was pretty uncertain before I did.
For “Performing simple cost-effectiveness estimates accurately”, I would like to be better at this but I feel like I’m weak on some intermediate skills. I’d appreciate a post laying out more of the pieces.
(A thing I find hard is somewhat related to the thing habryka is saying, where the real crux is often a murky thing that’s particularly hard to operationalize. Although in the case of the Eric Neyman thing, I think I separately asked those questions, and found Eric’s BOTEC useful for the thing it was trying to do)
(1) Thanks for writing this!
(2) Mind spelling out a few more items?
(3) Consider posting this as a top-level post.
When I was first trying to learn ML for AI safety research, people told me to learn linear algebra. And today lots of people I talk to who are trying to learn ML[1] seem under the impression they need to master linear algebra before they start fiddling with transformers. I find in practice I almost never use 90% of the linear algebra I’ve learned. I use other kinds of math much more, and overall being good at empiricism and implementation seems more valuable than knowing most math beyond the level of AP calculus.
The one part of linear algebra you do absolutely need is a really, really good intuition for what a dot product is, the fact that you can do them in batches, and the fact that matrix multiplication is associative. Someone smart who can’t so much as multiply matrices can learn the basics in an hour or two with a good tutor (I’ve taken people through it in that amount of time). The introductory linear algebra courses I’ve seen[2] wouldn’t drill this intuition nearly as well as the tutor even if you took them.
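To illustrate the kind of intuition I mean, here's a tiny numpy sketch (mine, not from any particular course): attention-style score matrices are just batched dot products, and associativity is what lets you regroup chained matmuls.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 64))   # 8 query vectors of dimension 64
K = rng.normal(size=(8, 64))   # 8 key vectors

# A matmul is a batch of dot products: scores[i, j] = dot(Q[i], K[j]).
scores = Q @ K.T
assert np.allclose(scores[2, 5], np.dot(Q[2], K[5]))

# Associativity: (AB)C == A(BC), so you can regroup chained matmuls
# (e.g. to do the cheaper multiplication first) without changing the result.
A = rng.normal(size=(8, 64))
B = rng.normal(size=(64, 64))
C = rng.normal(size=(64, 4))
assert np.allclose((A @ B) @ C, A @ (B @ C))
```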
In my experience it’s not that useful to have good intuitions for things like eigenvectors/eigenvalues or determinants (unless you’re doing something like SLT). Understanding bases and change-of-basis is somewhat useful for improving your intuitions, and especially useful for some kinds of interp, I guess? Matrix decompositions are useful if you want to improve cuBLAS. Sparsity sometimes comes up, especially in interp (it’s also a very very simple concept).
The same goes for much of vector calculus. (You need to know you can take your derivatives in batches and that this means you write your d/dx as ∂/∂x or an upside-down triangle. You don’t need curl or divergence.)
I find it’s pretty easy to pick things like this up on the fly if you ever happen to need them.
Inasmuch as I do use math, I find I most often use basic statistics (so I can understand my empirical results!), basic probability theory (variance, expectations, estimators), having good intuitions for high-dimensional probability (which is the only part of math that seems underrated for ML), basic calculus (the chain rule), basic information theory (“what is KL-divergence?”), arithmetic, a bunch of random tidbits like “the log derivative trick”, and the ability to look at equations with lots of symbols and digest them.
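As one concrete example from that list, here's KL divergence for two categorical distributions, as a minimal numpy sketch (the distributions are made up for illustration):

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q): expected extra surprise from modeling data drawn from p using q.
    Nonnegative, zero iff p == q, and asymmetric in its arguments."""
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])   # e.g. a reference model's next-token distribution
q = np.array([0.5, 0.3, 0.2])   # e.g. a fine-tuned policy's distribution
print(kl_divergence(p, q), kl_divergence(q, p))  # note the asymmetry
```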
In general most work and innovation[3] in machine learning these days (and in many domains of AI safety[4]) is not based in formal mathematical theory, it’s based on empiricism, fussing with lots of GPUs, and stacking small optimizations. As such, being good at math doesn’t seem that useful for doing most ML research. There are notable exceptions: some people do theory-based research. But outside these niches, being good at implementation and empiricism seems much more important; inasmuch as math gives you better intuitions in ML, I think reading more empirical papers or running more experiments or just talking to different models will give you far better intuitions per hour.
By “ML” I mean things involving modern foundation models, especially transformer-based LLMs.
It’s pretty plausible to me that I’ve only been exposed to particularly mediocre math courses. My sample-size is small, and it seems like course quality and content varies a lot.
Please don’t do capabilities mindlessly.
The standard counterargument here is that these parts of AI safety are ignoring what's actually hard about ML and that empiricism won't work: for example, we need to develop techniques that work on the first model we build that can self-improve. I don't want to get into that debate.
This is a great list!
Here’s some stuff that isn’t in your list that I think comes up often enough that aspiring ML researchers should eventually know it (and most of this is indeed universally known). Everything in this comment is something that I’ve used multiple times in the last month.
Linear algebra tidbits
Vector-matrix-vector products
Probably einsums more generally
And the derivative of an einsum wrt any input
Matrix multiplication of matrices of shape [A,B] and [B,C] takes 2ABC flops.
This stuff comes up when doing basic math about the FLOPs of a neural net architecture.
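A minimal sketch of that FLOP arithmetic (the layer sizes are made up for illustration):

```python
# Rule of thumb: multiplying an [A, B] matrix by a [B, C] matrix takes ~2*A*B*C FLOPs
# (one multiply plus one add per term of each inner product).
def matmul_flops(a: int, b: int, c: int) -> int:
    return 2 * a * b * c

# Illustrative forward pass through one transformer MLP block on a batch of tokens
# (made-up sizes; real architectures differ).
tokens, d_model, d_ff = 4096, 4096, 16384
up_proj = matmul_flops(tokens, d_model, d_ff)
down_proj = matmul_flops(tokens, d_ff, d_model)
print(f"{(up_proj + down_proj) / 1e12:.1f} TFLOPs")  # ~1.1 TFLOPs
```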
Stuff that I use as concrete simple examples when thinking about ML
A deep understanding of linear regression, covariance, correlation. (This is useful because it is a simple analogy for fitting a probabilistic model, and it lets you remember a bunch of important facts.)
Basic facts about (multivariate) Gaussians; Bayesian updates on Gaussians
Variance reduction, importance sampling. Lots of ML algorithms, e.g. value baselining, are basically just variance reduction tricks. Maybe consider the difference between paired and unpaired t-tests as a simple example (see the sketch after this list).
This is relevant for understanding ML algorithms, for doing basic statistics to understand empirical results, and for designing sample-efficient experiments and algorithms.
Errors go as 1/sqrt(n) so sample sizes need to grow 4x if you want your error bars to shrink 2x
AUROC is the probability that a sample from distribution A will be greater than a sample from distribution B; this is the obvious natural way of comparing distributions over a totally ordered set.
Maximum likelihood estimation, MAP estimation, full Bayes
The Boltzmann distribution (aka softmax)
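Here's the paired-vs-unpaired t-test example mentioned above, as a minimal sketch (the setup and numbers are mine, purely to illustrate the variance-reduction point):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
prompt_difficulty = rng.normal(0.0, 2.0, size=n)               # shared noise across both models
model_a = prompt_difficulty + rng.normal(0.00, 0.3, size=n)
model_b = prompt_difficulty + rng.normal(0.15, 0.3, size=n)    # truly 0.15 better on average

# Unpaired comparison: the shared per-prompt noise swamps the small true difference.
print(stats.ttest_ind(model_b, model_a).pvalue)   # typically not significant

# Paired comparison on the same prompts: differencing cancels the shared noise.
print(stats.ttest_rel(model_b, model_a).pvalue)   # typically very significant
```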
And some stuff I’m personally very glad to know:
The Price equation/the breeder's equation: we're constantly thinking about how neural net properties change as you train them, and it is IMO helpful to have the quantitative form of natural selection in your head as an example
SGD is not parameterization invariant; natural gradients
Bayes nets
Your half-power-of-ten times tables
(barely counts) Conversions between different units of time (e.g. “there are 30M seconds in a year, there are 3k seconds in an hour, there are 1e5 seconds in a day”)
I think you wanted to say standard error of the mean.
I think I somewhat disagree here: I think that often even good empirics-focused researchers have background informal and not-so-respectable models informed by mathematical intuition. Source is probably some Dwarkesh Patel interview, but I'm not sure which.
This feels intuitively true to me, but I'm also very biased: I've basically shovelled all of my skill points into engineering and research intuition, and have only a passable understanding of math, and this generally has not been a huge bottleneck for me. But maybe if I knew more math I'd know what I'm missing out on.
I think this is largely right point by point, except I'd flag that if you're rarely using eigendecomposition (which I use mostly at the whiteboard, less so in code), you're possibly bottlenecked by a poor grasp of eigenvectors and eigenvalues.
Also, a fancy linear algebra education will tell you exactly how the matrix log and matrix exponential work, but all you really need to know is that 99% of the time, any manipulation you can do with regular logs and exponents works completely unmodified with square matrices and matrix logs and exponents. If you don't know about matrix logs at all, though, this will be a glaring hole: I use them constantly in actual code. (Admittedly, the 99% figure reflects sampling bias. For example, given matrices A and B, log(AB) only equals log(A) + log(B) if A and B commute (share eigenvectors), and getting the two sides to be numerically equal may require being careful about which branch of the log you pick. You might object "well, of course", but the point is you'd only think to try the identity when the matrices commute and a later operation kills the branch differences, so in practice, when you try it, it works.)
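A small numpy/scipy sketch of the matrix-log point (my example; positive-definite matrices are used to sidestep branch issues):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)

def spd(size=4):
    """A random symmetric positive-definite matrix, so logm is real and well-behaved."""
    M = rng.normal(size=(size, size))
    return M @ M.T + 4 * np.eye(size)

A, B = spd(), spd()

# Most ordinary log/exp manipulations carry over unmodified:
print(np.allclose(expm(logm(A)), A))                     # True

# ...but log(AB) == log(A) + log(B) generally requires A and B to commute:
print(np.allclose(logm(A @ B), logm(A) + logm(B)))       # False for generic A, B

C = A @ A + np.eye(4)   # a polynomial in A, so it commutes with A
print(np.allclose(logm(A @ C), logm(A) + logm(C)))       # True
```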
LessWrong feature request: make it easy for authors to opt-out of having their posts in the training data.
If most smart people were put in the position of a misaligned AI and tried to take over the world, I think they’d be caught and fail.[1] If I were a misaligned AI, I think I’d have a much better shot at succeeding, largely because I’ve read lots of text about how people evaluate and monitor models, strategies schemers can use to undermine evals and take malicious actions without being detected, and creative paths to taking over the world as an AI.
A lot of that information is from LessWrong.[2] It's unfortunate that this information will probably wind up in the pre-training corpus of new models (though it's often still worth it overall to share most of this information[3]).
LessWrong could easily change this for specific posts! They could add something to their robots.txt asking crawlers that scrape training data to ignore those pages. They could add canary strings to the page invisibly. (They could even go a step further and add something like copyrighted song lyrics to the page invisibly.) If they really wanted, they could put the content of a post behind a captcha for users who aren't logged in. This system wouldn't be perfect (edit: please don't rely on these methods; they're harm reduction for information you would otherwise have posted without any protections), but I think even reducing the odds or the quantity of this data in the pre-training corpus could help.
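For concreteness, here's a rough Python sketch of what generating such a robots.txt could look like. The crawler names and URL path are illustrative assumptions on my part, and compliance by scrapers is voluntary, so treat this as harm reduction rather than a guarantee.

```python
# Hypothetical sketch: build a robots.txt that asks known AI-training crawlers to
# skip opted-out posts. Crawler names and the URL scheme are illustrative only,
# and crawlers can simply ignore these directives.
AI_TRAINING_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]

def render_robots_txt(opted_out_paths: list[str]) -> str:
    blocks = []
    for crawler in AI_TRAINING_CRAWLERS:
        lines = [f"User-agent: {crawler}"]
        lines += [f"Disallow: {path}" for path in opted_out_paths]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

print(render_robots_txt(["/posts/example-opted-out-post"]))
```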
I would love to have this as a feature at the bottom of drafts. I imagine a box I could tick in the editor that would enable this feature (and maybe let me decide if I want the captcha part or not). Ideally, the LessWrong team could prompt an LLM to read users' posts before they hit publish; if it seems like a post might be something the user wouldn't want models trained on, the site could proactively ask the user whether they want the post kept out of the training corpus.
As far as I know, no other social media platform gives users an easy way to try to keep their data out of training corpora (and many actively sell it for this purpose). So LessWrong would be providing a valuable service.
The actual decisions around what should or shouldn’t be part of the pre-training corpus seem nuanced: if we want to use LLMs to help with AI safety, it might help if those LLMs have some information about AI safety in their pre-training corpus (though adding that information back in during post-training might work almost as well). But I want to at least give users the option to opt out of the current default.
That's not to say all misaligned AIs would fail; I think there will be a period where AIs are roughly as smart as me and thus could at least bide their time and hide their misalignment without being caught if they'd read LessWrong, but might fail and get caught if they hadn't. But you can imagine we're purchasing dignity points or micro-dooms, depending on your worldview. In either case I think this intervention is relatively cheap and worthwhile.
Of course much of it is reproduced outside LessWrong as well. But I think (1) so much of it is still on LessWrong and nowhere else that it's worth it, and (2) the more times this information appears in the pre-training data, the more likely the model is to memorize it or have it be salient.
And the information for which the costs of sharing it aren’t worth it probably still shouldn’t be posted even if the proposal I outline here is implemented, since there’s still a good chance it might leak out.
I worry that canary strings and robots.txt are ~basically ignored by labs and that this could cause people to share things that on the margin they wouldn’t if there were no such option[1]. More reliable methods exist, but they come with a lot of overhead and I expect most users wouldn’t want to deal with it.
Especially since, as the post says, canaries often don't even serve the purpose of detection, with publicly accessible models claiming ignorance of them.
Probably I should have included a footnote about this. I’m well aware that this is not a foolproof mechanism, but it still seems better than nothing and I think it’s very easy to have a disclaimer that makes this clear. As I said in the post, I think that people should only do this for information they would have posted on LessWrong anyway.
I disagree that these things are basically ignored by labs. My guess is many labs put some effort into filtering out data with the canary string, but that this is slightly harder than you might think and so they end up messing it up sometimes. (They might also sometimes ignore it on purpose, I’m not sure.)
Even if labs ignore the canary string now, having it in there would make it much easier to filter these things out if labs ever want to do that in the future.
I also suggest using better methods like captchas for non-logged-in users. I expect something like this to work somewhat well (though it still wouldn’t be foolproof).
https://www.fsf.org/blogs/sysadmin/our-small-team-vs-millions-of-bots
https://xeiaso.net/notes/2025/anubis-works/
I think having copyrighted content in between might work, but it depends on how the labs process the data. That said, the claim that it's really difficult to prevent AI scraping seems largely accurate.
I think the canary string is slightly more likely to work than the robots.txt. And the things you link don’t say which labs. I totally believe some labs scrape aggressively and ignore all robots.txt, but it wouldn’t surprise me if others don’t! In my opinion, if even one frontier lab doesn’t train on this data, that’s still quite valuable. And it seems so cheap to just try this as a bit of a Hail Mary.
I think the success of Anubis (as shown above) does tell us that a lot of them are not scraping in good faith. The fact that cryptographic computation is needed to stop them is ridiculous. I don't have a comprehensive list, but there's even a case where they started performing the computations in order to scrape the data:
https://github.com/TecharoHQ/anubis/issues/978
https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
There have been multiple reports that OpenAI also ignores robots.txt, but it's much less aggressive.
Also, the X-Robots-Tag header can be set on individual page responses (in contrast to robots.txt, which applies to the whole domain).
Sorry, this is what I meant, you're right.
Even if various pages are not used in pre-training, they can still be read by the models in deployment, right? When Claude or GPT-5 go out and find up-to-date information on the internet, they're not checking robots.txt files?
Absolutely! But having models that don't know how to scheme effectively without access to external resources is still very useful. As just one example, we could interrogate the models and give them honeypots without giving them access to the internet.
Labs could also ban some of their models from reading certain webpages or ingesting certain information in deployment (as in, filter all inputs to the model at inference time), especially for models performing high-stakes tasks where sabotage would be easier.
I also suspect digesting a LessWrong post the model reads in deployment is more likely to involve explicit reasoning about the post in the CoT (or be visible to probes, etc.) and thus to be caught. By contrast, if a model has seen lots of transcripts of misalignment in pre-training, the misaligned actions might be intuitive and second nature, and thus harder to detect.
TurboTax and H&R Block famously lobby the US government to make taxes more annoying to file to drum up demand for their products.[1] But as far as I can tell, they each only spend ~$3-4 million a year on lobbying. That's… not very much money (contrast it with the $60 billion the government gave the IRS to modernize its systems, the $4.9 billion in revenue Intuit made last fiscal year from TurboTax, or the hundreds of millions of hours[2] a return-free tax filing system could save).
Perhaps it would “just” take a multimillionaire and a few savvy policy folks to make the US tax system wildly better? Maybe TurboTax and H&R Block would simply up their lobbying budget if they stopped getting their way, but maybe they wouldn’t. Even if they do, I think it’s not crazy to imagine a fairly modest lobbying effort could beat them, since simpler tax filing seems popular across party lines/is rather obviously a good idea, and therefore may have an easier time making its case. Plus I wonder if pouring more money into lobbying hits diminishing returns at some point such that even a small amount of funding against TurboTax could go a long way.
Nobody seems to be trying to fight this. The closest things are an internal department of the IRS and some sporadic actions from broad consumer protection groups that don’t particularly focus on this issue (for example ProPublica wrote an amazing piece of investigative journalism in 2019 that includes gems like the below Intuit slide:)
In the meantime, the IRS just killed its pilot direct file program. While the program was far from perfect, it seemed to me like the best bet out there for eventually bringing the US to a simple return-free filing system, like the UK, Japan, and Germany use. It seems like a tragedy that the IRS sunset this program.[3]
In general, the amount of money companies spend on lobbying is often very low, and the harm to society that lobbying causes seems large. If anyone has examples of times folks tried standing up to corporate lobbying like this that didn’t seem to involve much money, I’d love to know more about how that’s turned out.
I haven’t deeply investigated how true this narrative is. It seems clear TurboTax/Intuit lobbies actively with this goal in mind, but it seems possible that policymakers are ignoring them and that filing taxes is hard for some other reason. That would at least explain why TurboTax and H&R Block spend so little here.
I don’t trust most sources that quote numbers like this. This number comes from this Brookings article from 2006, which makes up numbers just like everyone else but at least these numbers are made up by a respectable institution that doesn’t have an obvious COI.
In general, I love when the government lets the private sector compete and make products! I want TurboTax to keep existing, but it’s telling that they literally made the government promise not to build a competitor. That seems like the opposite of open competition.
Joe Bankman, better known for other reasons, had this idea:
I had thought that Patrick McKenzie claims here that lobbying by Intuit is not the reason US tax filing is so complicated, and that it's actually because of a Republican advocacy group that doesn't want to simplify tax filing, because that would be a stealth tax hike.
But rereading the relevant section, I’m confused. It sounds like the relevant advocacy group is in favor of simplifying the tax system, and in particular, removing withholding?
Interesting! How did Norquist/Americans for Tax Reform get so much influence? They seem to spend even less money than Intuit on lobbying, but maybe I’m not looking at the right sources or they have influence via means other than money?
I'm also somewhat skeptical of the claims. The agreement between the IRS and the Free File Alliance feels too favorable to the Free File Alliance for them to have had no hand in it.
As to your confusion, I can see why an advocacy group that wants to lower taxes might want the process of filing taxes to be painful. I’m just speculating, but I bet the fact that taxes are annoying to file and require you to directly confront the sizable sum you may owe the government makes people favor lower taxes and simpler tax codes.
This is what I remembered the piece as saying, but unless I’m misreading it now, that’s not actually in the text.
The world seems bottlenecked on people knowing and trusting each other. If you’re a trustworthy person who wants good things for the world, one of the best ways to demonstrate your trustworthiness is by interacting with people a lot, so that they can see how you behave in a variety of situations and they can establish how reasonable, smart, and capable you are. You can produce a lot of value for everyone involved by just interacting with people more.
I’m an introvert. My social skills aren’t amazing, and my social stamina is even less so. Yet I drag myself to parties and happy hours and one-on-one chats because they pay off.
It’s fairly common for me to go to a party and get someone to put hundreds of thousands of dollars towards causes I think are impactful, or to pivot their career, or to tell me a very useful, relevant piece of information I can act on. I think each of those things individually happens more than 15% of the time that I go to a party.
(Though this is only because I know of unusually good cause areas and career opportunities. I don’t think I could get people to put money or time towards random opportunities. This is a positive-sum interaction where I’m sharing information!)
Even if talking to someone isn’t valuable in the moment, knowing lots of people comes in really handy. Being able to directly communicate with lots of people in a high-bandwidth way lets you quickly orient to situations and get things done.
I try to go to every party I’m invited to that’s liable to have new people, and I very rarely turn down an opportunity to chat with a new person. I give my calendar link out like candy. Consider doing the same!
Talking to people is hits-based
Often, people go to an event and try to talk to people but it isn’t very useful, and they give up on the activity forever. Most of the time you go to an event it will not be that useful. But when it is useful, it’s extremely useful. With a little bit of skill, you can start to guess what kinds of conversations and events will be most useful (it is often not the ones that are most flashy and high-status).
Building up trust takes time
Often when I get good results from talking to people, it’s because I’ve already talked to them a few times at parties and I’ve established myself as a trustworthy person that they know.
Talking to people isn’t zero-sum
When I meet new people, I try to find ways I can be useful to them. (Knowing lots of people makes it easier to help other folks, because often you can produce value by connecting people to each other.) And when I help the people I'm talking to, I'm also helping myself, because I am on the same team as them. I want things that are good for the world, and so do most other people. I'm not sure the strategy in this shortform would work at all if I were trying to trick investors into overvaluing my startup or convince people to work for me when that wasn't in their best interest.
I think this is the main way that “talking to people”, as I’m using the term here, differs from “networking”.
Be genuine
When I talk to people, I try to be very blunt and earnest. I happen to like hanging out with people who are talented and capable, so I typically just try to find good conversations I enjoy. I build up friendships and genuine trust with people (by being a genuinely trustworthy person doing good things, not by trying to signal trust in complicated ways). I think I have good suggestions for things people should do with their money and time, and people are often very happy to hear these things.
Sometimes I do seek out specific people for specific reasons. If I’m only talking to someone because they have information/resources that are of interest to me, I try to directly (though tactfully) acknowledge that. Part of my vibe is that I’m weirdly goal-oriented/mission-driven, and I just wear that on my sleeve because I think the mission I drive towards is a good one.
I also try to talk to all kinds of folks and often purposefully avoid “high-status” people. In my experience, chasing them is usually a distraction anyway and the people in the interesting conversations are more worth talking to.
You can ask to be invited to more social events
When I encourage people to go to more social events, often they tell me that they’re not invited to more. In my experience, messaging the person you know who is most into going to social events and asking if they can invite you to stuff works pretty well most of the time. Once you’re attending a critical mass of social events, you’ll find yourself invited to more and more until your calendar explodes.
The other day I was speaking to one of the most productive people I’d ever met.[1] He was one of the top people in a very competitive field who was currently single-handedly performing the work of a team of brilliant programmers. He needed to find a spot to do some work, so I offered to help him find a desk with a monitor. But he said he generally liked working from his laptop on a couch, and he felt he was “only 10% slower” without a monitor anyway.
I was aghast. I’d been trying to optimize my productivity for years. A 10% productivity boost was a lot! Those things compound! How was this man, one of the most productive people I’d ever met, shrugging it off like it was nothing?
I think this nonchalant attitude towards productivity is fairly common in top researchers (though perhaps less so in top executives?). I have no idea why some people are so much more productive than others. It surprises me that so much variance is even possible.
This guy was smart, but I know plenty of people as smart as him who are far less productive. He was hardworking, but not insanely so. He wasn’t aggressively optimizing his productivity.[2] He wasn’t that old so it couldn’t just be experience. Probably part of it was luck, but he had enough different claims to fame that that couldn’t be the whole picture.
If I had to chalk it up to something, I guess I’d call it skill and “research taste”: he had a great ability to identify promising research directions and follow them (and he could just execute end-to-end on his ideas without getting lost or daunted, but I know how to train that).
I want to learn this skill, but I have no idea how to do it and I’m still not totally sure it’s real. Conducting research obviously helps, but that takes time and is clearly not sufficient. Maybe I should talk to a bunch of researchers and try to predict the results of their work?
Has anyone reading this ever successfully cultivated an uncanny ability to identify great research directions? How did you do it? What sub-skills does it require?
Am I missing some other secret sauce that lets some people produce wildly more valuable research than others?
Measured by more conventional means, not by positive impact on the long-term future; that’s dominated by other people. Making sure your work truly steers at solving the world’s biggest problems still seems like the best way to increase the value you produce, if you’re into that sort of thing. But I think this person’s abilities would multiply/complement any benefits from steering towards the most impactful problems.
Or maybe he was but there are so many 2x boosts the 10% ones aren’t worth worrying about?
Hmm, honestly, how do you know that he is one of the most productive people? Like, I have found these kinds of things surprisingly hard to evaluate, and a lot of successful research is luck, so maybe he just got lucky, but not like a “genetic lottery” kind of lucky, but more of a “happened to bet on the right research horse” kind of lucky in a way that I wouldn’t necessarily expect to generalize into the future.
I am partially saying this because I have personally observed a lot of the opposite. Somewhat reliably the most productive people I know have very strong opinions about how they work. And to be clear, most of them do actually not use external monitors a lot of the time (including me myself), so I don’t think this specific preference is that interesting, but they do tend to have strong opinions.
My other hypothesis is just that the conversation somehow caused them to not expose which aspects of their work habits they care a lot about, and the statement about “this merely makes me 10% slower”, was something they wouldn’t actually reflectively endorse. More likely they don’t think of their work as something that has that much of a local “efficiency” attribute to it, and so when they thought through the monitor question, they substituted the productivity question for one that’s more like “how many more to-do list items would I get through if I had a monitor”. If you forced them to consider a more holistic view of their productivity, my guess is some answer like “oh, but by working on a couch I am much more open to get up and talk to other people or start pacing around, and that actually makes up for the loss here”.
Ways training incentivizes and disincentivizes introspection in LLMs.
Recent work has shown some LLMs have some ability to introspect. Many people were surprised to learn LLMs had this capability at all. But I found the results somewhat surprising for another reason: models are trained to mimic text, both in pre-training and fine-tuning. Almost every time a model is prompted in training to generate text related to introspection, the answer it’s trained to give is whatever answer the LLMs in the training corpus would say, not what the model being trained actually observes from its own introspection. So I worry that even if models could introspect, they might learn to never introspect in response to prompting.
We do see models act consistently with this hypothesis sometimes: if you ask a model how many tokens it sees in a sentence, or instruct it to write a sentence with a specific number of tokens in it, it won't answer correctly.[1] But the model probably "knows" how many tokens there are; it's an extremely salient property of the input, and the space of possible tokens is a very useful thing for a model to know since it determines what it can output. At the very least, models can be trained to semi-accurately count tokens and conform their outputs to short token limits.
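One cheap way to poke at this yourself is to compare a model's self-reported count against an actual tokenizer. Here's a rough sketch using OpenAI's tiktoken library (assuming it's installed; other models use different tokenizers, so this is only an approximation of what any given model "sees"):

```python
# Sketch of a token-counting probe: tokenize a sentence, then ask the model how many
# tokens it sees and compare. (cl100k_base is one OpenAI tokenizer; the model you're
# probing may use a different one, so this is only a rough check.)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
sentence = "The quick brown fox jumps over the lazy dog."
tokens = enc.encode(sentence)
print(len(tokens), tokens)

# Then prompt the model with something like:
#   "Exactly how many tokens is this sentence when you read it? <sentence>"
# Per the footnote below, answers tend to be plausible but not precisely correct.
```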
I presume the main reason models answer questions about themselves correctly at all is because AI developers very deliberately train them to do so. I bet that training doesn’t directly involve introspection/strongly noting the relationship between the model’s internal activations and the wider world.
So what could be going on? Maybe the way models learn to answer any questions about themselves generalizes? Or maybe introspection is specifically useful for answering those questions and instead of memorizing some facts about themselves, models learn to introspect (this could especially explain why they can articulate what they’ve been trained to do via self-awareness alone).
But I think the most likely dynamic is that in RL settings[2] introspection that affects the model’s output is sometimes useful. Thus it is reinforced. For example, if you ask a reasoning model a question that’s too hard for it to know the answer to, it could introspect to realize it doesn’t know the answer (which might be more efficient than simply memorizing every question it does or doesn’t know the answer to). Then it could articulate in the CoT that it doesn’t know the answer, which would help it avoid hallucinating and ultimately produce the best output it could given the constraints.
One other possibility is the models are just that smart/self-aware and aligned towards being honest and helpful. They might have an extremely nuanced world-model, and since they’re trained to honestly answer questions,[3] they could just put the pieces together and introspect (possibly in a hack-y or shallow way).
Overall these dynamics make introspection a very thorny thing to study. I worry it could go undetected in some models or it could seem like a model can introspect in a meaningful way when it only has shallow abilities reinforced directly by processes like the above (for example knowing when they don’t know something [because that might have been learned during training], but not knowing in general how to query their internal knowledge on topics in other related ways).
At least, not on any model I tried. They occasionally get it right by chance; they give plausible answers, just not precisely correct ones.
Technically this could apply to fine-tuning settings too, for example if the model uses a CoT to improve its final answers enough to justify the CoT not being maximally likely tokens.
In theory at least. In reality I think this training does occur but I don’t know how well it can pinpoint honesty vs several things that are correlated with it (and for things like self-awareness those subtle correlates with truth in training data seem particularly pernicious).
This doesn’t seem that clear to me; what part of training would incentivize the model to develop circuits for exact token-counting? Training a model to adhere to a particular token budget would do some of this, but it seems like it would have relatively light pressure on getting exact estimates right vs guessing things to the nearest few hundred tokens.
We know from humans that it’s very possible for general intelligences to be blind to pretty major low-level features of their experience; you don’t have introspective access to the fact that there’s a big hole in your visual field or the mottled patterns of blood vessels in front of your eye at all times or the ways your brain distorts your perception of time and retroactively adjusts your memories of the past half-second.
One way to test this would be to see if there are SAE features centrally about token counts; my guess would be that these show up in some early layers but are mostly absent in places where the model is doing more sophisticated semantic reasoning about things like introspection prompts. Ofc this might fail to capture the relevant sense of “knowing” etc, but I’d still take it as fairly strong evidence either way.
Ideas for how to spend very large amounts of money to improve AI safety:
If AI companies’ valuations continue to skyrocket (or if new very wealthy actors start to become worried about AI risk), there might be a large influx of funding into the AI safety space. Unfortunately, it’s not straightforward to magically turn money into valuable AI safety work. Many things in the AI safety ecosystem are more bottlenecked on having a good founder with the right talent and context, or having good researchers.
Here’s a random incomplete grab-bag of ideas for ways you could turn money into reductions in AI risk at large scales. I think right now there are much better donation opportunities available. This is not a list of donation recommendations right now, it’s just suggestions for once all the low-hanging funding fruit has been plucked. Probably if people thought more they could come up with even better scalable opportunities. There’s also probably existing great ideas I neglected to list. But these at least give us a baseline and a rough sense of what dumping a bunch of money into AI safety could look like. I’m also erring towards listing more things rather than fewer. Some of these things might actually be bad ideas.
Bounties to reward AIs for reporting misaligned behavior in themselves or other agents.
Folks have run a couple of small experiments on this already. It seems straightforward to execute and like it could absorb almost unbounded amounts of capital.
Paying high enough salaries to entice non-altruistically-motivated AI company employees to work on safety.
This isn't only bottlenecked on funding. Many people are very loyal to the AI companies they work for, and the very best employees aren't very sensitive to money since they already have plenty of money. It seemed absurdly expensive for Meta to try hiring away people from other AI companies, and they didn't seem to get that much top talent from it. On the one hand, working on safety is a much more compelling case than working at Meta, but on the other hand, maybe people who aren't already doing safety research find AI capabilities research more intrinsically fun and interesting or rewarding than safety research. I am also concerned that people who do capabilities research might not be great at safety research because they might not feel as passionate or inspired by it, and because it is a somewhat different skillset.
In the most extremely optimistic world, you could probably hire 50 extremely talented people by offering them $100M/year each (matching what Meta offered). You could probably also hire ~200 more junior people at $10M/year (the bottleneck on hiring more would be management capacity). So in total you could spend $7B/year.
Over time, I expect this to get more expensive since AI companies’ valuations will increase, and therefore, so will employee compensation.
Compute for AI safety research.
Day-to-day, the AI safety researchers I know outside of AI labs don’t seem to think they’re very bottlenecked on compute. However, the AI safety researchers I know inside AI labs claim they get a lot of value from having gobs and gobs of compute everywhere. Probably, AI safety researchers outside labs are just not being imaginative enough about what they could do with tons of compute. This also isn’t entirely money-bottlenecked. Probably part of it is having the infrastructure in place and the deals with the compute providers, etc. And running experiments on lots of compute can be more fiddly and time-consuming. Even so I bet with a lot more money for compute, people would be able to do much better safety research.
Very roughly, I guess this could absorb ~$100 million a year.
Compute for running AI agents to automate AI safety research.
This doesn’t work today since AIs can’t automate AI safety research. But maybe in the future they will be able to, and you’ll be able to just dump money into this almost indefinitely.
Pay AI companies to do marginal cheap safety interventions.
Maybe you can just pay AI companies to implement safety interventions that are only very slightly costly for them. For example, you could subsidize having really good physical security in their data centers. I think a lot of things AI companies could do to improve safety will be costly enough for the companies that it will be very hard to pay them enough to make up for that cost, especially in worlds where AI companies’ valuations have increased a lot from where they are today. But there’s probably still some opportunities here.
Raising awareness of AI safety.
There’s lots of proven ways to spend money to raise awareness of things (sponsor youtube channels, patronize movies about AI risk, etc). Maybe raising awareness of safety is good because it gets more people to work on safety or gets the government to do more sensible things about AI risk or lets consumers encourage companies to implement more safety interventions.
I couldn’t easily find an American public awareness campaign that cost more than ~$80M/year (for anti-smoking). Coca Cola spends ~$4 billion a year on advertising, but I think that if AI safety were spending as much money as Coca-Cola, it would backfire. I think maybe $500M/year is a reasonable cap on what could be spent?
Biodefense. Buy everyone in the US PPE.
One way that an AI could cause a catastrophe is via designing a bioweapon. One way to reduce the odds that a bioweapon causes a civilization-ending catastrophe is to make sure that everyone has enough PPE that they won’t die. Andrew Snyder-Beattie has elaborated on this idea here. I think this could absorb ~$3B ($3/mask * 350M Americans * 3 masks/person).
Buy foreign AI safety researchers gold cards.
Many great AI safety researchers are on visas. It would be convenient if they had green cards. You can now effectively buy green cards (via the new "gold cards") for $1M each. Let's say there are a hundred such people, so this opportunity could absorb $100M.
Overall, these are not amazing opportunities. But they give a lower bound and illustrate how it’s possible to turn money into reduced risk from AI at scale, even if you don’t have more entrepreneurs building new organizations. In practice, I think if money slowly ramps up into the space over time, there will be much better opportunities than these, and you will simply see AI safety organizations that have grown to be major research institutions that are producing wonderful research. This is just a floor.
A lot of these ideas came from other people and have generally been floating around for a while. Thanks to everybody I talk to about this.
I don't know that the idea is fundamentally good, but at least it scales somewhat with the equity of the safety-sympathetic people at labs?