Some of that error is correlated between models; METR also has versions of the graph with error bars on the trendline, and those error bars are notably smaller.
The error bars are also much smaller when you look at the plot on a log y-axis. In some sense, not being able to distinguish a 10-minute time horizon from a 30-minute one is a lot of error, but it’s still very distinct from the one-minute time horizon of the previous generation or the 2-hour time horizon you might expect from the next generation. In other words, in the image you shared the error bars on o4-mini don’t look so bad, but if the plot only went up to o4-mini you’d have zoomed in a bunch and its error bars would look large too.
Also note that to cut the size of the error bars in half you’d need ~4x as many tasks, and to cut them by 4x you’d need ~16x as many. And you’d need to be very confident the tasks weren’t buggy, so just throwing money at the problem and hiring lots of people won’t work: you’d just end up with a bunch of tasks you don’t have confidence in.
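To spell out the scaling: if you treat the error bar as roughly a standard error of the mean over $n$ approximately independent tasks with per-task spread $\sigma$ (a simplification, since in practice task results are correlated), then

$$\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}}, \qquad \frac{\mathrm{SE}(4n)}{\mathrm{SE}(n)} = \frac{1}{2}, \qquad \frac{\mathrm{SE}(16n)}{\mathrm{SE}(n)} = \frac{1}{4}.$$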
Keep in mind the opportunity cost is real, though, and the main blocker for orgs like METR is usually talent and capacity rather than money. It would be great if they had capacity for this, and you’re right that it’s insane that humanity doesn’t have better benchmarks. But there are at least a dozen other fires that large that METR seems to be trying to address, like RCTs to see if AI is actually speeding people up and risk report reviews to see if AIs are actually safe. Perhaps you think those are less important, but if so I would like to hear that argument.
All that said, my understanding is METR is working on this. I would also love to see this type of work from others!
It’s useful for evals to be run reliably for every model and maintained for long periods. A lot of the point of safety-relevant evals is to serve as building blocks people can use for other things: people can make forecasts or bets about what models will score on the eval or what will happen if a certain score is reached, make commitments about what to do if a model hits a certain score, write legislation that applies only to models with specific scores, and advise the world to look to these scores to understand whether risk is high.
Much of that falls apart if there’s FUD about whether a given eval will still exist and be run on the relevant models in a year’s time.
This didn’t use to be an issue, because evals were simple to run: just a short script asking a model a series of multiple-choice questions.
Agentic evals are complex. They require GPUs and containers and scripts that need to be maintained. You need to scaffold your agent and run it for days. Sometimes you need to build a vending machine.
I’m worried about a pattern where a shiny new eval is developed, run for a few months, then discarded in favor of newer, better evals, or where the folks running the evals don’t get around to running them reliably on every model.
As a concrete example, the 2025 AI Forecasting Survey asked people to forecast what the best model’s score on RE-Bench would be by the end of 2025, but RE-Bench hasn’t been run on Claude Opus 4.5 or on many other recent models (METR focuses on their newer, larger time-horizon eval instead). It also asked for forecasted scores on OSWorld, but OSWorld isn’t run anymore (it’s been replaced by OSWorld-Verified).
There are real costs to running these evals, and when one is deprecated it’s usually because it’s been replaced with something better. But I think people sometimes act like deprecating an eval is completely costless, and I want to point out the costs.