What does economics-as-moral-foundation mean?
He mainly used analogies from IABED. Off the top of my head I recall him talking about:
predicting where the molecules go when you heat an ice cube
LLMs are grown; the companies aren’t building crops, they are building farm equipment that grows crops (I don’t remember this one from IABED)
we know ASI could build a self-replicating solar-powered factory because those already exist (i.e. grass)
leaded gasoline as a case study of scientists/companies making something bad and being in denial about it, even to their own detriment
many people thought nuclear war was inevitable, but it didn’t happen, largely because the people in charge would be personally harmed by it
I’m talking about my perception of the standards for a quick take vs. a post. I don’t know if my perception is accurate.
My perception is that it’s not exactly about goodness; it’s more that a post must conform to certain standards*. It’s like how a scientific paper must meet certain standards to get published in a peer-reviewed journal, even though a non-publishable paper could still present novel and valuable scientific findings.
*and, even though I’ve been reading LW since 2012, I’m still not clear on what those standards are or how to meet them
On a meta level, I think this post is a paragon of how to reason about the cost-effectiveness of a highly uncertain decision, and I would love to see more posts like this one.
I don’t know Bores personally. I looked through some of his communications and social media, and most of it seemed reasonable (I noticed his Twitter has an unusually small amount of mud-slinging). But I did see one thread with some troubling comments:
This bill [SB 53] recognizes that in order to win the AI race, our AI needs to be both safe and trustworthy.
In this case, pro-safety is the pro-innovation position.
[...]
As a New Yorker, I have to point out that SB53 includes a cloud compute cluster & @GavinNewsom said in his signing memo “The future happens [in CA] first”
...but @KathyHochul established EmpireAI in April 2024. So, thanks to our Gov’s vision, the future actually happens in NY 😉
Why I find this troubling:
Bores seems to want to race to build AI. Racing shortens timelines and decreases safety.
“pro-safety is the pro-innovation position” seems false? If AI companies maximize profit by being safe, then they’d do it without regulation, so why would we need regulation? If they don’t maximize profit by being safe, then pro-safety is not (maximally) pro-innovation.
I think our best hope for survival is that governments become sufficiently aware of the danger of AI that they agree to ban frontier AI development until we can figure out how to make it safe. If Bores is indeed pro-innovation on AI, then he would presumably oppose such a ban. My guess is the average Democrat would be basically fine with banning frontier AI if the political winds shifted that way, but Bores would have a more strongly-held stance, in which case he would be worse than the average Democrat (but still probably better than the average Republican).
He calls out New York directing funding to a new AI research lab as if that’s a good thing, which I don’t think it is. (I don’t actually know what EmpireAI is doing; I looked at their website, but it doesn’t really say anything beyond that they only fund “responsible” research, and I really don’t trust them to know what qualifies as responsible.)
Politicians are often pressured to say those sorts of things, so perhaps he would still support an AI pause if it became politically feasible. So these comments aren’t overwhelmingly troubling. But they’re troubling.
If those quotes accurately reflect his stance on AI innovation and arms races, then he might still be better than the average Democrat if the increased chance of getting weak-to-moderate AI safety regulations outweighs the decreased chance of getting strong regulations, but it’s unclear to me.
I will note that this was the only worrying comment I saw from Bores, although I didn’t find many comments on AI safety.
I think the strongest case for AI stocks being overpriced is to ignore any specific facts about how AI works and take the outside view on historical market behavior. I don’t see a good argument being made in the quotes above so I will try to make a version of it.
I’m going from memory instead of looking up sources; I’m pretty sure I’m wrong about the exact details of the claims below, but I believe they’re approximately true.
The Mag 7 have a P/E of 32. Historically, when companies had a P/E of 32, their future returns were, on average, much worse than the market average (my guess would be 0–3%; see the rough back-of-envelope after this list).
A study looking at the performance of the #1 market cap company found that the top company almost always went on to underperform, which is an argument against buying Nvidia in particular.
Nvidia currently makes up a larger share of total world market cap than any single company ever has. When things go outside the normal historical range, that generally suggests the price is unsustainable.
Historically, the market has systematically over-extrapolated earnings/revenue growth. Companies with excellent earnings growth for years 1–3 tend to have merely above-average earnings growth for years 4–6, but they’re usually priced as if they’re going to continue to have excellent earnings growth. Although that’s just an average, and companies with very high market expectations still exceed expectations something like 40% of the time.
Stocks and industries tend to exhibit 3–5 year mean reversion, i.e. stocks that perform particularly well for 3–5 years on average underperform the market over the following year.
(These are five different perspectives on the same general market phenomenon, so they’re not really five independent pieces of evidence.)
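As a crude sanity check on that 0–3% guess, here’s a back-of-envelope sketch (my own, not from the studies I’m half-remembering): it treats the earnings yield, i.e. 1/PE, as a rough proxy for expected long-run real return, and it ignores growth, buybacks, and changes in the multiple.

```python
# Crude sanity check: earnings yield (1 / PE) as a rough proxy for expected
# long-run real return. Ignores growth, buybacks, and multiple changes.
mag7_pe = 32              # approximate aggregate P/E of the Mag 7 (from memory)
long_run_market_pe = 16   # very rough long-run average market P/E

print(f"Earnings yield at P/E {mag7_pe}: {1 / mag7_pe:.1%}")                        # ~3.1%
print(f"Earnings yield at P/E {long_run_market_pe}: {1 / long_run_market_pe:.1%}")  # ~6.2%
```

A ~3% earnings yield is in the same ballpark as the 0–3% guess, although this is a valuation argument rather than a historical-returns study, so it’s weak evidence at best.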
On the outside view, I think there’s pretty good reason to believe that AI stocks are overpriced. However, on the inside view, the market sort of still doesn’t seem like it appreciates how big a deal AGI could be. So on balance I’m pretty uncertain.
Importantly, AFAICT some Horizon fellows are actively working against x-risk reduction (pulling the rope backwards, not sideways). So Horizon’s sign of impact is unclear to me. For a lot of people, “tech policy going well” means “regulations that don’t impede tech companies’ growth”.
As in, Horizon fellows / people who work at Horizon?
What leads you to believe this?
FWIW this is also my impression but I’m going off weak evidence (I wrote about my evidence here), and Horizon is pretty opaque so it’s hard to tell. A couple weeks ago I tried reaching out to them to talk about it but they haven’t responded.
I think so, yeah. I think my probability of the next model being catastrophically dangerous is a bit higher than it was a year ago, mainly because of the IMO gold medal result and similar improvements in models’ ability to reason on hard problems. An argument in the other direction is that the more data points you have along a capabilities curve, the more confident you can be that your model of the curve is accurate, although on balance I think this is probably outweighed by the fact that we are now closer to AGI than we were a year ago.
The next-gen LLM might pose an existential threat
I’m pretty sure that the next generation of LLMs will be safe. But the risk is still high enough to make me uncomfortable.
How sure are we that scaling laws are correct? Researchers have drawn curves predicting how AI capabilities scale with how much compute and data go into training them. If you extrapolate those curves, it looks like the next generation of LLMs won’t be wildly more powerful than the current one. But maybe there’s a weird bump in the curve that happens in between GPT-5 and GPT-6 (or between Claude 4.5 and Claude 5), and LLMs suddenly become much more capable in a way that scaling laws didn’t predict. I don’t think we can be more than 99.9% confident that there’s not.
How sure are we that current-gen LLMs aren’t sandbagging (that is, deliberately hiding their true skill level)? I think they’re still dumb enough that their sandbagging can be caught, and indeed they have been caught sandbagging on some tests. I don’t think LLMs are hiding their true capabilities in general, and our understanding of AI capabilities is probably pretty accurate. But I don’t think we can be more than 99.9% confident about that.
How sure are we that the extrapolated capability level of the next-gen LLM isn’t enough to take over the world? It probably isn’t, but we don’t really know what level of capability is required for something like that. I don’t think we can be more than 99.9% confident.
Perhaps we can be >99.99% confident that the next-gen LLM’s extrapolated capability still falls short of the smartest human. But an LLM has certain advantages over humans: it can work faster (at least on many sorts of tasks), it can copy itself, and it can operate computers in a way that humans can’t.
Alternatively, GPT-6/Claude 5 might not be able to take over the world, but it might be smart enough to recursively self-improve, and that might happen too quickly for us to do anything about.
How sure are we that we aren’t wrong about something else? I thought of three ways we could be disastrously wrong:
We could be wrong about scaling laws;
We could be wrong that LLMs aren’t sandbagging;
We could be wrong about what capabilities are required for AI to take over.
But we could be wrong about some entirely different thing that I didn’t even think of. I’m not more than 99.9% confident that my list is comprehensive.
On the whole, I don’t think we can say there’s less than a 0.4% chance that the next-gen LLM forces us down a path that inevitably ends in everyone dying.
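For concreteness, that 0.4% is roughly what you get by combining the four ~99.9% confidences above, treating them as independent (which they aren’t exactly, but it’s close enough for a Fermi estimate):

```python
# Combine the four ~99.9% confidences above, treated as (roughly) independent.
# Each value is the probability that we are NOT wrong about that particular thing.
confidences = [
    0.999,  # scaling laws hold (no surprise capability jump next generation)
    0.999,  # current LLMs aren't sandbagging
    0.999,  # extrapolated next-gen capability isn't enough to take over the world
    0.999,  # my list of ways to be wrong is comprehensive
]

p_everything_fine = 1.0
for p in confidences:
    p_everything_fine *= p

print(f"P(wrong about at least one) ≈ {1 - p_everything_fine:.2%}")  # ~0.40%
```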
As I understand, this is how scientific bodies’ position statements get written. Scientists do not universally agree about the facts in their field, but they iterate on the statement until none of the signatories have any major objections.
I have a strong intuition that this isn’t how it works:
When I have a positive experience, it is readily apparent to me that the experience is positive, and no amount of argument can convince me that actually I didn’t enjoy myself.
Suppose I did something that I quite enjoyed, and then Omega came up to me and said “actually somebody else last week (or a simulation of you, or whatever) already experienced those exact same qualia, so your qualia weren’t that valuable.” I’d say, sorry Omega, that is wrong, my experience was good regardless of whether it had already happened before. I know it was good because I directly experienced its goodness.
Perhaps I’m misunderstanding your objection but I think the issue is that Claude is hosted on AWS servers (among other places), which means Amazon could steal Claude’s model weights if it wanted to, and ASL-3 states that Claude needs to be secure against theft by other companies (including Amazon).
I don’t know for sure that Zach’s assertion is true, but I’m reasonably confident that a dedicated Amazon security team could steal the contents of any AWS server if they really wanted to.
The great thing is outside university nobody cares about how fast I can apply the gauss algorithm. It’s just important that I know when to use it.
This particular fact sounds right but I think the generalization is often wrong. At my first software development job, I learned more slowly than my peers, and I took longer than usual to get promoted from entry-level to mid-level. This had a real material impact on my earnings and therefore how much money I could donate. It would have been better for the world if I had been able to learn faster.
But I still basically agree with the lesson, as I understand it: trying to go fast is overrated. I don’t think “try to go fast” would’ve helped me at all. (In fact I often was trying to go fast, and it didn’t help.)
Yeah I agree that it’s not a good plan. I just think that if you’re proposing your own plan, your plan should at least mention the “standard” plan and why you prefer to do something different. Like give some commentary on why you don’t think alignment bootstrapping is a solution. (And I would probably agree with your commentary.)
First, the obvious: baryonic matter itself is a rounding error
This is a minor comment but I think it would be useful to define terms like “baryonic matter”. I’m much more well-versed in particle physics than the average person (I have taken one (1) class on particle physics, which I think puts me in a pretty high percentile on particle physics knowledge) but I don’t remember what “baryonic matter” means. It’s also non-trivial to find out what it means: Wikipedia says a baryon is defined as having an odd number of valence quarks, but I have no intuition for what that means either.
From context, I think what you mean by “baryonic matter” is “matter that forms atoms”, and you mean to exclude dark matter, black holes, and force-carrier particles (photons etc.).
This plan is pretty abstract (which is necessary because it’s short) but in some ways I think it’s better than any of the AI companies’ published 400-page plans. From what I’ve seen, AI companies don’t care enough about trying to break their own evals, and they don’t care enough about theory work.
Maybe this is too in the weeds but I’m skeptical that we can create robust alignment evals using anything resembling current methods. A superintelligent AI will be better at breaking evals than humans, so I expect there is a big gap between “our top researchers have tried and failed to find any loopholes in our alignment evals” and “a superintelligence will not be able to find any loopholes”.
AI companies would probably answer that they only need to align weaker models and then bootstrap to an aligned superintelligence. You didn’t talk about that in your meta-plan, though; I think it would be good to address it, since it’s what every AI company (with an alignment team) currently plans on doing.
I think you could approximately define philosophy as “the set of problems that are left over after you take all the problems that can be formally studied using known methods and put them into their own fields.” Once a problem becomes well-understood, it ceases to be considered philosophy. For example, logic, physics, and (more recently) neuroscience used to be philosophy, but now they’re not, because we know how to formally study them.
So I believe Wei Dai is right that philosophy is exceptionally difficult—and this is true almost by definition, because if we know how to make progress on a problem, then we don’t call it “philosophy”.
For example, I don’t think it makes sense to say that philosophy of science is a type of science, because it exists outside of science. Philosophy of science is about laying the foundations of science, and you can’t do that using science itself.
I think the most important philosophical problems with respect to AI are ethics and metaethics because those are essential for deciding what an ASI should do, but I don’t think we have a good enough understanding of ethics/metaethics to know how to get meaningful work on them out of AI assistants.