To me it seems very likely that any future LLM actually making judgments about who lives or dies is going to reason about it.
Maybe we’ll be so lucky when it comes to one attempting a takeover, but I don’t think this will hold in human applications of LLMs.
The two most likely settings in which LLMs explicitly make such judgments seem to be war and healthcare. In both, there’s urgency, and you’d prefer any tricky case to be escalated to a human anyway. So it’s simply more economical to use non-reasoning models, with little marginal benefit from the explicit reasoning (at least if you don’t take this sort of effect into account and just judge by performance in typical situations).
A) Really? Reasoning models are already dirt cheap and noticeably better than non-reasoning models on almost all benchmarks. I’d be shocked if even the medical and military communities didn’t upgrade to reasoning models pretty quickly.
B) I feel that alignment for AGI is much more important than alignment for AI: that is, we should worry primarily about AI that becomes takeover-capable. I realize opinions vary. One opinion is that we needn’t worry yet about aligning AGI, because it’s probably far out and we haven’t seen the relevant type of AI yet, so we can’t really work on it. I challenge anyone to do the back-of-the-envelope expected value calculation on that one with any reasonable (that is, epistemically humble) estimate of timelines and x-risk.
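For concreteness, here is a minimal sketch of what I mean by that back-of-the-envelope calculation. Every number in it is a placeholder assumption of mine, chosen only to illustrate the shape of the argument, not an estimate anyone should quote:

```python
# Back-of-the-envelope expected value of starting AGI alignment work now vs. deferring.
# All numbers are illustrative placeholder assumptions, not published estimates.

p_agi_soon = 0.5                        # assumed probability of takeover-capable AI within a couple of decades
p_catastrophe_given_agi = 0.2           # assumed probability of existential catastrophe if it arrives unaligned
risk_reduction_from_starting_now = 0.1  # assumed relative risk reduction from starting the work now
value_at_stake = 1.0                    # normalize "everything we care about" to 1

ev_of_starting_now = (p_agi_soon * p_catastrophe_given_agi
                      * risk_reduction_from_starting_now * value_at_stake)
print(f"EV of starting now, in units of everything at stake: {ev_of_starting_now:.3f}")  # 0.010
```

Even with these deliberately modest placeholder numbers, the expected value of starting now comes out to about 1% of everything at stake, which is hard to square with “it’s probably far out, so we can wait.”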
Another, IMO more defensible common position is that aligning AI is a useful step toward aligning AGI. I think this is true—but if that’s part of the motivation, shouldn’t we think about how current alignment efforts build toward aligning AGI, at least a little in each publication?