I kind of want you to get quantitative here? Like pretty much every action we take has some effect on AI timelines, but I think effect-on-AI-timelines is often swamped by other considerations (like effects on attitudes around those who will be developing AI).
Of course it’s prima facie more plausible that the most important effect of AI research is the effect on timelines, but I’m actually still kind of sceptical. On my picture, I think a key variable is the length of time between when-we-understand-the-basic-shape-of-things-that-will-get-to-AGI and when-it-reaches-strong-superintelligence. Each doubling of that length of time feels to me like it could be worth on the order of 0.5–1% of the future. Keeping implemented-systems close to the technological-frontier-of-what’s-possible could help with this, and may be more affectable than the
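The “0.5–1% of the future per doubling” intuition can be sketched as a simple linear-in-doublings model. This is purely illustrative: the per-doubling value and the example window lengths are hypothetical numbers, not anything from the discussion above.

```python
# A minimal sketch, assuming (hypothetically) that each doubling of the
# window between understanding-the-shape-of-AGI and superintelligence is
# worth a fixed fraction of the future.
import math

def value_of_stretch(base_years, stretched_years, value_per_doubling=0.0075):
    """Fraction of the future gained by stretching the window from
    base_years to stretched_years, at value_per_doubling per doubling."""
    n_doublings = math.log2(stretched_years / base_years)
    return n_doublings * value_per_doubling

# e.g. stretching a 1-year window to 4 years is 2 doublings
print(value_of_stretch(1, 4, 0.005))  # low end of the 0.5-1% range
print(value_of_stretch(1, 4, 0.01))   # high end of the range
```

On this toy model the value of interventions is additive in doublings, which is one way to make “each doubling is worth roughly the same amount” precise.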
Note that I don’t think this really factors into an argument in terms of “advancing alignment” vs “aligning capabilities” (I agree that if “alignment” is understood abstractly the work usually doesn’t add too much to that). It’s more like a differential-technological-development (DTD) argument about different types of advancing capabilities.
I think it’s unfortunate if that strategy looks actively bad on your worldview. But if you want to persuade people not to do it, I think you either need to persuade them of the whole case for your worldview (for which I’ve appreciated your discussion of the sharp left turn), or to explain not just that you think this is bad, but also how big a deal you think it is. Is this something your model cares about enough to trade for in some kind of grand inter-worldview bargaining? I’m not sure. I kind of think it shouldn’t be (that relative to the size of the ask, you’d get a much bigger benefit from someone starting to work on things you cared about than from their stopping this type of capabilities research), but I think it’s pretty likely I couldn’t pass your ITT here.
I’d be very interested to read more about the assumptions of your model, if there’s a write-up somewhere.
Fair question. I just did the lazy move of looking up world GDP figures. In fact I don’t think that my observers would measure GDP the same way we do. But it would be a measurement of some kind of fundamental sense of “capacity for output (of various important types)”. And I’m not sure whether that has been growing faster or slower than real GDP, so the GDP figures seem like a not-terrible proxy.
I’d be interested to dig into this claim more. What exactly is the claim, and what is the justification for it? If the claim is something like “For most tasks, the thinking machines seem to need 0 to 3 orders of magnitude more experience on the task before they equal human performance” then I tentatively agree. But if it’s instead 6 to 9 OOMs, or even just a solid 3 OOMs, I’d say “citation needed!”
No precise claim, I’m afraid! The whole post was written from a place of “OK but what are my independent impressions on this stuff?”, and then setting down the things that felt most true in impression space. I guess I meant something like “IDK, seems like they maybe need 0 to 6 OOMs more”, but I just don’t think my impressions should be taken as strong evidence on this point.
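To give a concrete sense of how much is at stake between the ranges being discussed, here is the arithmetic on what “N orders of magnitude more experience” means in example counts. The human baseline of 100 examples is a made-up number for illustration only.

```python
# Illustrative only: cashing out "N OOMs more task experience" into
# example counts (the human baseline here is a hypothetical figure).

def machine_examples(human_examples, ooms_more):
    """Examples a machine would need if it requires `ooms_more` orders
    of magnitude more task experience than a human."""
    return human_examples * 10 ** ooms_more

for ooms in [0, 3, 6, 9]:
    print(f"{ooms} OOMs more: {machine_examples(100, ooms):,} examples")
```

The gap between the “0 to 3 OOMs” and “6 to 9 OOMs” readings is the difference between needing thousands of examples and needing billions, which is why pinning down the claim matters for which industries are economical to automate.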
The general point about the economic viability of automating specialized labour is about more than just data efficiency; there are other ~fixed costs to automating an industry, which mean that small specialized industries will be automated later.
(It’s maybe worth commenting that the scenarios I describe here are mostly not like “current architecture just scales all the way to human-level and beyond with more compute”. If they actually do scale then maybe superhuman generalization happens significantly earlier in the process.)
It’s a lightly fictionalized account of my independent impressions of AI trajectories.
Interesting, I think there’s some kind of analogy (or maybe generalization) here, but I don’t fully see it.
I at least don’t think it’s a direct reinvention because slack (as I understand it) is a thing that agents have, rather than something which determines what’s good or bad about a particular decision.
(I do think I’m open to legit accusations of reinvention, but it’s more like reinventing alignment issues.)
I’m relatively a fan of their approach (although I haven’t spent an enormous amount of time thinking about it). I like starting with problems which are concrete enough to really go at but which are microcosms for things we might eventually want.
I actually kind of think of truthfulness as sitting somewhere on the spectrum between the problem Redwood are working on right now and alignment. Many of the reasons I like truthfulness as a medium-term problem to work on are similar to the reasons I like Redwood’s current work.
I think it would be an easier challenge to align 100 small ones (since solutions would quite possibly transfer across).
I think it would be a bigger victory to align the one big one.
I’m not sure from the wording of your question whether I’m supposed to assume success.
To add to what Owain said:
I think you’re pointing to a real and harmful possible dynamic
However I’m generally a bit sceptical of arguments of the form “we shouldn’t try to fix problem X because then people will get complacent”
I think that the burden of proof lies squarely with the “don’t fix problem X” side, and that usually it’s good to fix the problem and then also give attention to the secondary problem that’s come up
I note that I don’t think of politicians and CEOs as the primary audience of our paper
Rather I think in the next several years such people will naturally start having more of their attention drawn to AI falsehoods (as these become a real-world issue), and start looking for what to do about it
I think that at that point it would be good if the people they turn to are better informed about the possible dynamics and tradeoffs. I would like these people to have read work which builds upon what’s in our paper. It’s these further researchers (across a few fields) that I regard as the primary audience for our paper.
I don’t think I’m yet at “here’s regulation that I’d just like to see”, but I think it’s really valuable to try to have discussions about what kind of regulation would be good or bad. At some point there will likely be regulation in this space, and it would be great if that was based on as deep an understanding as possible about possible regulatory levers, and their direct and indirect effects, and ultimate desirability.
I do think it’s pretty plausible that regulation about AI and truthfulness could end up being quite positive. But I don’t know enough to identify in exactly what circumstances it should apply, and I think we need a bit more groundwork on building and recognising truthful AI systems first. I guess quite a bit of our paper is trying to open the conversation on that.
I think there’s also a capability component, distinct from “understanding/modeling the world”, about self-alignment or self-control—the ability to speak or act in accordance with good judgement, even when that conflicts with short-term drives.
In my ontology I guess this is about the heuristics which are actually invoked to decide what to do given a clash between abstract understanding of what would be good and short-term drives (i.e. it’s part of meta-level judgement). But I agree that there’s something helpful about having terminology to point to that part in particular. Maybe we could say that self-alignment and self-control are strategies for acting according to one’s better judgement?
Do you consider “good decision-making” and “good judgement” to be identical? I think there’s a value alignment component to good judgement that’s not as strongly implied by good decision-making.
I agree that there’s a useful distinction to be made here. I don’t think of it as fitting into “judgement” vs “decision-making” (and would regard those as pretty much the same), but rather about how “good” is interpreted/assessed. I was mostly using good to mean something like “globally good” (i.e. with something like your value alignment component), but there’s a version of “prudentially good judgement/decision-making” which would exclude this.
I’m open to suggestions for terminology to capture this!
I think the double decrease effect kicks in with uncertainty, but not with confident expectation of a smaller network.
I’m not sure I’ve fully followed, but I’m suspicious that you seem to be getting something for nothing in your shift from a type of uncertainty that we don’t know how to handle to a type we do.
It seems to me like you must be making an implicit assumption somewhere. My guess is that this is where you used i to pair S with S′. If you’d instead chosen j = i∘ρ as the matching then you’d have uncertainty between whether m should be j or ρ⁻¹∘j. My guess is that generically this gives different recommendations from your approach.
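The combinatorial point can be sketched with small finite permutations. Everything below is a hypothetical illustration: i is taken to be the identity matching and ρ a 3-cycle, just to show that the alternative candidates j and ρ⁻¹∘j generically disagree.

```python
# Sketch: starting from a matching i of S with S', composing with a
# symmetry rho gives an alternative candidate j = i ∘ rho, and the
# other branch rho^{-1} ∘ j is a genuinely different matching.
# Permutations are represented as dicts on a small index set; the
# specific choices of i and rho are illustrative assumptions.

def compose(f, g):
    """Return f ∘ g, i.e. x -> f(g(x))."""
    return {x: f[g[x]] for x in g}

def inverse(f):
    """Return the inverse permutation of f."""
    return {v: k for k, v in f.items()}

i   = {0: 0, 1: 1, 2: 2}        # identity matching of S with S'
rho = {0: 1, 1: 2, 2: 0}        # a nontrivial symmetry (3-cycle)

j   = compose(i, rho)           # alternative matching j = i ∘ rho
alt = compose(inverse(rho), j)  # the other branch, rho^{-1} ∘ j

print(j)                        # equals rho here, since i is the identity
print(alt)                      # differs from j, so the candidates disagree
```

So uncertainty between m = j and m = ρ⁻¹∘j is not the same uncertainty as between m = i and m = j, which is one way to see that the choice of how to pair S with S′ is doing real work.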