LessWrong dev & admin as of July 5th, 2022.
RobertM
test leaving a sixth comment
test leaving a fifth comment
test leaving a fourth comment
test leaving a third comment on prod!
test leaving a second comment on prod
test leaving a comment on prod
Yeah, I probably should have explicitly clarified that I wasn’t going to be citing my sources there. I agree that the fact that it’s costly to do so is a real problem, but as Robert Miles points out, some of the difficulty here is insoluble.
It’s very strange to me that there isn’t a central, accessible “101” version of the argument given how much has been written.
There are several, in fact; but as I mentioned above, none of them will cover all the bases for all possible audiences (and the last one isn’t exactly short, either). Off the top of my head, here are a few:
the presentation of the topic is unpersuasive to an intelligent layperson
There is, of course, no single presentation, but many presentations given by many people, targeting many different audiences. Could some of those presentations be improved? No doubt.
I agree that the question of how to communicate the problem effectively is difficult and largely unsolved. I disagree with some of the specific prescriptions (e.g. the call to falsely claim more-modest beliefs to make them more palatable for a certain audience), and the object-level arguments are either arguing against things that nobody[1] thinks are core problems[2] or are missing the point[3].
1. ^ Approximately.
2. ^ Wireheading may or may not end up being a problem, but it’s not the thing that kills us. Also, that entire section is sort of confused. Nobody thinks that an AI will deliberately change its own values to be easier to fulfill; goal stability implies the opposite.
3. ^ Specific arguments about whether superintelligence will be able to exploit bugs in human cognition or create nanotech (which… I don’t see any arguments against here, except for the contention that nothing was ever invented by a smart person sitting in an armchair, even though of course an AI will not be limited in its ability to experiment in the real world if it needs to) are irrelevant. Broadly speaking, the reason we might expect to lose control to a superintelligent AI is that achieving outcomes in real life is not a game with an optimal solution the way tic-tac-toe is, and the idea that something more intelligent than us will do better at achieving its goals than other agents in the system should be your default prior, not something that needs to overcome a strong burden of proof.
Over the years roughly between 2015 and 2020 (though I might be off by a year or two), it seemed to me like numerous AI safety advocates were incredibly rude to LeCun, both online and in private communications.
I’d be interested to see some representative (or, alternatively, egregious) examples of public communications along those lines. I agree that such behavior is bad (and also counterproductive).
Against them, The conjecture about what protein folding and ribosomes might one have the possibility to do really weak counterargument, based as it is on no empirical or evidentiary reasoning
I’m not sure I’ve parsed this correctly, but if I have, can I ask what unsupported conjecture you think undergirds this part of the argument? It’s difficult to say what counts as “empirical” or “evidentiary” reasoning in domains where the entire threat model is “powerful stuff we haven’t managed to build ourselves yet”, given that we can be confident that set isn’t empty. (Also, keep in mind that nanotech is merely presented as a lower bound on how STEM-AGI might achieve a decisive strategic advantage (DSA), being a domain where we already have strong reasons to believe that significant advances which we haven’t yet achieved are nonetheless possible.)
Cognition should instead be thought of as a logarithmically decreasing input into the rate of technological change.
Why? This doesn’t seem to be how it worked with humans, where it was basically a step function from technology not existing to existing.
A little bit of extra cognition
This sure is assuming a good chunk of the opposing conclusion.
...but an excess of cognition is not fungible for other necessary inputs to technological progress, such as the need for experimentation for hypothesis testing and problem solving on real world constraints related to unforeseen implementation difficulties related to unexplored technological frontiers.
And, sure, but it’s not clear why any of this matters. What is the thing that we’re going to (attempt to) do with AI, if not use it to solve real-world problems?
Transcript of a presentation on catastrophic risks from AI
By contrast, some lines of research where I’ve seen compelling critiques (and haven’t seen compelling defences) of their core intuitions, and therefore don’t recommend to people:
Cooperative inverse reinforcement learning (the direction that Stuart Russell defends in his book Human Compatible); critiques here and here.
John Wentworth’s work on natural abstractions; exposition and critique here, and another here.
The first critique of natural abstractions says:
Concluding thoughts on relevance to alignment: While we’ve made critical remarks on several of the details, we also want to reiterate that overall, we think (natural) abstractions are an important direction for alignment and it’s good that someone is working on them! In particular, the fact that there are at least four distinct stories for how abstractions could help with alignment is promising.
The second says:
I think this is a fine dream. It’s a dream I developed independently at MIRI a number of years ago, in interaction with others. A big reason why I slogged through a review of John’s work is because he seemed to be attempting to pursue a pathway that appeals to me personally, and I had some hope that he would be able to go farther than I could have.
Neither of them seemed, to me, to be critiques of the “core intuitions”; rather, the opposite: both suggested that the core intuitions seemed promising; the weaknesses were elsewhere. That suggests that natural abstractions might be a better than average target for incoming researchers, not a worse one.
I have some other disagreements, but those are model-level disagreements; that piece of advice in particular seems to be misguided even under your own models. I think I agree with the overall structure and most of the prioritization (though would put scalable oversight lower, or focus on those bits that Joe points out are the actual deciding factors for whether that entire class of approaches is worthwhile—that seems more like “alignment theory with respect to scalable oversight”).
Some combination of:
tech debt from design decisions which made sense when rebooting as LW 2.0, but have become increasingly unwieldy as development’s continued
strictly speaking there were options here that didn’t involve moving off of mongo, but it’d be much more difficult to make sane design choices with mongo involved. the fact that it can do a thing doesn’t mean it does that thing well.
mongo’s indexing in particular is quite bad, both in terms of how finicky mongo is about whether it can figure out how to use an index for a given query… and then also limiting you to 64 indexes per collection. (why would you need even 64 indexes per collection, you ask? let’s just say that development around here has been focused more on shipping things quickly than on other things… and, to be fair, we only got there with one collection. of course, not all of those indexes are used very much, but another mongo headache is the insane, poorly-documented locking behavior when you try to drop an index, which causes outages. lol.) there’s a rough sketch of what this looks like in practice at the end of this comment.
the EA forum team (particularly @ollieetherington) doing much of the heavy lifting in terms of writing the infrastructure necessary to migrate
in a vacuum I’d unquestionably rather be on postgres, but it’s not clear that it would’ve been worth the effort if it’d just been the LessWrong team and our use-case(s) at stake.
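for anyone who wants a concrete picture of the indexing complaint above, here’s a rough sketch of the same index change on each side. the names are made up for illustration (“posts”, “postedAt”, and friends aren’t our actual schema) and this isn’t code from our codebase, just the shape of the thing:

```typescript
// Rough illustrative sketch only: the database/collection/table/field names here
// ("forum", "posts", "postedAt", "userId", etc.) are invented, not our schema.
import { MongoClient } from 'mongodb';
import { Client as PgClient } from 'pg';

async function mongoSide(uri: string): Promise<void> {
  const mongo = new MongoClient(uri);
  await mongo.connect();
  const posts = mongo.db('forum').collection('posts');

  // Each query shape tends to want its own compound index, and whether the
  // planner actually uses one can be surprisingly sensitive to field order and
  // operators, which is how you creep toward the 64-indexes-per-collection cap.
  await posts.createIndex({ postedAt: -1, userId: 1 });

  // Dropping an index takes an exclusive lock on the collection for the duration,
  // which is the outage-causing behavior complained about above.
  await posts.dropIndex('postedAt_-1_userId_1');

  await mongo.close();
}

async function postgresSide(connectionString: string): Promise<void> {
  const pg = new PgClient({ connectionString });
  await pg.connect();

  // CONCURRENTLY builds (or drops) the index without holding a long write lock,
  // so index changes on a busy table don't have to be scheduled around traffic.
  await pg.query(
    'CREATE INDEX CONCURRENTLY IF NOT EXISTS posts_posted_at_user_id_idx ON posts (posted_at DESC, user_id)'
  );
  await pg.query('DROP INDEX CONCURRENTLY IF EXISTS posts_posted_at_user_id_idx');

  await pg.end();
}
```

the practical upshot being that on the postgres side, adding or dropping an index on a busy table is something you can just do, rather than something you schedule around traffic.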
Recent Database Migration—Report Bugs
There seems to be much more diversity in human cognitive performance than there is in human brain energy efficiency. Whether this is due to larger differences in the underlying software (to the extent that this is meaningfully commensurable with differences in hardware), or because smaller differences in that domain result in much larger differences in observable outputs, or both, none of that really takes away from the fact that brain software does not seem to be anywhere near the relevant efficiency frontier, especially since many trade-offs which were operative at an evolutionary scale simply aren’t when it comes to software.
I’ve heard people be somewhat optimistic about this AI guideline from China. They think that this means Beijing is willing to participate in an AI disarmament treaty due to concerns over AI risk.
I’m curious where you’ve seen this. My impression from reading the takes of people working on the governance side of things is that this is mostly being interpreted as a positive sign because it (hopefully) relaxes race dynamics in the US. “Oh, look, we don’t even need to try all that hard, no need to rush to the finish line.” I haven’t seen anyone serious making a claim that this is downstream of any awareness of x-risk concerns, let alone intent to mitigate them.
which implies by association that brain software is much more efficient as it was produced by exactly the same evolutionary process which he now admits produced fully optimized conventional computational elements over the same time frame, etc
I don’t believe this would follow; we actually have much stronger evidence that ought to screen off that sort of prior—simply the relatively large differences in human cognitive abilities.
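(To spell out what I mean by “screen off”, here’s a quick formalization with symbols of my own choosing, not anything from the original exchange: let H be “brain software is near the relevant efficiency frontier”, E be the evolutionary-optimization argument quoted above, and D be the directly observed spread in human cognitive performance. The claim is just that once you condition on D, E carries no further information about H:)

```latex
% Screening off as conditional independence (sketch; H, D, E as defined above)
P(H \mid D, E) = P(H \mid D)
```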
My guess is the interpretability team is under a lot of pressure to produce insights that would help the rest of the org with capabilities work
I would be somewhat surprised if this was true, assuming you mean a strong form of this claim (i.e. operationalizing “help with capabilities work” as relying predominantly on 1st-order effects of technical insights, rather than something like “help with capabilities work by making it easier to recruit people”, and “pressure” as something like top-down prioritization of research directions, or setting KPIs which rely on capabilities externalities, etc).
I think it’s more likely that the interpretability team(s) operate with approximately full autonomy with respect to their research directions, and to the extent that there’s any shaping of outputs, it’s happening mostly at levels like “who are we hiring” and “org culture”.
Complexity of value is part of why value is fragile.
(Separately, I don’t think current human efforts to “figure out” human values have been anywhere near adequate, though I think this is mostly a function of philosophy being what it is. People with better epistemology seem to make wildly more progress in figuring out human values compared to their contemporaries.)
test leaving a seventh comment