That is too strong a statement. I think it is evidence that general intelligence may be easier to achieve than commonly thought. But evidence over the last couple of years has already shown that, and I am not sure this is significant additional evidence in that regard.
It’s not my fire alarm (in part because I don’t think that’s a good metaphor). But it has caused me to think about updating timelines.
My initial reaction was to update timelines, but this achievement seems less impressive than I thought at first. It doesn’t seem to represent an advance in capabilities; instead it is (another) surprising result of existing capabilities.
My understanding is that starting in late 2020 with the release of Stockfish 12 (which added a neural-network evaluation, NNUE), Stockfish would probably be considered AI, but before that it would not be. I am, of course, willing to change this view based on additional information.
The original Alpha Zero vs. Stockfish match was in 2017, so if the above is correct, I think referring to Stockfish as non-AI makes sense.
“AI agents may not be radically superior to combinations of humans and non-agentic machines”
I’m not sure that the evidence supports this unless the non-agentic machines are also AI.
In particular: (i) humans are likely to subtract from this mix and (ii) AI is likely to be better than non-AI.
In the case of chess, after two decades of non-AI programming advances since computers first beat the best human, involving humans no longer provides an advantage over just using the programs. And Alpha Zero fairly decisively beat Stockfish (one of the best non-AI chess programs).
If the requirement for this to be true is that the non-agentic machine needs to be non-agentic AI, I am unsure that this is a separate argument from the one about AI being non-agentic. Rather this is a necessary condition for that point.
My impression is that there has been a variety of suggestions about the necessary level of alignment. It is only recently that “don’t kill most of humanity” has been suggested as a goal, and I am not sure that the suggestion was meant to be taken seriously. (If you can do that, you can probably do much better; the point of that comment as I understand it was that we aren’t even close to being able to achieve even that goal.)
As an empirical fact, humans are not perfect recognizers of human faces. It is something humans are very good at, but not perfect. We are definitely much better recognizers of human faces than of worlds high in human values. (Perhaps more relevant: consensus on what constitutes a human face is much, much higher than consensus on what constitutes a world high in human values.)
I am unsure whether this distinction is relevant for the substance of the argument however.
I don’t have the same order, but tend to agree that option 0 is the most likely one.
This was well written and persuasive. It doesn’t change my view against AGI on very short timelines (pre-2030), but it does suggest that I should be updating likelihoods thereafter and shortening my timelines.
Humans obtain value from other humans and depend on them for their existence. It is hypothesized that AGIs will not depend on humans for their existence. Thus, humans who would not push the button to kill all other humans may be declining to do so for reasons of utility that don’t apply to AGIs. Your hypothetical assumes this difference away, but our observations of humans don’t.
As you note, human morality and values were shaped by evolutionary and cultural pressure in favor of cooperation with other humans. The way this presumably worked is that humans who were less able or willing to cooperate tended to die more frequently, and cultures that were less able or willing to do so were conquered and destroyed. It is unclear how we would be able to replicate this or how well it translates.
It is unclear how many humans would actually choose to press this button. Your guess is that between 5% and 50% of humans would choose to do so.
That doesn’t suggest humans are very aligned; rather the opposite. It means that if we had between 2 and 20 AGIs (and those numbers don’t seem unreasonable), between 1 and 10 would choose to destroy humanity. Of course, extinction is the extreme version; having an AGI could also result in other negative consequences.
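To make the arithmetic concrete (my own rough illustration, with an independence assumption doing a lot of work): if each AGI independently chooses to destroy humanity with probability p, the chance that at least one of N AGIs does so is

$$P(\text{at least one}) = 1 - (1 - p)^N$$

Even at the optimistic end of the quoted range, p = 0.05 with N = 20 gives 1 - 0.95^20 ≈ 0.64, and p = 0.5 with N = 2 already gives 0.75. Either way, multiple AGIs at human-like levels of “alignment” leave uncomfortably high odds.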
Those are fair concerns, but my general impression is that those kinds of attitudes tend to moderate in practice as an organization like Balsa grows, develops, and focuses on particular issues. To the extent they don’t and are harmful, Balsa is likely to be ineffective, but it is unlikely to be significant enough to cause negative outcomes.
I understand why you get the impression you do. The issues mentioned are all over the map. Zoning is not even a federal government issue. Some of those issues are already the subject of significant reform efforts. In other cases, such as “fixing student loans,” it’s unclear what Balsa’s goal even is.
But, many of the problems identified are real.
And, it doesn’t seem that much progress is being made on many of them.
So, Balsa’s goal is worthy.
And, it may well be that Balsa turns out to be unsuccessful, but doing nothing is guaranteed to be unsuccessful.
So, I for one applaud this effort and am excited to see what comes of it.
It does sound reckless, doesn’t it? Even more so when you consider that over time you would likely have to eliminate many species of mosquito, not just one, to achieve the effect you desire. And, as the linked Nature article noted, this could have knock-on effects on other species which prey on mosquitos.
I think your comment is important, because this is probably the heart of the objection to using gene drives to exterminate mosquitos.
I think a few points are relevant in thinking about this objection:
(1) We already take steps to reduce mosquito populations, which are successful in wealthier countries.
(2) Those existing efforts suggest that the ecological effects of eliminating mosquitos are limited.
(3) The existing efforts are not narrowly targeted. Eliminating malaria and other disease causing mosquitos would enable these other efforts to stop, possibly reducing overall ecological effects.
(4) Malaria is a major killer, and there are other mosquito-borne diseases. If you are looking at this from a human-centered perspective, the ecological consequences would have to be clear and extreme to conclude that this step shouldn’t be taken, and the consequences don’t appear to be clear or extreme. (If there is another perspective you are looking at this from, I’d be happy to consider it.)
(5) Humanity is doing its best to eradicate Guinea worm to universal praise. It’s a slow process. Would you suggest reversing it? Why are mosquitos and Guinea worms different?
I think we have significantly longer. Still, if success requires several tens of thousands of people researching this for decades, we will likely fail.
(1) Reasoned estimates of when we will develop AGI start at less than two decades from now.
(2) To my knowledge, there aren’t thousands studying alignment now (let alone tens of thousands), and there does not seem to be a significant likelihood of that changing in the next few years.
(3) Even if, by the early 2030s, there are tens of thousands of researchers working on alignment, there is a significant chance they will not have decades to work on it before AGI is developed.
“When we refer to “aligned AI”, we are using Paul Christiano’s conception of “intent alignment”, which essentially means the AI system is trying to do what its human operators want it to do.”
Reading this makes me think that the risk of catastrophe due to human use of AGI is higher than I was thinking.
In a world where AGI is not agentic but is ubiquitous, I can easily see people telling “their” AGIs to “destroy X” or “do something about Y” with catastrophic results. (And attempts to prevent such outcomes could also have catastrophic results for similar reasons.)
So you may need to substantively align AGI (i.e., have AGI with substantive values or hard-to-alter restrictions) even if the AGI itself does not have agency or goals.
I think that either of the following would be reasonably acceptable outcomes:
(i) alignment with the orders of the relevant human authority, subject to the Universal Declaration of Human Rights as it exists today and other international human rights law as it exists today;
(ii) alignment with the orders of relevant human authority, subject to the constraints imposed on governments by the most restrictive of the judicial and legal systems currently in force in major countries.
Alignment doesn’t mean that AGI is going to be aligned with some perfect distillation of fundamental human values (which doesn’t exist) or the “best” set of human values (on which there is no agreement); it means that a range of horrible results (most notably human extinction due to rational calculation) is ruled out.
That my values aren’t perfectly captured by those of the United States government isn’t a problem. That the United States government might rationally decide it wanted to kill me and then do so would be.
Some of your constraints, in particular the first two, seem like they would not be practical in the real world in which AI would be deployed. On the other hand, there are also other things one could do in the real world which can’t be done in this kind of dialogue, which makes boxing theoretically stronger.
However, the real problem with boxing is that whoever boxes less is likely to have a more effective AI, which likely results in someone letting an AI out of its box or, more likely, loosening the box constraints sufficiently to permit an escape.
“The greatest cost is probably starting expansion a tiny bit later, not making the most effective use of what’s immediately at hand.”
Possible, but not definitely so. We don’t really know all the relevant variables.
The two questions you pose are not equivalent. There are critiques of AI existential risk arguments. Some of them are fairly strong. I am unaware of any which do a good job of quantifying the odds of AI existential risk. In addition, your second question appears to be asking for a cumulative probability. It’s hard to see how you could provide that absent a mechanism for eventually cutting AI existential risk to zero, which seems difficult.
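To sketch why (my own back-of-the-envelope framing, not something from the original question): if the chance of AI existential catastrophe in year t is p_t, then the cumulative risk through year T is

$$P_{\text{cumulative}}(T) = 1 - \prod_{t=1}^{T} (1 - p_t)$$

Unless the per-year risk p_t eventually falls toward zero fast enough (roughly, unless the sum of the p_t converges), this expression approaches 1 as T grows. So any finite answer to the cumulative question implicitly assumes some mechanism that drives annual risk toward zero, which is exactly the difficulty noted above.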
You are making a number of assumptions here.
(1) The AI will value or want the resources used by humans. Perhaps. Or, perhaps the AI will conclude that being on a relatively hot planet in a high-oxygen atmosphere with lots of water isn’t optimal and leave the planet entirely.
(2) The AI will view humans as a threat. The superhuman AI that those on Less Wrong usually posit (one so powerful that it can cause human extinction with ease, can’t be turned off or reprogrammed, and can manipulate humans as easily as I can type) can’t effectively be threatened by human beings.
(3) An AI which just somewhat cares about humans is insufficient for human survival. Why? Marginal utility is a thing.
In addition to being misleading, this just makes AI one more (small) facet of security. But security is broadly underinvested in and there is limited government pushback. In addition, there is already a security community which prioritizes other issues and thinks differently. So this would place AI in the wrong metaphorical box.
While I’m not a fan of the proposed solution, I do want to note that it’s good that people are beginning to look at the problem.