Sure, we can talk about this over video. Check your Facebook messages.
Computing the fastest route to Paris doesn’t involve search?
More generally, I think in order for it to work your example can’t contain subroutines that perform search over actions. Nor can it contain subroutines such that, when called in the order that the agent typically calls them, they collectively constitute a search over actions.
My example uses search, but the search is not the search of the inner alignment failure. It is merely a subroutine called upon by an outer superstructure, and it is that superstructure that is misaligned. Therefore, I fail to see why my argument doesn't go through.
If your position is that inner alignment failures can only occur when internal searches are misaligned with the reward function used during training, then my example would be a counterexample to your claim, since the misalignment was not due to a search being misaligned (except under some unnatural rationalization of the agent, which is a source of disagreement highlighted in the post, and in my discussion with Evan above).
How does evolutionary psychology help us during our everyday life? We already know that people like having sex and that they execute all these sorts of weird social behaviors. Why does providing the ultimate explanation for our behavior provide more than a satisfaction of our curiosity?
If one’s interpretation of the ‘objective’ of the agent is full of piecewise statements and ad-hoc cases, then what exactly do we gain by describing it as maximizing an objective in the first place? You might as well describe a calculator by saying that it’s maximizing the probability of outputting the following [write out the source code that leads to its outputs]. At some point the model breaks down, and the idea that it is following an objective is completely epiphenomenal to its actual operation. The model that it is maximizing an objective doesn’t shed light on its internal operations any more than just spelling out exactly what its source code is.
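To make the calculator analogy concrete, here is a minimal sketch (the function names are mine and purely illustrative, not anything from the post):

```python
# Illustrative only: any deterministic program can be redescribed as
# "maximizing" a degenerate objective that scores 1 exactly on its own output.

def calculator(a: float, b: float, op: str) -> float:
    """An ordinary calculator: no search, no explicit objective anywhere."""
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError("unsupported operator")

def rationalized_objective(a: float, b: float, op: str, output: float) -> int:
    """The 'objective' the calculator trivially maximizes. It fits the behavior
    perfectly and tells you nothing about how the calculator actually works."""
    return 1 if output == calculator(a, b, op) else 0
```

The “objective” here is just the source code restated, which is the sense in which I mean the objective-maximizing model is epiphenomenal.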
I feel like what you’re describing here is just optimization where the objective is determined by a switch statement
Typically when we imagine objectives, we think of a score which rates how well an agent performed some goal in the world. How exactly does the switch statement ‘determine’ the objective?
Let’s say that a human is given the instructions, “If you see the coin flip heads, then become a doctor. If you see the coin flip tails, then become a lawyer.” What ‘objective function’ are they maximizing here? If it’s some weird objective function like, “probability of becoming a doctor in worlds where the coin flips heads, and probability of becoming a lawyer in worlds where the coin flips tails,” that would seem to be unnatural, no? Why not simply describe them as a switch-case agent instead (sketched below)?
Remember, this matters because we want to be perfectly clear about what types of transparency schemes work. A transparency scheme that assumes the agent has a well-defined objective that it is using a search to optimize would, I think, fail in the examples I gave. This becomes especially true if the if-statements are complicated nested structures, and repeat as part of some even more complicated loop, which seems likely.
ETA: Basically, you can always rationalize an objective function for any agent you are given. But the question is simply: what’s the best model of our agent, in the sense of best helping us mitigate failures? I think most people would not categorize the lunar lander as a search-based agent, even though you could say that it is under some interpretation. The same is true of humans, plants, and animals.
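To make the coin-flip example concrete, here is a minimal sketch of what I mean by a switch-case agent versus the piecewise objective you’d have to invent for it (the names are mine, purely illustrative):

```python
# Illustrative only: the agent's behavior is selected by a conditional,
# not by consulting any internal score over outcomes.

def switch_case_agent(coin: str) -> str:
    """Follows the instructions directly; no objective is consulted."""
    return "become a doctor" if coin == "heads" else "become a lawyer"

def piecewise_objective(coin: str, career: str) -> float:
    """The 'weird' objective built after the fact to fit the agent's behavior:
    1 for becoming a doctor in heads-worlds and a lawyer in tails-worlds."""
    if coin == "heads":
        return 1.0 if career == "become a doctor" else 0.0
    return 1.0 if career == "become a lawyer" else 0.0
```

Both descriptions predict the same behavior; the question is which one is a useful model of how the agent actually works.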
That’s a good point, but it doesn’t reduce my credence much. Perhaps 94% or 95% is more appropriate? I’d be willing to bet on this.
By hand I mean anything that closely resembles a human hand.
I’m willing to bet on this prediction.
A language model making it onto the NYT’s bestseller list seems like a very specific thing. High level machine intelligence is not.
If AGI just means, “can, in principle, solve any problem” then I think we could already build very very slow AGI right now (at least for problems with well-defined, checkable solutions: you just perform a brute-force search over candidate solutions, as in the sketch below).
Plus, I don’t think my definition matches the definition given by Bostrom.
By a “superintelligence” we mean an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.
ETA: I edited the original post to be more specific.
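To make the brute-force point above concrete, here is a minimal sketch (purely illustrative; it assumes the problem comes with a cheap verifier that can check a candidate solution):

```python
# Illustrative only: for any problem with a well-defined, checkable solution,
# enumerate candidate answers and return the first one the verifier accepts.
# This is "AGI" only in the trivial sense that it eventually solves anything
# a verifier can recognize; it is astronomically slow.

from itertools import count, product

def brute_force_solve(verify, alphabet="abcdefghijklmnopqrstuvwxyz0123456789 "):
    """Enumerate all strings in order of increasing length until one passes."""
    for length in count(0):
        for candidate in product(alphabet, repeat=length):
            solution = "".join(candidate)
            if verify(solution):
                return solution

# Usage (toy example): brute_force_solve(lambda s: s == "no")
```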
To help calibrate, watch this video.
I will probably accept bets, although the fact that someone would be willing to bet me on some of mine is evidence that I’m overconfident, so I might re-evaluate my probability if someone offers.
I’m using a slightly modified version of the definition given by Grace et al. for high-level machine intelligence.
My main data point is that I’m not very impressed by OpenAI’s robot hand. It is very impressive relative to what we had 10 years ago, but top humans are extremely adept at manipulating things in their hands.
I think, able to drive on any road that Google Maps has access to, and able to drive in all “normal” weather conditions (some snow, medium amounts of rain). I’m not confident on this, though, and I imagine that it might be a while (>10 years) before autonomous vehicles are truly autonomous (that is, they can drive in any condition that a human would be able to in any context).
Here are some of mine. These are very rough and I could probably be persuaded on many of them to move them significantly in some direction.
By 2030 (and after January 1st, 2020):
No high-level AGI, defined as a single system that can perform nearly every economically valuable task more cheaply than a human, will have been created. 94%
No robot hand will be able to manipulate a Rubik’s cube as well as a top human. 80%
No state will secede from the US. 95%
No language model will write a book without substantial aid, that ends up on the New York Times bestseller list. 97%
No pandemic will kill >50 million people. 93%
Neither Puerto Rico nor DC will be recognized as states. 80%
Traditional religion will continue to decline in the West, as measured by surveys that track engagement. 85%
Bryan Caplan will lose a bet. 75%
No US President will utter the words “Existential risk” in public during their term as president. 65%
No human will have stepped foot on Mars. 50%
At least one company will sell nearly fully autonomous cars, defined as cars that can autonomously perform nearly all tasks that normal drivers accomplish. 80%
Robin Hanson will disagree with the statement, “The rate of automation increased substantially during the 2020s, compared to prior decades.” 85%
Experts will recognize that top computers can reliably beat humans at narrow language benchmarks, such as those on https://super.gluebenchmark.com/. 90%
Kurzweil will lose his bet on Longbets (http://longbets.org/1/). 55%
There will be no convincing evidence of contact from extraterrestrials. 99%
Jeff Bezos will be unseated as the richest person in the world. 70%
Robust mouse rejuvenation, defined as a mouse being rejuvenated so that it lives 2500 days, will not have been demonstrated. 85%
If a survey is performed, most people in the United States will say that curing aging is undesirable. 85%
There will be another economic recession in the United States. 70%
World GDP will be higher than it was in 2019. 97%
No one will have won a Nobel Prize in Physics for their work on string theory. 80%
No war larger than the Syrian Civil War by death count, according to a reputable organization, will have occurred. 65%
Donald Trump will not be convicted of high crimes and misdemeanors in his first term as president. 95%
Donald Trump will serve his entire first term as president. 92%
Donald Trump will win re-election. 55%
Roe v Wade will not be overturned. 70%
No proof that P = NP. 98%
No proof that P != NP. 90%
If we define intelligence purely by how successful someone is, then we run into a ton of issues. For example, is a billionaire who inherited their wealth but failed out of high school smarter than a middle class college professor?
I’m not arguing that other species are more successful than humans. I’m using the more intuitive definition of intelligence (problem solving capability/ability to innovate).
The problem is that I’m finding many links that seem to argue that chimpanzees actually do have better memory, even compared to comparably trained humans (see this Wikipedia page, for instance). That one link says I’m wrong, but there are many that say I’m right, and I’m not sure what the answer is. It’s unfortunate that I linked to something that said I was wrong! Anyway, I’ll edit the post so that it says that I’m not actually sure.
On the contrary, when I look at the lists, it reinforces the idea that humans are smartest individually.
It’s worth noting that I have little reason to believe that the Wikipedia list is comprehensive.