I’ve since edited the previous comment to agree with you in principle, but I think this particular objection doesn’t really work.
Let’s say Lawrence asks the AI to get him a cheeseburger with probability at least 90%. The AI can’t use its usual plan because the local burger place is closed. It picks the next simplest plan, which involves using a couple more computers for additional planning and doesn’t specify any further details. These computers receive the subgoal “maximize the probability no matter what”, because it’s slightly simpler mathematically than capping it at 90%, and doesn’t have any downside from the POV of the original goal.
If you want the AI to avoid such plans, it needs to have a concept of “non-extreme” that agrees with our intuitions more reliably. As far as I understand, that’s pretty much the friendly AI problem.
As far as I understand, that’s pretty much the friendly AI problem.
I think it’s simpler, but not by much. Instead of knowing both the value and cost of everything, you just need to know the cost of everything. (The ‘actual’ cost, that is, not the full economic cost, which by including opportunity cost includes the value problem.) You could probably get away with an approximation of the cost, though a guarantee like “at least as high as the actual cost” is probably helpful.
So if Lawrence says “I’ll pay up to $10 for a hamburger,” the AI either finds a plan that provides Lawrence a hamburger for at most $10 (gross cost, not net cost), or it says “sorry, can’t find anything in that price range.”
I think there’s a huge amount of work to get there: you have to have an idea of ‘gross cost’ that matches up well enough with our intuitions, which is an intuition-encoding problem and thus hard. (If it tweets at the local burger company to get a coupon for a free burger, what’s the cost?)
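To make the budget rule concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Plan` type, the `cost_upper_bound` field (standing in for an estimate that is "at least as high as the actual cost"), and the idea that candidate plans arrive already ordered from simplest to least simple. It says nothing about how such a cost estimate would be produced, which is the hard part.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan record, invented for this sketch."""
    description: str
    # Assumed to be a conservative estimate: never below the actual gross cost.
    cost_upper_bound: float


def choose_plan(plans: Sequence[Plan], budget: float) -> Optional[Plan]:
    """Return the first plan whose conservative cost fits the budget.

    Plans are assumed to be ordered from simplest to least simple.
    Returning None corresponds to "sorry, can't find anything in that
    price range"; the function never relaxes the budget to get a result.
    """
    for plan in plans:
        if plan.cost_upper_bound <= budget:
            return plan
    return None


if __name__ == "__main__":
    candidates = [
        Plan("rent extra computers for more planning", 5000.0),
        Plan("order from the burger place across town", 8.50),
    ]
    choice = choose_plan(candidates, budget=10.0)
    print(choice.description if choice else "sorry, can't find anything in that price range")
```

The point of the sketch is only that the filter is trivial once the costs exist. A plan like the coupon tweet has no obvious number to put in `cost_upper_bound`, and a plan that hands work to other computers is only excluded if its estimate really does bound everything those computers might go on to spend.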