The Promethean Servant doesn’t have to be able to generate all those answers. Even if we could hardcode all of them and program it never to make decisions related to them, it would still be dangerous. For instance, it might reason: “Fetching coffee is easier when more coffee is nearby → coffee is most nearby when everything is coffee → convert all possible resources into coffee to maximize fetching.”
We have to imagine a system not specifically designed to fetch coffee that happens to be instructed to ‘fetch the coffee’. Everything to do with its understanding of any instruction it is given has to be generated by higher-level principles.
You should be able to see, before any coffee-fetching instruction was ever uttered, how the agent would approach other problems. There’s a sense in which understanding ‘fetch the coffee’ also entails excluding things which aren’t fetching the coffee, such as transforming the building into a cafetiere. But ‘don’t turn the building into a cafetiere’ is not a rule specified in any dictionary. It is, though, the kind of rule that could be generated on the fly by a kernel operating on the principle that the major effects of verbing a noun will tend to be on the noun. The installation of this principle would, to some extent, be visible from behaviour in other scenarios (did the robot use Jupiter to make a giant mechanical leg to kick the Earth when instructed to ‘kick the ball’?).
The very idea of an AGI must surely be more like a general solution to a family of problems than a family of solutions mapping onto a family of problems.
I think the second robot you’re talking about isn’t the candidate for the AGI-could-kill-us-all level of alignment concern. It’s more like a self-driving car that could hit someone due to inadequate testing.
I guess I’m not sure, though, how many answers to our questions you envisage the agent you’re describing generating from first principles. That’s the nub here, because both the agents I tried to describe above fit the bill for coffee fetching, but with clearly varying potential for world-ending generalisation.