• Great points. I think one of those charts, though, is incorrect: the one right above the “What’s going on?” header. The numbers don’t look right: Penn State has almost identical SAT scores to Harvard? The scores listed on Penn State’s website are dramatically different: 1270-1450 for their top campus, not 1450-1560. The other scores listed look right. My guess is that they swapped it with the University of Pennsylvania, for which I found 1440-1560 and a 1500 average in one source, but which isn’t on the list. Regardless, I weep for today’s students on this academic treadmill.

I’ll try to answer this since no one else has yet, but I’m not super confident in my answer. You’re accurately summarizing a shift and it’s about learning to walk before you learn to run. If we can’t “align” an AI to optimizing the number of paperclips (say), then we surely can’t align it to human values. See the concept of ‘mesa optimizers’ for some relevant ideas. I think this used to be thought of as not such an issue since traditional AI developments like Deep Blue had no difficulties getting an AI to follow a prescribed goal of “win at Chess” while modern ML methods make this issue more obvious.

This actually sounds quite different from “deliberate practice”. This says that to get better at playing piano, you should play the piano. Deliberate practice says that just playing the piano isn’t enough and maybe even mostly a waste of time (for this specific goal). It feels to me that deliberate practice would win out, so am I misunderstanding this? Is it just a framework for the concept and the implementation would look basically like deliberate practice?

This is a nice write-up, thanks for sharing it.

It seems like it’s possible to design models that always have this property. For any model M, consider two copies of it, M’ and M″. We can construct a third model N that is composed of M’ and M″ plus a single extra weight p (with 0 ≤ p ≤ 1). Define N’s output by running the input on both M’ and M″ and choosing the result of M’ with probability p and otherwise the result of M″. Such an N achieves minimal loss if and only if it’s one of the following cases:

• M’ and M″ are both individually optimal and 0 < p < 1,

• M’ is optimal and p = 1 (M″ arbitrary), or

• M″ is optimal and p = 0 (M’ arbitrary).

Then to get a path between any optimal weights, we just move to p = 0, modify M’ as desired (it is ignored there), then move to p = 1 and modify M″ as desired, then set p as desired. I think there are a few more details than this; it probably depends upon some properties of the loss function, for example, that it doesn’t depend upon the weights of M, only on the output.
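Here is a minimal sketch of that path on a toy example. Everything in it is my own assumption for illustration: the one-weight model M_w(x) = w² · x, the target output x, and the function names. The toy M has exactly two optimal weights, w = 1 and w = −1, which are not connected inside M’s own weight space, but they are connected at zero loss inside N.

```python
def linspace(a, b, n=21):
    """n evenly spaced values from a to b inclusive."""
    return [a + (b - a) * k / (n - 1) for k in range(n)]

XS = [0.5, 1.0, 2.0]  # toy inputs; the target output for input x is x itself

def loss_M(w):
    """Loss of the toy model M_w(x) = w**2 * x; it depends only on the output."""
    return sum((w**2 * x - x) ** 2 for x in XS) / len(XS)

def loss_N(w1, w2, p):
    """Expected loss of N, which answers with M' (weight w1) with probability p
    and with M'' (weight w2) otherwise."""
    return p * loss_M(w1) + (1 - p) * loss_M(w2)

# Walk from (w1, w2, p) = (1, 1, 0.5) to (-1, -1, 0.5).
path = (
    [(1, 1, p) for p in linspace(0.5, 0)]      # p -> 0: N now ignores M'
    + [(w, 1, 0) for w in linspace(1, -1)]     # so M' can move freely
    + [(-1, 1, p) for p in linspace(0, 1)]     # p -> 1: N now ignores M''
    + [(-1, w, 1) for w in linspace(1, -1)]    # so M'' can move freely
    + [(-1, -1, p) for p in linspace(1, 0.5)]  # finally restore p
)
path_losses = [loss_N(w1, w2, p) for w1, w2, p in path]
direct_losses = [loss_M(w) for w in linspace(1, -1)]  # straight line inside M

print(max(path_losses), max(direct_losses))
```

The loss stays at its minimum along the whole path in N, while the straight line between the same two optima inside M alone has to climb through w = 0.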

Unfortunately this doesn’t help us at all in this case! I don’t think it’s any more interpretable than M alone. So maybe this example isn’t useful.

Also, don’t submersions never have (local or global) minima? Their derivative is surjective so can never be zero. Pretty nice-looking loss functions end up not having manifolds for their minimizing sets, like x^2y^2. I have a hard time reasoning about whether this is typical or atypical though. I don’t even have an intuition for why the global minimizer isn’t (nearly always) just a point. Any explanation for that observation in practice?

This is only partially related, but I was trying to learn about hydration for exercise and this paper was the best source I could find. Downsides are it focuses on very long-duration exercise (think ultramarathons). If I had to summarize it, I would say it’s very nonchalant about salt and water consumption with its main recommendations being to just avoid over-consumption of water (particularly avoid consumption such that body weight increases during the exercise). They do note that individuals who find visible salt deposits on their clothes may need to increase salt consumption due to having high sweat rates and high salt-in-sweat concentrations. But otherwise the recommendations about whether or how to consume salt left me confused so it’s a pretty imperfect source.

• It looks like Shapley values satisfy an equilibrium property that should take into account more than just pairwise bargaining. Specifically, there is no subset of participants that can gain more than the Shapley values by excluding the others (assuming that v satisfies [superadditivity](https://en.wikipedia.org/wiki/Shapley_value#Stand-alone_test), i.e. a group is always at least as valuable as its subsets individually added together). We can prove this:

First, by induction see that for any S. And by superadditivity, for all S. Then we can do:

So then

That means that the total value produced by the subset R is going to be less than (or equal to) the total of the Shapley values they obtain from participating in the whole group. Therefore, they can’t possibly all profit by excluding anyone since there’s not enough profit to go around. Presumably this is well known and has a name. It’s basically a direct extension of the ‘stand-alone test’ that Wikipedia lists, so maybe it’s the ‘stand-together test’?
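A quick way to test this property on small games is to compute the Shapley values by brute force and look for any subset R whose own value exceeds the total of its members’ Shapley values. This is only a sketch: the function names and the two example games below are my own assumptions, not taken from the post.

```python
from fractions import Fraction
from itertools import combinations, permutations

def shapley_values(players, v):
    """Average each player's marginal contribution over all join orders."""
    orders = list(permutations(players))
    phi = {i: Fraction(0) for i in players}
    for order in orders:
        coalition = frozenset()
        for i in order:
            phi[i] += Fraction(v(coalition | {i}) - v(coalition))
            coalition = coalition | {i}
    return {i: total / len(orders) for i, total in phi.items()}

def blocking_coalitions(players, v):
    """Return every subset R with v(R) > sum of the Shapley values of R."""
    phi = shapley_values(players, v)
    blocked = []
    for size in range(1, len(players) + 1):
        for R in combinations(players, size):
            if v(frozenset(R)) > sum(phi[i] for i in R):
                blocked.append(R)
    return blocked

players = (1, 2, 3)

def convex_game(S):
    """Hypothetical example: value grows as the square of the group size."""
    return len(S) ** 2

def majority_game(S):
    """Hypothetical example: any two or more players together are worth 1."""
    return 1 if len(S) >= 2 else 0

print(blocking_coalitions(players, convex_game))
print(blocking_coalitions(players, majority_game))
```

When I run this, the first game has no blocking subset, but the three-player majority game does: it is superadditive, each Shapley value is 1/3, and yet any pair is worth 1 > 2/3 on its own. So the argument probably needs a stronger assumption than superadditivity (convexity of v seems to be the usual one), and it is worth checking against examples like this.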

So that makes me think Shapley values are what you might get after multi-party bargaining arrives at equilibrium. This is a pretty amazing topic, and a great selection of examples to explore!

• I think the unemployed worker example is the following: If I’m an unemployed worker, I tell the owner I’ll work for X − 1 dollars, undercutting the current worker who is working for X dollars. The owner accepts, of course. So to keep that from happening, the worker offers me Y dollars to not work and let him keep the job. I go around to all the other workers and make the same argument, getting Y from each. At the end of the day, voilà: I’m making 10Y dollars to not work and all the workers are making X − Y dollars, and I’ll refuse any Y up until 10Y = X − Y, where we make the same amount (otherwise I’ll undercut them).
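Spelling out the break-even point, assuming ten employed workers (which is what the 10Y implies) who each earn X and each pay me Y:

```latex
10Y = X - Y \;\Longrightarrow\; Y = \frac{X}{11}, \qquad 10Y = X - Y = \frac{10X}{11}.
```

So at that point all eleven of us end up with 10X/11 each.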

That’s not enough to replicate the result since it doesn’t prescribe the value the owner gets.

Presumably in real life we can somehow include the negative term for the downsides of employment and the unemployed will accept a lower pay? Also, it’s interesting to think of unemployment benefits as “paying people to not compete with me in the job market” but I guess that kind of makes sense.

• Any progress or interest in finding limited uses of AI that would be safe? Like the “tool AI” idea but designed to be robust. Maybe this is a distraction, but it seems basically possible. For example, a proof-finding AI that, given a math statement, can only output a proof to a separate proof-checking computer that validates it and prints either True/False/Unknown as the only output to human eyes. Here “Unknown” could indicate that the AI gave a bogus proof, failed to give any proof of either True or False, or the proof checker ran out of time/memory checking it. The proof is destroyed after it’s checked. Is that safe?

I can imagine some tiny scenarios where it’s not. The AI could try to communicate with humans even with this limited output band: e.g. disprove the Riemann hypothesis, but then if we try to find non-trivial zeros by binary search, it can intentionally output “Unknown”s to guide us to certain non-trivial zeros over others, and the digits of these could encode something?? Or the AI could infer something about our physics from the questions we ask, or random bit-flips it notices in its machine could tell it about cosmic rays or the machine architecture, and it comes up with some dastardly way to hack other machines even though it’s on a dedicated, airgapped machine. These aren’t entirely unimaginable, but I’d also think a writer was being lazy if they used this as a plotline, as they might as well just have said “magic”.
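To make the setup concrete, here is a rough sketch of the gate I have in mind: an untrusted prover, a trusted checker, and a three-valued output. All names here are hypothetical, and the “math statements” are just a toy stand-in (the statement n means “n is composite”, and a proof is a nontrivial factor); a real system would sit on top of an actual proof checker and real sandboxing, which this does not attempt.

```python
from enum import Enum

class Verdict(Enum):
    TRUE = "True"
    FALSE = "False"
    UNKNOWN = "Unknown"

def gated_query(statement, untrusted_prover, trusted_checker):
    """Run the prover, check its proof, and reveal only a three-valued verdict."""
    try:
        claim, proof = untrusted_prover(statement)
        accepted = trusted_checker(statement, claim, proof)
    except Exception:
        return Verdict.UNKNOWN  # crash or malformed output from either side
    # The proof is never returned; only the verdict leaves this function.
    if accepted is not True or not isinstance(claim, bool):
        return Verdict.UNKNOWN
    return Verdict.TRUE if claim else Verdict.FALSE

def toy_checker(n, claim, proof):
    """Toy checker for the statement 'n is composite'."""
    if claim:  # the proof must be a nontrivial factor of n
        return isinstance(proof, int) and 1 < proof < n and n % proof == 0
    # For a 'not composite' claim the checker confirms by trial division.
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

def honest_prover(n):
    """Return (True, factor) if n has a nontrivial factor, else (False, None)."""
    for d in range(2, n):
        if n % d == 0:
            return True, d
    return False, None

def lying_prover(n):
    """Always claims n is composite and offers 7 as a bogus factor."""
    return True, 7

print(gated_query(91, honest_prover, toy_checker))
print(gated_query(97, honest_prover, toy_checker))
print(gated_query(97, lying_prover, toy_checker))
```

The only thing the humans ever see is the verdict, so a bogus proof and a crashed prover are indistinguishable from the outside. That is also exactly the narrow channel the scenarios above would have to exploit.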

• What does the Fermi paradox tell us about the future of AI, if anything? I have a hard time simultaneously believing both “we will accidentally tile the universe with paperclips” and “the universe is not yet tiled with paperclips”. Is the answer just that the Great Filter is already past?

And what about the anthropic principle? Am I supposed to believe that the universe went like 13 billion years without much in the way of intelligent life, then for a brief few millennia there’s human civilization with me in it, and then the next N billion years it’s just paperclips?

• The key question is your second paragraph, which I still don’t really buy. Taking an action like “attempt to break out of the box” is penalized if it is done during training (and doesn’t work), so the very optimization process will be to find systems that do not do that. It might know that it could, but why would it? Doing so in no way helps it, in the same way that outputting anything unrelated to the prompt would be selected against.

# [Question] Is keeping AI “in the box” during training enough?

• The full answer is: they cannot return if there are only finitely many balls, but they can if there are infinitely many.

Let’s first assume that there are finitely many balls. As Thomas pointed out, we can assume that the center of mass is fixed. Let’s consider R defined to be the distance from the center of mass to the furthest ball and call that furthest ball B (which ball that is might change over time). R might be decreasing at the start—we might start with B going towards the center of mass. But if R decreased forever then we would know that they never return to their starting location (since R would be different)! So at some point it must become at least as large as it was at the start. At that point either the derivative of R is 0 or it is positive. In either case, R must increase forever onwards—which again shows it can’t return to its original starting point. Why is it always increasing from that point onwards? Well, the only way for the ball B to turn around and start heading back towards the center is if there is another ball further away than it to collide with it. But that can’t be, since B is the furthest out ball! (Edit: I see now that this is essentially equivalent to cousin_it’s argument.)

For infinitely many balls, you can construct a situation where they return to their original position! We’re going to put a bunch of balls on a line (you don’t even need the whole plane). In the interval [0,1], there’ll be two balls with initial velocity heading in towards each other at unit speed, with one ball at the left edge of the interval and one ball at the right. Then do the same thing for each interval [k,k+1]. When you let them go, the pair in each interval will collide and then head outwards at unit speed. Then each will collide, at the boundary of its interval, with the ball coming from the next interval. That sets them back at the starting position. I.e. all balls collide first with their neighbor on one side, then their neighbor on the other side, setting them back to their starting position.

• No problem!

• A cursory glance through FiveThirtyEight’s collected poll data shows a survey with over 84,000 voters (CCES/YouGov) giving Clinton a +4 percentage point lead, with 538 adjusting that to +2. Google and SurveyMonkey routinely had surveys of 20,000+ individuals, with one SurveyMonkey survey having 70,000 respondents with Clinton +5 (+4 adjusted). There was no clear reason to prefer your poll (whichever that one was) over these. https://projects.fivethirtyeight.com/2016-election-forecast/national-polls/

And it should go without saying that Clinton did end up at +2 nationally.

I don’t like any of the proposed solutions to that when I glanced through the SEP article on it. They’re all insightful but are sidestepping the hypothetical. Here’s my take:

Compute the expected utility not of a choice BET/​NO_BET but of a decision rule that tells you whether to bet. In this case, the OP proposed the rule “Always BET” which has expected utility of 0 and is bested by the rule “BET only once” which is in turn bested by the rule “BET twice if possible” and so on. The ‘paradox’ then is that there is a sequence of rules whose expected earnings are diverging to infinity. But then this is similar to the puzzle “Name a number; you get that much wealth.” Which number do you name?

(Actually I think the proposed rule is not “Always BET” but “Always make the choice which maximizes expected utility conditional on choosing NO_BET on the next choice”. The fact that this strategy is flawed seems reasonable: you’re computing the expectation assuming you choose NO_BET next but don’t actually choose NO_BET next. Don’t count your chickens before they hatch.)

Vaporising a comet takes significant energy. Heating up a comet to vaporization point takes significant energy. Dissipating the vaporized comet (still the same total mass and momentum as when it was in a solid state) takes significant energy. I find so simplistic a treatment not very useful. Still, it’s an interesting thought experiment and a little scary.

I’m curious why you think that I’m not part of your target audience – feel free to elaborate.

I’m not sure we understand each other here, but I’m assuming you want to know why I do not consider myself part of your target audience. I don’t have a concrete answer here, it’s just that I read this and thought it didn’t apply to me. I had some of the same difficulties as you, but not in a way that I feel your advice would have applied or still does apply. I can think of a friend for whom some of your advice would probably apply, though, and imagine you are targeting him and not me.

I might be oblivious, but I don’t see where I called myself smarter than the typical LW reader

Another example would be something like describing yourself as “The guy who has deep insights but who doesn’t get anything done, because he’s socially dysfunctional so nobody listens to him”. This is a pretty big humble brag. If I wanted to say that to a typical person I might have said, “The guy who doesn’t get anything done, no matter what insight he has, because he’s socially dysfunctional so nobody listens to him.” It’s more cautious and definitely doesn’t claim “deep insight”, which is a phrase I’d reserve for describing someone else. You leave it up to the reader exactly how insightful you are implying yourself to be. It also changes the focus to your difficulty rather than the strength (which is demoted to an aside). I’m no writer though, so take this specific suggestion with a grain of salt.

Similarly for claims about deep insights from machine learning. Make the focus the difficulty you faced, not the deep insight you had. Maybe say, “I struggled even more after picking up machine learning jargon and modes of thought which I couldn’t well articulate, even to my close friends.”

Others have pointed out that you’re also very humble throughout. I agree with them, too, and admire your ability to spell out your own failings. But people read “humble brag” mixed statements as primarily bragging. To you, it might seem really really significant that you were struggling, but that’s not the focus people will read.

For the doctor analogy, I agree that that’s what you’re trying to say and I think you partly succeeded at that on one level. But on the other level, people will be turned off when you express expertise in areas where you do not have an obvious qualification. A doctor has a diploma to point to, and people are okay with that. A self-proclaimed student of medicine who had spent 15 years learning privately would be treated quite differently from the doctor. It’s not a fair world! Had you been more specific, I also might not have taken it like that; instead it seemed to be a blanket statement, like how an adult might say that all conversation with a child is tedious since the child just hasn’t had any exposure to interesting ideas. Regardless of how factually true that is, the child could feel slighted.

This is all my take. Lumifer’s response seems reasonable, too.

What would I have eliminated to make it shorter? It’s a matter of taste, I suppose. I might have removed most of the part about how you grew up. I felt it could be summarized in a few sentences. But looking over this a second time, I think I may have clumped a lot of the things I thought came off as “arrogant” under the tag of “needs to be removed” and then interpreted that to mean that the article was too long. I’m sure it could be tightened up, but other than that growing up section there doesn’t seem to be anything major. So take that complaint of mine lightly.

Ugh, I need to take my own advice and not write so much. Easier said than done.

• For example, if a student tells me that I’m the worst teacher he or she has ever had, it makes me feel bad because I feel like I’m not contributing value, but I’m not at all upset with the student: my attitude is that the student is conveying valuable information to me, and that I should be appreciative.

I’m tempted to take that as a Crocker’s rule invocation. But I have realized that you wrote this for people-like-you; that is, after all, pretty much its explicit purpose. As such, I’m not sure I have any criticism that I can definitively say is helpful.

Nonetheless, I want to point out two general things about this that will make this a hard post to read for most people. First is the length, and even in this you note that you spend too much time explaining something that you’ve worked on. I think the length was partially unnecessary and not just a reflection of me not being your target audience (I assume). The second is that you come across as exceedingly arrogant. I think you are attempting to explain your background so that we understand the situation. But you explicitly call yourself smarter than the typical reader of the site that you are posting this on. Ouch! But again, perhaps this is just a reflection of you having a very narrow target audience, and for them this could read like an “ah, finally someone gets it!”

I hope that you take this to be useful, particularly for when you write for a wider audience. For what it’s worth, my mental Post-it note has you labelled as a user that I should pay attention to. I say that since I kind of suspect that you already know everything I just mentioned and aren’t bad at overcoming these in other situations, but thought this worth saying explicitly given the context of trying to improve.

• We’ve gotten a much weaker sense of doom from this encounter with Quirrell than from any in the past, as far as I can tell. Only Sprout’s magic caused apprehension, Q not at all.

Was Quirrell in control of the sense of doom all along? Has something changed?