Thanks; fixed & will try to remember.

# Rafael Harth

If we use some variant of UDT, the same line of reasoning is experienced by many other minds and we should reason as if we have causal power over all these minds.

As I understand UDT, this isn’t right. UDT 1.1 chooses an input-output mapping that maximizes expected utility. Even assuming that all people who read LW run UDT 1.1, this choice still only determines the input-output behavior of a couple of programs (humans). The outputs of programs that aren’t running UDT, and hence don’t depend on our outputs, are held constant. Therefore, if you formalized this problem, UDT’s output could be “stockpile food” even if [every human doing that] would lead to a disaster.

I think “pretend as if everyone runs UDT” was neither intended by Wei Dai nor is it a good idea.

Differently put, UDT agents don’t cooperate in a one-shot prisoner’s dilemma if they play vs. CDT agents.

Also: if a couple of people stockpile food, but most people don’t, that seems like a preferable outcome to everyone doing nothing (provided stockpiling food is worth doing). It means some get to prepare, and the food market isn’t significantly affected. So this particular situation actually doesn’t seem to be isomorphic to the prisoner’s dilemma (if modeled via game theory).

The advice of this post seems to be advice on the margin (i.e., assuming everything else is held constant), which seems reasonable given that this one post won’t change collective behavior by much.

So the question isn’t “what happens if everyone stockpiles food?” but rather, “do we expect enough people to stockpile food that stockpiling more food will lead to bad consequences?”. I don’t know the answer to that one.

# UML XII: Dimensionality Reduction

That sounds interesting. Can you share an example other than decision trees?

# UML XI: Nearest Neighbor Schemes

I’m not sure EY meant to imply that the response is factually correct. Smarter-than-expected could just mean “not a totally vapid applause light.” A wrong but genuine response could meet that standard.

It’s supposed to be inf (the infimum). Which is the same as the minimum whenever the minimum exists, but sometimes it doesn’t exist.

Suppose $A$ is $(5,7)$, i.e., the open interval $\{x : 5 < x < 7\}$, and the point is 3. Then the set $\{|3-y| : y \in A\}$ doesn’t have a smallest element. Something like $|3 - 5.001| = 2.001$ is pretty close, but you can always find a pair that’s even closer. So the distance is defined as the greatest lower bound on the set $\{|3-y| : y \in A\}$, which is the infimum, in this case 2.
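A small numeric sketch of the same idea (the interval and sample points are my own hypothetical choices, picked to match the infimum of 2): sample elements of the set ever closer to its boundary and watch the distances approach, but never reach, 2.

```python
# Hypothetical concrete choice: A is the open interval (5, 7) and the
# point is 3, so the distance set {|3 - y| : y in A} is (2, 4).
point = 3.0

# Elements of A approaching the left endpoint 5 from above.
ys = [5.0 + 10.0 ** -k for k in range(1, 10)]   # 5.1, 5.01, 5.001, ...
distances = [abs(point - y) for y in ys]        # 2.1, 2.01, 2.001, ...

# No sampled distance ever equals 2 (the set has no minimum), but the
# values get arbitrarily close to 2 -- the infimum.
assert all(d > 2 for d in distances)
print(min(distances))   # slightly above 2
```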

Okay – since I don’t actually know what is used in practice, I just added a bit paraphrasing your correction (which is consistent with a quick google search), but not selling it as my own idea. Stuff like this is the downside of someone who is just learning the material writing about it.

What’s the “ℓ”? (I’m unclear on how one iterates from L to 2.)

$L$ is the number of layers. So if it’s 5 layers, then $\ell$ runs through $5, 4, 3, 2$. It’s one fewer transformation than the number of layers because there is only one between each pair of adjacent layers.
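A minimal sketch of how I read that iteration (variable names are my own, not from the post):

```python
L = 5                       # number of layers
transformations = []
for l in range(L, 1, -1):   # the layer index l runs 5, 4, 3, 2
    # one transformation sits between layer l-1 and layer l
    transformations.append((l - 1, l))

print(transformations)        # [(4, 5), (3, 4), (2, 3), (1, 2)]
print(len(transformations))   # 4, i.e. L - 1: one fewer than the layers
```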

Why squared error rather than absolute value – because bigger errors are quadratically worse, because it was tried and it worked better, or tradition?

I genuinely don’t know. I’ve wondered forever why squaring is so popular. It’s not just in ML, but everywhere.

My best guess is that it’s in some fundamental sense more natural. Suppose you want to guess a location on a map. In that case, the obvious error would be the straight-line distance between you and the target. If your guess is $(x_1, y_1)$ and the correct location is $(x_2, y_2)$, then the distance is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ – that’s just how distances are computed in 2-dimensional space. (Draw a triangle between both points and use the Pythagorean theorem.) Now there’s a square root, but actually the square root doesn’t matter for the purposes of minimization – the square root is minimal if and only if the thing under the root is minimal, so you might as well minimize $(x_1 - x_2)^2 + (y_1 - y_2)^2$. The same is true in 3-dimensional space or $n$-dimensional space. So if general distance in abstract vector spaces works like the straight-line distance does in geometric space, then squared error is the way to go.
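A quick numerical check of that minimization claim (the target and candidate grid are made up for illustration): the guess that minimizes the straight-line distance is exactly the one that minimizes the squared distance, since the square root is monotone increasing.

```python
import math

# Hypothetical setup: a target location and a grid of candidate guesses.
target = (2.0, 3.0)
candidates = [(x * 0.5, y * 0.5) for x in range(10) for y in range(10)]

def sq_dist(p):
    # squared error: (x1 - x2)^2 + (y1 - y2)^2
    return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2

def dist(p):
    # straight-line (Euclidean) distance: sqrt of the squared error
    return math.sqrt(sq_dist(p))

# Minimizing either quantity picks out the same candidate.
best_by_dist = min(candidates, key=dist)
best_by_sq = min(candidates, key=sq_dist)
assert best_by_dist == best_by_sq
print(best_by_dist)   # the grid point closest to the target
```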

Also, thanks :)

# A Simple Introduction to Neural Networks

Good post. I would actually argue that the combined cost of many short activities is much lower than the cost of one block of the same total length, because taking small breaks in between work isn’t zero-value.

Have you been doing something that puts stuff on your hands that is not already spread everywhere you are and will touch, or that for some reason has caused a significantly higher concentration on your hands versus the environment?

Don’t think so.

Bite your fingernails, or stick your fingers or hands on/in your mouth a lot? Stop, or be aware of what you’ve been touching since the last cleaning.

That’s not at all practical, though. Changing a habit such as biting fingernails is *extremely* difficult, and definitely not worth it to reduce the risk of getting a virus.

To make Wei Dai’s answer more concrete, suppose something like the symmetry theory of valence is true; in that case, there’s a crisp, unambiguous formal characterization of all valence. Then add open individualism to the picture, and it suddenly becomes a lot more plausible that many civilizations converge not just towards similar ethics, but exactly identical ethics.

What’s missing for me here is a quantitative argument for why this is actually worth doing. Washing your hands more often would reduce risk, but is it actually worth the effort? (And for me there’s also the problem that my doctor literally instructed me to wash my hands less often because of a skin infection thing.)

I believe query and target category are the same here, but after reading it again, I see that I don’t fully understand the respective paragraph.

I think the query category is the pattern, as you say, and the target category is [original category + copy + edges between them]. That way, if the matching process returns a match, that match corresponds to a path-that-is-equivalent-to-the-path-in-the-query-category.

e.g. “colou*r” matches “color” or “colour” but not “pink”.

Is this correct? I’d have thought “colo*r” matches to both “color” and “colour”, but “colou*r” only to “colour”.
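Python’s `fnmatch` implements shell-style globbing, where `*` matches any (possibly empty) sequence of characters; under that reading, this interpretation checks out:

```python
from fnmatch import fnmatch

# "colo*r" covers both spellings: * matches "" in "color" and "u" in "colour".
assert fnmatch("color", "colo*r")
assert fnmatch("colour", "colo*r")

# "colou*r" only covers the British spelling.
assert fnmatch("colour", "colou*r")
assert not fnmatch("color", "colou*r")

# Neither pattern matches an unrelated word.
assert not fnmatch("pink", "colo*r")
```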

Next-most complicated

Least complicated?

I’m very likely to read every post you write on this topic – I got this book a while ago, and while it’s not a priority right now, I do intend to read it, and having two different sources explaining the material from two explicitly different angles is quite nice. (I’m mentioning this to give you an idea of what kind of audience gets value out of your post; I can’t judge whether it’s an answer to your category resource question, although it seems very good to me.)

I initially thought that the clouds were meant to depict matches and was wondering why they weren’t what I thought they should be, before realizing that they always depict the same stuff and are meant to depict “all stuff” before we figure out what the matches are.

I don’t think there is a UDT-idea that prescribes cooperating with non-UDT agents. UDT is sufficiently formalized that we know what happens if a UDT agent plays a prisoner’s dilemma with a CDT agent and both parties know each other’s algorithm/code: they both defect.

If you want to cooperate out of altruism, I think the solution is to model the game differently. The outputs that go into the game theory model should be whatever your utility function says, not your well-being. So if you value the other person’s well-being as much as yours, then you don’t have a prisoner’s dilemma because cooperate/defect is a better outcome for you than defect/defect.
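A toy illustration of that remodeling, using standard (made-up) prisoner’s-dilemma well-being payoffs: once my utility counts the other player’s well-being equally with my own, cooperating against a defector yields higher utility than mutual defection, so the game is no longer a prisoner’s dilemma for me.

```python
# Hypothetical well-being payoffs (mine, theirs), indexed by
# (my action, their action); C = cooperate, D = defect.
wellbeing = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def altruistic_utility(my_action, their_action):
    # I value the other person's well-being as much as my own,
    # so my utility is the sum of both well-beings.
    mine, theirs = wellbeing[(my_action, their_action)]
    return mine + theirs

# Against a defector, cooperating now beats defecting in *utility*,
# even though it's worse for my own well-being.
assert altruistic_utility("C", "D") > altruistic_utility("D", "D")
print(altruistic_utility("C", "D"), altruistic_utility("D", "D"))   # 5 2
```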

But they’re only doing that if there will, in fact, be a supply shortage. That was my initial point – it depends on how many other people will stockpile food.