I don’t know if you’ve seen this, but https://arxiv.org/abs/1906.11583 is a follow-up that generalizes the Beckers and Halpern paper to a notion of approximate abstraction, measuring the non-commutativity of the diagram with a distance function and taking expectations. I think the most useful notion the paper introduces is a probability distribution over the set of allowed interventions. Intuitively, you don’t need your abstraction of temperature to behave nicely w.r.t. freezing half the room and burning the other half such that the average kinetic energy balances out. Thus you can determine the “approximate commutativity” of the diagram by fixing a high-level intervention and taking an expectation over the low-level interventions that are likely to map to that high-level intervention.
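Here’s a toy numerical version of that intuition (my own construction, not the paper’s formalism): a low-level model of particle energies with nonlinear dynamics, a high-level temperature model calibrated to “likely” interventions, and a comparison of the commutation error in expectation over likely interventions versus under the pathological freeze-half/burn-half intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
N, ALPHA = 10_000, 0.01

def tau(energies):
    """Abstraction map: particle energies -> temperature (mean energy)."""
    return energies.mean()

def low_step(energies):
    """Nonlinear low-level dynamics: each particle radiates ~ energy^2."""
    return energies - ALPHA * energies**2

def high_step(T):
    """High-level dynamics, calibrated to 'likely' interventions
    (energies ~ Uniform(0, 2T), for which E[e^2] = (4/3) T^2)."""
    return T - ALPHA * (4.0 / 3.0) * T**2

def commutation_error(energies):
    """|tau(low_step(e)) - high_step(tau(e))|: how badly the diagram commutes."""
    return abs(tau(low_step(energies)) - high_step(tau(energies)))

T = 5.0
# Expectation over "likely" interventions that set the temperature to T:
likely = [rng.uniform(0, 2 * T, N) for _ in range(100)]
print(np.mean([commutation_error(e) for e in likely]))  # ~0 (sampling noise only)

# A pathological intervention with the same temperature: freeze half the
# room, burn the other half. The diagram no longer commutes.
pathological = np.concatenate([np.zeros(N // 2), np.full(N // 2, 2 * T)])
print(commutation_error(pathological))                  # ~0.17: large error
```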
Also, if you are willing to write up your counterexample to the conjecture that Beckers and Halpern make, I am currently researching under Eberhardt and he (and I) would be extremely interested in seeing it. I also initially thought that the conjecture was obviously false, but when I tried to actually construct counterexamples, all of them ended up as either not strong abstractions or not recursive (acyclic) causal models.
There is a phenomenon among students of mathematics where things go from being “difficult” to “trivial” as soon as concepts are grasped. The main reason I don’t comment many of my thoughts is that since I can think them, they must not be very hard to think, so commenting them seems kind of useless. Me thinking that my thoughts aren’t very novel/insightful/good explains nearly all of the times I don’t comment; if I have a thought that I think is non-trivial to think, or access to information that I think most people don’t have, I will likely comment it (this happens extremely rarely).
However, I agree that people should say more obvious things on the margin.
(I also think that, on the margin, people should compliment other people more. I liked this post and think it is an important problem to try to solve.)
Thanks! I mentioned on day 0 that this was one of my training regimes for rationality, but it’s also training writing and posting thinking in public places, among other things. I’m glad that people are able to get something out of it.
In my view, there is no “right level” for bugs. Some bugs are simpler and thus more suited to practicing, but the goal is to get to the point where you can solve even your largest bugs. I’ll provide more prompts for finding larger bugs later on in the sequence.
Thanks for participating!
I’m not sure I understand what you’re saying. I’m not advising people to drop their items in an attempt to discover new uses for them, I’m just describing a time when I accidentally dropped something and discovered that I was using it wrong.
I think part of what I’m saying is that your priors for things working properly should be higher and I should have been more surprised at the difficulty of refilling the pepper grinder. This should have prompted me to search harder for a way to use it more effectively.
I agree that that comment didn’t really add that much. I was just trying to caution against the view that goal factoring was a technique for convincing yourself to take/not take certain actions. I’m not sure whether I should have spent more time discussing that though, because I’m not sure how common such a failure mode is.
Thanks for the style pointer!
Yes. Fixed.
You’re welcome! Glad you’re finding them helpful. Any insights you want to share?
Related question: at what point should university students begin partial-quarantine measures such as not going to large classes and not going to cafeterias?
I’m concerned about the cafeteria especially, because it seems like a pretty potent vector for transmission and is also relatively avoidable by just buying/preparing your own food. At minimum, I suspect that bringing your own utensils/plates that you’ve disinfected is a reasonable precaution, although all the food is sort of sitting out, so I’m not sure that would even help. I live reasonably close to my university, so I could also just go home at some point.
For reference, it would cost roughly $5-20 a day to avoid the cafeteria, depending on the type of food prepared or purchased.
FYI, regular surgical masks are insufficient for protection against COVID-19; a respirator rated N95 or higher is required. I’m not sure what you mean by “surgical mask”, but I want to make sure they’re not regular surgical masks.
Source: https://www.livescience.com/face-mask-new-coronavirus.html
Coinfection rates of COVID and normal flu are very low. If you have the set of flu/COVID symptoms, you’re basically guaranteed to have one or the other. You can test for the flu pretty easily. Therefore, people can just test for the flu as a proxy for testing for COVID.
Is this just a really obvious chain of reasoning that everyone has missed? Which one of my assumptions is wrong?
https://twitter.com/katyw2004/status/1236848300143280128 says coinfection rates are low
https://www.wikiwand.com/en/Rapid_influenza_diagnostic_test means we can test for the flu fast
Thus if you have the set of flu/COVID symptoms, you’re basically guaranteed to have either the flu or COVID.
Maybe the tests are only useful for people who don’t have symptoms, but if that’s not the case, then the flu test provides a lot of evidence as to whether or not someone has COVID (even if “basically guaranteed” is replaced with “probable”).
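To put rough numbers on how much evidence the flu test gives, here’s a back-of-the-envelope Bayes calculation; the prior over causes of symptoms and the rapid test’s accuracy are illustrative assumptions I made up, not real estimates.

```python
def p_covid_given_flu_test(prior_covid, prior_flu, sens=0.6, spec=0.98,
                           test_negative=True):
    """P(COVID | flu test result), assuming coinfection is negligible so
    {COVID, flu, other} are mutually exclusive causes of the symptoms."""
    prior_other = 1 - prior_covid - prior_flu
    # Likelihood of a *negative* flu test under each hypothesis:
    p_neg = {"covid": spec, "flu": 1 - sens, "other": spec}
    like = p_neg if test_negative else {k: 1 - v for k, v in p_neg.items()}
    z = (like["covid"] * prior_covid + like["flu"] * prior_flu
         + like["other"] * prior_other)
    return like["covid"] * prior_covid / z

# With symptoms, suppose (made-up) 5% COVID, 40% flu, 55% something else:
print(p_covid_given_flu_test(0.05, 0.40))                       # negative test: ~0.065
print(p_covid_given_flu_test(0.05, 0.40, test_negative=False))  # positive test: ~0.004
```

With these numbers, a negative flu test only moves P(COVID) from 5% to ~6.5%, because most of the probability sits in the “other causes” bucket.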
Update: the CDC advises testing for the flu, and there are a lot of medical conditions that cause “flu-like” symptoms. It turns out that “flu-like” symptoms basically means “immune system doing things”, which will happen with most things your body doesn’t like.
I’m skeptical of the 2-3 day doubling time claim.
This data indicates that the doubling time has been ~4 days for the past week or so. I suspect that the doubling time was faster in the beginning because testing was being rapidly ramped up and that the actual doubling time is closer to 4-5 days.
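For reference, the way I’m estimating doubling time is a simple log-linear fit: regress log(cases) on day and divide ln(2) by the slope. The case counts below are placeholders with roughly 20% daily growth, not the actual data linked above.

```python
import numpy as np

# Hypothetical cumulative case counts over 8 days (~20% daily growth).
cases = np.array([400, 480, 576, 691, 829, 995, 1194, 1433])
days = np.arange(len(cases))

slope, intercept = np.polyfit(days, np.log(cases), 1)
print(np.log(2) / slope)  # doubling time in days, ~3.8 for these numbers
```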
Hadn’t realized that the last week was dominated by those countries, although it seems obvious in retrospect.
I think testing being ramped up at different rates in different countries is a little incompatible with all countries exhibiting roughly the same doubling time. Countries that ramped up testing more quickly should see faster observed doubling times, so the observed doubling time should be tied to the speed at which each country ramped up testing.
My (rough) model was that all countries basically ramped up testing at the same-ish rate once they had a non-trivial number of infections, so they had fast doubling times in the beginning but slower doubling times once they had caught the majority of people already infected (i.e. the infection had been spreading for many days before they realized, so the first week or so of testing was spent just catching the people who were already infected). This doesn’t quite make sense though, because it’s obvious that not all countries are ramping up testing at the same speed. And that confuses me, because it means the observed doubling time should differ across countries?
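That confusion can be made concrete with a toy model (all numbers made up): if observed cases = true cases × testing coverage, then while coverage is ramping up the observed doubling time is shorter than the true one, and it converges to the true one once coverage saturates.

```python
import numpy as np

days = np.arange(22)
true_cases = 100 * 2 ** (days / 5)               # true doubling time: 5 days
coverage = np.minimum(0.05 * 1.25 ** days, 0.5)  # testing ramps up, then saturates
observed = true_cases * coverage

# Observed doubling time over successive weeks:
for lo, hi in [(0, 7), (7, 14), (14, 21)]:
    t_double = (hi - lo) * np.log(2) / np.log(observed[hi] / observed[lo])
    print(f"days {lo}-{hi}: {t_double:.1f} days")
# ~1.9, ~2.8, then 5.0: the observed doubling time only matches the true
# one after testing stops ramping up.
```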
Thanks for this analysis. I have moderately updated towards a lower doubling time.
Based on googling “hospital occupancy rates”, about 66% of beds are already in use on any given day. Doctors I’ve talked to have said that extremely busy days result in near or over 100% capacity.
I expect that there is going to be gradual overload as COVID spreads through various communities, e.g. we’re starting to see Washington hospitals being overloaded.
A rough estimate: there are ~333k empty hospital beds; assume a doubling time of 4 days, 300 new cases today, a 20% hospitalization rate, and 14 days per hospitalization. Thus we want to solve for k such that 300 · 2^(k/4) · 0.2 · 14 ≥ 333,000, giving k > 34, so hospitals will be overloaded in about 34 days. This estimate assumes that patients are distributed uniformly across all hospitals, so it’s more of an upper bound given unchecked exponential growth.
Edit: Rob Wiblin provides an estimate (on FB) of 15k new cases in the US every day, giving k > 11.5. I haven’t thought much about 15k new cases, but it seems far more correct than 300.
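For anyone who wants to vary the assumptions, here’s the arithmetic above as code (my reconstruction; it still ignores bed turnover before day 14 and regional clustering).

```python
def days_until_overload(empty_beds=333_000, new_cases_today=300,
                        doubling_days=4, hosp_rate=0.2, stay_days=14):
    """Smallest k such that new_cases_today * 2^(k/doubling_days)
    * hosp_rate * stay_days >= empty_beds."""
    k = 0
    while (new_cases_today * 2 ** (k / doubling_days)
           * hosp_rate * stay_days) < empty_beds:
        k += 1
    return k

print(days_until_overload())                        # 35, i.e. k > 34 as above
print(days_until_overload(new_cases_today=15_000))  # 12, close to the k > 11.5 in the edit
```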
I’ve been thinking about the EMH a lot, since it seemed to me that even after the initial COVID crash the market still was overvaluing everything. This turned out to be correct and now I have a lot more money than I did before (although far less because it takes 4 days to be approved for option trading).
My confusion was about half resolved by the following thought: since actors in the market have finite capital, it can be consistent that the price of an asset is X, and everyone knows it will be Y > X in the far-ish future, but no one buys now to push the price up to Y, because there are other things you can do with your capital that will make you more money in the meantime, i.e. exploiting market volatility, short-term puts/calls, etc. In terms of COVID, everyone can know that a recession is coming but still think they can make more money by exploiting short-term volatility, e.g. panic buying/selling.
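A made-up numerical version of the finite-capital point: even if everyone agrees the price will converge upward, the implied return from waiting can be worse than the alternatives, so no capital flows in to correct it.

```python
# Illustrative numbers only: an asset everyone knows is underpriced.
x, y, years = 100.0, 110.0, 2.0            # price now, known future value, horizon
buy_and_hold = (y / x) ** (1 / years) - 1  # ~4.9%/yr from waiting for convergence
volatility_trading = 0.15                  # assumed return of short-term strategies
print(buy_and_hold, volatility_trading)    # capital chases the 15%, so the price stays at x
```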
What I’m still a little confused about is the fact that recent market movement seems incredibly obvious to me (so obvious that I put literally all my assets into shorting positions), but hedge funds are going out of business and Bridgewater’s Pure Alpha lost money. Dalio even thought about COVID significantly before the crash.
I only thought about COVID for like 5 days before becoming extremely confident that the market was going to drop. I wasn’t an expert on pandemics. I don’t think I’m a nimble generalist. I didn’t even make a guesstimate model. I just looked at COVID, read LW comments, and read the news. Hedge funds can read both the news and LW, so they should know all the things that I know. They also have more time to think and more incentive to think correctly, so they should be efficient relative to me. But they were not. I notice I am confused.
What do people at hedge funds even do all day? You would think at least a few people would be working full-time thinking about COVID, right? And they can’t all have gotten it wrong? Is it really that important to be nimble?
And it didn’t just drop once more to correct, it dropped like 3 more times? Something must have gone horribly wrong, but all of my explanations seem way too forced.
I misspoke. It wasn’t actually getting approved that cost time; it was the money I transferred that took time to be approved for options trading. I was led to believe that 4 days was standard and relatively unavoidable.
Not really anything that I can think of. I will rephrase to make myself more clear.
John Flanagan: “An ordinary archer practices until he gets it right. A ranger practices until he never gets it wrong.”
I want to reword this to make it about rationality in a way that isn’t pretentious.
Cavilo, The Vor Game: “The key to strategy… is not to choose a path to victory, but to choose so that all paths lead to a victory.” is close to what I want, but not quite.
I’m confused about why the inner alignment problem is conceptually different from the outer alignment problem. From a general perspective, we can think of the task of building any AI system as humans trying to optimize their values by searching over some solution space. In this scenario, the programmer becomes the base optimizer and the AI system becomes the mesa-optimizer. The outer alignment problem thus seems like a particular manifestation of the inner alignment problem where the base optimizer is a human.
In particular, if there exists a robust solution to the outer alignment problem, then presumably there’s some property P that we want the AI system to have, and some process P_meta that convinces us that the AI system has property P. I don’t see why we can’t just give the AI system the ability to enact P_meta to ensure that any optimizers it creates have property P (modulo the problem of ensuring that the system has P_meta via P_metameta, ensuring that via P_metametameta, etc.). I guess you can have a solution to the outer alignment problem by having P and P_meta without the recursive tower needed to solve the inner alignment problem, but that doesn’t seem like the issue being brought up. (something something Löbian Obstacle)
In particular, my view says that if M is the machine learning system and P are the programmers, we can view P as the “machine learning system” and M as a mesa-optimizer. The task of aligning the mesa-objective with the specified loss function seems like the same type of problem as aligning the loss function of M with the programmers’ values.
Maybe the important thing is that loss functions are functions and values are not, so the point is that even if we have a function that represents our values, things can still go wrong. That is, people previously thought that finding a function that does what we want when optimized was the main problem, but mesa-optimizer pseudo-alignment shows that even if we have such a function, we can’t just optimize it.
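Here’s a toy version of that point (entirely my own construction, not from the paper): a base optimizer selects among policies by training performance, and the winner is a policy whose internal objective merely correlated with the base objective on the training distribution.

```python
import random

random.seed(0)

def train_world():
    """Training distribution: the exit door always happens to be green."""
    doors = ["red"] * 4 + ["green"]
    random.shuffle(doors)
    return doors, doors.index("green")

def deploy_world():
    """Deployment: the correlation breaks; the exit is now the red door."""
    doors = ["green"] * 4 + ["red"]
    random.shuffle(doors)
    return doors, doors.index("red")

# The policy space searched over. Policies only see the doors, so "the
# exit" is not directly representable -- only proxies for it are.
policies = {
    "seek_green": lambda doors: doors.index("green"),
    "seek_red":   lambda doors: doors.index("red"),
    "seek_first": lambda doors: 0,
}

def success_rate(policy, world_fn, n=1000):
    """Base objective: how often the policy picks the exit door."""
    return sum(policy(d) == e for d, e in (world_fn() for _ in range(n))) / n

# Base optimization: keep whichever policy scores best in training.
best = max(policies, key=lambda name: success_rate(policies[name], train_world))
print(best)                                        # "seek_green"
print(success_rate(policies[best], train_world))   # 1.0 in training...
print(success_rate(policies[best], deploy_world))  # ...0.0 at deployment
```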
An implication is that all the reasons why mesa-optimizers can cause problems are also reasons why strategies for turning human values into a function can go wrong. For example, value learning strategies seem vulnerable to the same pseudo-alignment problems. Admittedly, I do not have a good understanding of current approaches to value learning, so I am not sure whether this is a real concern. (Assuming that the authors of this post are adequate, if a similar concern existed in value learning, I think they would have mentioned it. This suggests that either I am wrong about this being a problem or that no one has given it serious thought. My priors are on the former, but I want to know why I’m wrong.)
I suspect that I’ve failed to understand something fundamental because it seems like a lot of people that know a lot of stuff think this is really important. In general, I think this paper has been well written and extremely accessible to someone like me who has only recently started reading about AI safety.