Align it
sudo
[Question] Do mesa-optimization problems correlate with low-slack?
My least favorite thing
I’ll think more carefully about this later, but my instinctual response is that I don’t think this is nicely framed as a coordination problem. Basically, I don’t think you really lose that much when you “de-wheel,” in terms of actual effectiveness at EA-like goals. It’s worth noting that a major part of this article is that “wheeling” is incredibly costly (especially in opportunity cost).
What exactly do you think changes when everyone decides to de-wheel rather than just a few people? My read is that the biggest change is simply that de-wheeling becomes less stigmatized. There are cheaper ways to de-stigmatize de-wheeling than coordinating everyone to de-wheel at the same time.
Edit: first sentence, for clarity
Here’s a potential divergence I see. Do you believe that skipping a conventionally credentialed path is more viable for “EA-like” goals than for “mainstream” goals? If so, why is that, and are there some in-between goals you could identify to make a spectrum out of?
This is slightly complicated.
If your goal is something like “become wealthy while having free time,” the Prep school → Fancy college → FIRE in finance or software path is actually pretty darn close to perfect.
If your goal is something like “make my parents proud” or “gain social validation,” you probably go down the same route too.
If your goal is something like “seek happiness” or “live a life where I am truly free,” I think the credentialing track is something you probably need to get off ASAP. It confuses your reward mechanisms. There are tons of warnings in pop culture about this.
If you have EA-like goals, you have a “maximize the objective function”-type goal. It’s the same shape as “become as rich as possible” or “make the world as horrible as possible.” Basically, the conventional path is highly, highly unlikely to get you all the way there. In this case, you probably want to get into the following loop:
Get skills+resources
Use skills+resources to make an impact
Repeat
For a lot of important work, the resources required are minimal and you already have them. You only need skills. If you have skills, people will also give you resources.
It shouldn’t matter how much money you have.
Also, even if you were totally selfish, stopping the apocalypse is better for you than earning extra money right now. If you believe the sort of AI arguments made on this forum, then it is probably directly irrational for you to optimize for things other than saving the world.
So, do you think focusing on credentials is instrumental to saving the world? Perhaps it’s even a required part of this loop? (Perhaps you need credentials to get opportunities to get skills?)
I basically don’t think that is true. Even accepting that colleges teach more effectively than people can learn autodidactically, the amount of time wasted on bullshit and the amount of health wasted on stress probably make this not true. It seems like you’d have to get very lucky for the credential mill to not be a significant skill cost.
--
I guess it’s worthwhile for me to reveal some weird personal biases too.
I’m personally a STEM student at a fancy college with a fancy (non-alignment) internship lined up. I actually love and am very excited about the internship (college somewhat less so; I might just be the wrong shape for college), because I think it’ll give me a lot of extra skills.
My satisfaction with this doesn’t negate the fact that I mostly got those things by operating under a slightly different (more wheeled) mental model. A side effect of my former self being a little more wheeled is that I’d have to mess up even more badly to get into a seriously precarious situation. It’s probably easier for me to de-wheel at this point, already having some signalling tools, than it is for the average person to de-wheel.
I’m not quite sure what cycles you were referring to (do you have examples?), but this might be me having a bad case of “this doesn’t apply to me, so I will give it zero thought,” and thus inadvertently burning a massive hole in my map.
Despite this, though, I probably mostly wish I had de-wheeled earlier (middle school, when I first started thinking somewhat along these lines) rather than later. I’d be better at programming, better at math, and probably more likely to positively impact the world, at the expense of being less verbally eloquent and having less future money. I can’t honestly say that I would take the trade, but certainly a very large part of me wants to.
Certainly, I’d at least argue that Bob should de-wheel. The downside is quite limited.
--
There definitely is a middle path, though. Most of the AI alignment centers pay salaries comparable to top tech companies. You can start AI alignment companies and get funding, etc. There’s an entire gradient there. I also don’t entirely see how that is relevant.
ranked-biserial’s point was largely about Charlie, who wasn’t really a focus of the essay. What they said about Charlie might very much be correct. But it’s not a given that Alice and Bob secretly want this. They may very well have done something else if not given the very conservative advice.
I’ll reply to ranked-biserial later.
Edit: typo
My Solution
--
Optimize for skill-building over resource-collection. You don’t need that many resources.
Ask:
What is the skill I’m most interested in building right now?
What’s the best way to get this skill?
A few things:
Open source libraries are free tutoring
Most alignment-relevant skills can be effectively self-taught
Projects are learning opportunities that demand mastery
Tutoring is cheaper than college tuition and possibly more effective
If you think that we have a fair shot at stopping an AI apocalypse, and that AGI is a near-term risk, then it is absolutely rational to optimize for solving AI safety. This is true even if you are entirely selfish.
Also, this essay is about advice given to ambitious people. It’s not about individual people choosing unambitious paths (wheeling). Charlie is a sad example of what can happen to you. I’m not complaining about him.
It’s advice that you generally see from LessWrongers and rationality-adjacent people who are not actively working on technical alignment.
I don’t know if that’s true, but it might be. That does not change the fact that there is a lot of “stay realistic”-type advice that you get from people in these circles. I’d wager this type of advice does not generally come from a more lucid view of reality, but rather from (irrationally high) risk aversion.
If I had to summarize this in one sentence: we need to be much more risk-tolerant and signalling-averse if we want a chance at solving the most important problems.
There is no universal advice that I can give.
The problem is that people are assuming that wheeling is correct without checking that it is.
I’m not proposing developing an allergic reaction to colleges or something.
Cool!
Action, but also advice to do the action. I think you’re somewhat right.
I need to think about this.
I think the fundamental misunderstanding here is that you are attributing a much smaller success probability to my ideas than I am.
It is very likely that becoming highly skilled at AI outside of college will make you both useful (to saving the world) and non-homeless. You will probably make more progress toward your goal than if you stayed “tracked” (for almost any goal, assuming short AI timelines).
I don’t think Bob is likely to waste all day on video games, and I wonder why you think he would. I mean, conditional on him being addicted, perhaps. But surely if that were his problem, then he should focus on solving it.
Do you somehow attribute lower capabilities to people than I do? I certainly think that Bob can figure out a way to learn more effectively than what his college provides. He can prove this to himself if he has doubts.
None of this is the point. Many people have much more ability to take risk than we assume. Giving people overly risk-averse advice, and treating the bad-case scenario as highly likely, as you are doing right now, seems very hurtful.
I fundamentally think that the EA idea that donating is just as effective as doing the work yourself grossly overestimates how liquid and fungible labor is.
I’m getting some new comments along the lines of, “it would be risky and irrational for Bob and Alice to not at least somewhat stay on the wheel.”
This is better addressed in a new article, which I’ll write soon.
There’s something really tyrannical about externally imposed KPIs.
I can’t stop thinking about my GPA even if I make a conscious choice to stop optimizing for it.
Choosing to not optimize for it actually made it worse. A lower number is louder in my mind.
There’s something about a number being used for sorting that completely short-circuits my brain and makes me agonize over it.
Enlightened:
Terminal goal → Instrumental goal → Planning → Execution
Buffoonery:
Terminal goal → Instrumental goal → Planning → wait what did [insert famous person] do? Guess I need to get a PhD.
How much should we worry about mesa-optimization challenges?
I’m not sure how you arrived at the conclusion that “the vast majority of possible utility functions have maxima where the universe is full of undifferentiated clouds of hydrogen and radiation.”
But more fundamentally, yes, I think that we should start by expecting L’ to be sampled from the pool of possible objective functions. If this leads to absurd conclusions for you, that might be because you have assumptions about L’ which you haven’t made explicit.
Could you give examples of what absurd conclusions this leads to?
Even “L’ will aim for a universe full of undifferentiated clouds of hydrogen and radiation” is not an absurd conclusion to me. Do we just disagree on what counts as absurd?
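To make the base-rate intuition I’m leaning on concrete, here’s a minimal toy sketch (mine, not from the original exchange; the state count and the size of the “human-valued” set are made-up illustrative numbers): if L’ is drawn roughly uniformly from the space of possible objective functions, its maximum almost never lands on an outcome we’d care about.

```python
import random

# Toy model (illustrative assumptions, not from the essay): treat each
# "universe outcome" as one of N discrete states, and a mesa-objective L'
# as a random assignment of utility to every state. Only a tiny fraction
# of states are ones humans would value; the rest are arbitrary
# configurations of matter (hydrogen clouds, etc.).
N_STATES = 10_000
HUMAN_VALUED_STATES = set(range(10))   # assume ~0.1% of states are human-valued
N_SAMPLES = 1_000

hits = 0
for _ in range(N_SAMPLES):
    # Sample L' "uniformly": an independent random utility for every state.
    utilities = [random.random() for _ in range(N_STATES)]
    best_state = max(range(N_STATES), key=lambda s: utilities[s])
    if best_state in HUMAN_VALUED_STATES:
        hits += 1

# By symmetry, the optimum lands in the human-valued set with probability
# |HUMAN_VALUED_STATES| / N_STATES, i.e. almost never.
print(f"Optimum was a human-valued state in {hits}/{N_SAMPLES} samples")
```

This doesn’t prove anything about what L’ actually looks like, of course; it just illustrates why “sampled from the pool of possible objective functions” is my default prior rather than an absurd one.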
Signalling Considered Harmful.
I want to write an essay about how we so dramatically overvalue signalling that it might be good to completely taboo it for oneself.