Wei Dai
I’ve arguably lived under totalitarianism (depending on how you define it), and my parents definitely have and told me many stories about it. I think AGI increases risk of totalitarianism, and support a pause in part to have more time to figure out how to make the AI transition go well in that regard.
Even if someone made a discovery decades earlier than it otherwise would have been made, the long-term consequences of that may be small or unpredictable. If your goal is to “achieve high counterfactual impact in your own research” (presumably predictably positive impact), you could potentially do that in certain fields (e.g., AI safety) even if you only counterfactually advance the science by a few months or years. I’m a bit confused why you’re asking people to think in the direction outlined in the OP.
Some of my considerations for college choice for my kid, that I suspect others may also want to think more about or discuss:
status/signaling benefits for the parents (This is probably a major consideration for many parents to push their kids into elite schools. How much do you endorse it?)
sex ratio at the school and its effect on the local “dating culture”
political/ideological indoctrination by professors/peers
workload (having more/less time/energy to pursue one’s own interests)
I added this to my comment just before I saw your reply: Maybe it changes moment by moment as we consider different decisions, or something like that? But what about when we’re just contemplating a philosophical problem and not trying to make any specific decisions?
I mostly offer this in the spirit of “here’s the only way I can see to reconcile subjective anticipation with UDT at all”, not “here’s something which makes any sense mechanistically or which I can justify on intuitive grounds”.
Ah I see. I think this is incomplete even for that purpose, because “subjective anticipation” to me also includes “I currently see X, what should I expect to see in the future?” and not just “What should I expect to see, unconditionally?” (See the link earlier about UDASSA not dealing with subjective anticipation.)
ETA: Currently I’m basically thinking: use UDT for making decisions, use UDASSA for unconditional subjective anticipation, am confused about conditional subjective anticipation as well as how UDT and UDASSA are disconnected from each other (i.e., the subjective anticipation from UDASSA not feeding into decision making). Would love to improve upon this, but your idea currently feels worse than this...
As you would expect, I strongly favor (1) over (2) over (3), with (3) being far, far worse for ‘eating your whole childhood’ reasons.
Is this actually true? China has (1) (affirmative action via “Express and objective (i.e., points and quotas)”) for its minorities and different regions and FWICT the college admissions “eating your whole childhood” problem over there is way worse. Of course that could be despite (1) not because of it, but does make me question whether (3) (“Implied and subjective (‘we look at the whole person’).”) is actually far worse than (1) for this.
Intuitively this feels super weird and unjustified, but it does make the “prediction” that we’d find ourselves in a place with high marginal utility of money, as we currently do.
This is particularly weird because your indexical probability then depends on what kind of bet you’re offered. In other words, our marginal utility of money differs from our marginal utility of other things, and which one do you use to set your indexical probability? So this seems like a non-starter to me… (ETA: Maybe it changes moment by moment as we consider different decisions, or something like that? But what about when we’re just contemplating a philosophical problem and not trying to make any specific decisions?)
By “acausal games” do you mean a generalization of acausal trade?
Yes, didn’t want to just say “acausal trade” in case threats/war are also a big thing.
This was all kinda rambly but I think I can summarize it as “Isn’t it weird that ADT tells us that we should act as if we’ll end up in unusually important places, and also we do seem to be in an incredibly unusually important place in the universe? I don’t have a story for why these things are related but it does seem like a suspicious coincidence.”
I’m not sure this is a valid interpretation of ADT. Can you say more about why you interpret ADT this way, maybe with an example? My own interpretation of how UDT deals with anthropics (and I’m assuming ADT is similar) is “Don’t think about indexical probabilities or subjective anticipation. Just think about measures of things you (considered as an algorithm with certain inputs) have influence over.”
This seems to “work” but anthropics still feels mysterious, i.e., we want an explanation of “why are we who we are / where we’re at” and it’s unsatisfying to “just don’t think about it”. UDASSA does give an explanation of that (but is also unsatisfying because it doesn’t deal with anticipations, and also is disconnected from decision theory).
I would say that under UDASSA, it’s perhaps not super surprising to be when/where we are, because this seems likely to be a highly simulated time/scenario for a number of reasons (curiosity about ancestors, acausal games, getting philosophical ideas from other civilizations).
It occurs to me that many alternatives you mention are also superstimuli:
Reading a book
Pretty unlikely or rare to encounter stories or ideas with this much information content or entertainment value in the ancestral environment.
Some people do get addicted to books, e.g., romance novels.
Extroversion / talking to attractive people
We have access to more people, including more attractive people, but talking to anyone is less likely to lead to anything consequential because of birth control and because they also have way more choices.
Sex addiction. People who party all the time.
Creativity
We have the time and opportunity to do a lot more things that feel “creative” or “meaningful” to us, but these activities have less real-world significance than such feelings might suggest because other people have way more creative products/personalities to choose from.
Struggling artists/entertainers who refuse to give up their passions. Obscure hobbies.
Not sure if there are exceptions or not, but it seems like everything we could do for fun these days is some kind of supernormal stimulus, or the “fun” isn’t much related to the original evolutionary purpose anymore. This includes e.g. forum participation. So far I haven’t tried to make great efforts to quit anything, and instead have just eventually gotten bored of certain things I used to be “addicted” to (e.g., CRPGs, micro-optimizing crypto code). (This is not meant to be advice for other people. Also the overall issue of superstimuli/addiction is perhaps more worrying to me than this comment might suggest.)
Does anyone know why security amplification and meta-execution are rarely talked about these days? I did a search on LW and found just 1 passing reference to either phrase in the last 3 years. Is the problem no longer considered important? Is it too hard, with no one having new ideas? Are there too many alignment problems/approaches to work on and not enough researchers?
If you think there’s something mysterious or unknown about what happens when you make two copies of yourself
Eliezer talked about some puzzles related to copying and anticipation in The Anthropic Trilemma that still seem quite mysterious to me. See also my comment on that post.
I think the way morality seems to work in humans is that we have a set of potential moral values, determined by our genes, that culture can then emphasize or de-emphasize. Altruism seems to be one of these potential values, that perhaps got more emphasized in recent times, in certain cultures. I think altruism isn’t directly evolutionarily connected to power, and it’s more like “act morally (according to local culture) while that’s helpful for gaining power” which translates to “act altruistically while that’s helpful for gaining power” in cultures that emphasize altruism. Does this make more sense?
What are some failure modes of such an agency for Paul and others to look out for? (I shared one anecdote with him, about how a NIST standard for “crypto modules” made my open source cryptography library less secure, by having a requirement that had the side effect that the library could only be certified as standard-compliant if it was distributed in executable form, forcing people to trust me not to have inserted a backdoor into the executable binary, and then not budging when we tried to get an exception for this requirement.)
The only way to win is not to play.
Seems like a lot of people are doing exactly this, but interpreting it as “not having kids” instead of “having kids but not trying to compete with others in terms of educational investment/signaling”. As a parent myself I think this is pretty understandable in terms of risk-aversion, i.e., being worried that one’s unconventional parenting strategy might not work out well in terms of conventional success, and getting a lot of guilt / blame / status loss because of it.
Given it is a dystopian status competition hell, pay for it seems terrible, but if we have 98% participation now and 94% financial hardship, then this could be a way to justify a huge de facto transfer to parents.
I don’t understand how this justifies paying. Wouldn’t a big transfer to parents just cause more educational investment/signaling and leave the overall picture largely unchanged?
Trying to draw some general lessons from this:
We are bad at governance, even on issues/problems that emerge/change slowly relative to human thinking (unlike, e.g., COVID-19). I think people who are optimistic about x-risk governance should be a bit more pessimistic based on this.
Nobody had the foresight to think ahead of time about status dynamics in relation to fertility and parental investment. Academic theories about this are lagging empirical phenomena by a lot. What important dynamics will we miss with AI? (Nobody seems to be thinking about status and AI, which is one obvious candidate.)
It seems that humans, starting from a philosophically confused state, are liable to find multiple incompatible philosophies highly plausible in a path-dependent way, see for example analytic vs continental philosophy vs non-Western philosophies. I think this means if we train an AI to optimize directly for plausibility, there’s little assurance that we actually end up with philosophical truth.
A better plan is to train the AI in some way that does not optimize directly for plausibility, have some independent reason to think that the AI will be philosophically competent, and then use plausibility only as a test to detect errors in this process. I’ve written in the past that ideally we would first solve metaphilosophy so that we can design the AI and the training process with a good understanding of the nature of philosophy and philosophical reasoning in mind, but failing that, I think some of the ideas in your list are still better than directly optimizing for plausibility.
You can do something like train it with RL in an environment where doing good philosophy is instrumentally useful and then hope it becomes competent via this mechanism.
This is an interesting idea. If it was otherwise feasible / safe / a good idea, we could perhaps train AI in a variety of RL environments, see which ones produce AIs that end up doing something like philosophy, and then see if we can detect any patterns or otherwise use the results to think about next steps.
I’m guessing you’re not being serious, but just in case you are, or in case someone misinterprets you now or in the future, I think we probably do not want to train AIs to give us answers optimized to sound plausible to humans, since that would make it even harder to determine whether or not the AI is actually competent at philosophy. (Not totally sure, as I’m confused about the nature of philosophy and philosophical reasoning, but I think we definitely don’t want to do that in our current epistemic state, i.e., unless we had some really good arguments that say it’s actually a good idea.)
Many comments pointed out that NYT does not in fact have a consistent policy of always revealing people’s true names. There’s even a news editorial about this which I point out in case you trust the fact-checking of NY Post more.
I think that leaves 3 possible explanations of what happened:
NYT has a general policy of revealing people’s true names, which it doesn’t consistently apply but ended up applying in this case for no particular reason.
There’s an inconsistently applied policy, and Cade Metz’s (and/or his editors’) dislike of Scott contributed (consciously or subconsciously) to insistence on applying the policy in this particular case.
There is no policy and it was a purely personal decision.
In my view, most rationalists seem to be operating under a reasonable probability distribution over these hypotheses, informed by evidence such as Metz’s mention of Charles Murray, lack of a public written policy about revealing real names, and lack of evidence that a private written policy exists.
While reading this, I got a flash-forward of what my life (our lives) may be like in a few years, i.e., desperately trying to understand and evaluate complex philosophical constructs presented to us by superintelligent AI, which may or may not be actually competent at philosophy.
I gave this explanation at the start of the UDT1.1 post:
When describing UDT1 solutions to various sample problems, I’ve often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then return what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole, but instead specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.
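To make the "optimize the global strategy" reading concrete, here is a minimal Python sketch: enumerate every function S from inputs to outputs, pick the S* that maximizes utility under a world program P, and return what S* returns for the input actually received. The toy world (two copies of the agent receive inputs 1 and 2 and are rewarded only if their outputs differ) and all names (`world_program`, `all_strategies`, `best_strategy`, `act`) are illustrative assumptions of mine, not taken from the post.

```python
from itertools import product

# Hypothetical toy setup: possible inputs and outputs for the agent.
INPUTS = (1, 2)
OUTPUTS = ("A", "B")

def world_program(strategy):
    """Assumed toy world P: two copies of the agent get inputs 1 and 2.
    Utility is 10 if they return different outputs, otherwise 0."""
    return 10 if strategy[1] != strategy[2] else 0

def all_strategies(inputs, outputs):
    """Yield every mapping from inputs to outputs (each candidate S)."""
    for choice in product(outputs, repeat=len(inputs)):
        yield dict(zip(inputs, choice))

def best_strategy(world, inputs, outputs):
    """Global optimization: pick the S* with the highest utility.
    max() returns the first maximal candidate, so ties are broken
    by enumeration order, identically for every copy."""
    return max(all_strategies(inputs, outputs), key=world)

def act(observed_input):
    """Return what S* would return for the input this copy sees."""
    s_star = best_strategy(world_program, INPUTS, OUTPUTS)
    return s_star[observed_input]
```

Every copy computes the same S* (ties are broken by enumeration order), so in this toy world the copy seeing input 1 and the copy seeing input 2 return different outputs.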
I note that China is still doing market economics, and nobody is trying (or even advocating, AFAIK) some very ambitious centrally planned economy using modern computers, so this seems like pure speculation? Has someone actually made a detailed argument about this, or at least has the agreement of some people with reasonable economics intuitions?