https://ninapanickssery.com/
Views purely my own unless clearly stated otherwise
My bad, I read you as disagreeing with Neel’s point that it’s good to gain experience in the field or otherwise become very competent at the type of thing your org is tackling before founding an AI safety org.
That is, I read “I think that founding, like research, is best learned by doing” as “go straight into founding and learn as you go along”.
I naively expect the process of startup ideation and experimentation, aided by VC money
It’s very difficult to come up with AI safety startup ideas that are VC-fundable. This seems like a recipe for coming up with nice-sounding but ultimately useless ideas, or for wasting a lot of effort on stuff that looks good to VCs but doesn’t advance AI safety in any way.
I disagree with this frame. Founders should deeply understand the area they are founding an organization to deal with. It’s not enough to be “good at founding”.
This makes sense as a strategic choice, and thank you for explaining it clearly, but I think it’s bad for discussion norms because readers won’t automatically understand your intent as you’ve explained it here. Would it work to substitute the term “alignment target” or “developer’s goal”?
When I say “human values” without reference I mean “the types of things that a human-like mind can want, and their extrapolations”
This is a reasonable concept, but it should have a different handle from “human values”, because otherwise common phrases like “we should optimize for human values” become nonsensical. (For example, human-like minds can want chocolate cake, but that tells us nothing about the relative importance of chocolate cake versus avoiding disease, which is what matters for decision making.)
What “human values” gesture at is distinction from values-in-general, while “preferences” might be about arbitrary values.
I don’t understand what this means.
Taking current wishes/wants/beliefs as the meaning of “preferences” or “values” (denying further development of values/preferences as part of the concept) is similarly misleading as taking “moral goodness” as meaning anything in particular that’s currently legible, because the things that are currently legible are not where potential development of values/preferences would end up in the limit.
Is your point here that “values” and “preferences” are based on what you would decide to prefer after some amount of thinking/reflection? If yes, my point is that this should be stated explicitly in discussions, for example: “here I am discussing the preferences you, the reader, would have after thinking for many hours.”
If you want to additionally claim that these preferences are tied to moral obligation, this should also be stated explicitly.
Yeah that’s fair. I didn’t follow the “In other words” sentence (it doesn’t seem to be restating the rest of the comment in other words, but rather making a whole new (flawed) point).
Has this train of thought caused you to update away from “Human Values” as a useful construct?
I was curious so I read this comment thread, and I’m genuinely confused why Tsvi is so annoyed by the interaction (maybe I am being dumb and missing something). My interpretation of the exchange is the following:
Tsvi is saying something like:
People have a tendency to defer too much (though deferring sometimes is necessary). They should consider deferring less and thinking for themselves more.
When one does defer, it’s good to be explicit about that fact, both to oneself and others.
To illustrate his point, Tsvi mentions a case where he deferred to Yudkowsky. It’s a telling example because Yudkowsky is considered a particularly good thinker on the topic Tsvi (and many others) deferred on, and yet there was still too much deference.
Wei Dai points out that he thinks the example is misleading, because to him it looks more like being wrong about who it’s worth deferring to, rather than deferring too much. The more general version of his point is “You, Tsvi, are noticing problems that occur from people deferring. However, I think these problems may be at least partially due to them deferring to the wrong people, rather than deferring at all.”
(If this is indeed the point Wei Dai is making, I happen to think Tsvi is more correct, but I don’t think WD’s contribution is meaningless or in bad faith.)
That’s a decision whose emotional motivation is usually mainly oxytocin IIUC.
I strongly doubt this, especially in men. I suspect it plays a role in promoting attachment to already-born kids but not in deciding to have them.
Oxytocin is one huge value-component which drives people to sink a large fraction of their attention and resources into local things which don’t pay off in anything much greater. It’s an easier alternative outlet to ambition. People can feel basically-satisfied with their mediocre performance in life so long as they feel that loving connection with the people around them, so they’re not very driven to move beyond mediocrity.
I know you are posting on LW, which is a skewed audience, but most people are mediocre at most things and unlikely to achieve great feats by your standards, even with more ambition. Having a happy family is quite a reasonable ambition for most people. In fact, it is one of the few things an everyday guy can do that “pays off in anything much greater” (i.e. the potential for a long generational line and family legacy).
(Also consider that stereotypically, women are the ones who spend the most effort on domestic and child-related matters, and are also less likely to be on the far right of bell curves.)
At the risk of committing a Bulverism, I’ve noticed a tendency for people to see ethical bullet-biting as epistemically virtuous, like a demonstration of how rational/unswayed by emotion you are (biasing them to overconfidently bullet-bite). However, this makes less sense in ethics, where intuitions like repugnance are a large part of what everything is based on in the first place.
Maybe I will make a (somewhat lazy) LessWrong post with my favorite quotes.
Edit: I did it: https://www.lesswrong.com/posts/jAH4dYhbw3CkpoHz5/favorite-quotes-from-high-output-management
Nice principle.
Reminds me of the following quote from the classic management book High Output Management:
Given a choice, should you delegate activities that are familiar to you or those that aren’t? Before answering, consider the following principle: delegation without follow-through is abdication. You can never wash your hands of a task. Even after you delegate it, you are still responsible for its accomplishment, and monitoring the delegated task is the only practical way for you to ensure a result. Monitoring is not meddling, but means checking to make sure an activity is proceeding in line with expectations. Because it is easier to monitor something with which you are familiar, if you have a choice you should delegate those activities you know best. But recall the pencil experiment and understand before the fact that this will very likely go against your emotional grain.
A common use of “Human Values” is in sentences like “we should align AI with Human Values” or “it would be good to maximize Human Values upon reflection”, i.e. normative claims about how Human Values are good and should be achieved. However, if you’re not a moral realist, there’s no (or very little) reason to believe that humans, even if they reflect for a long time etc., will arrive at the same values. Most of the time, if someone says “Human Values” they don’t mean to include the values of Hitler or a serial killer. This makes the term ambiguous: it can be used both descriptively and normatively, and the normative use is common enough to muddy purely descriptive uses.
I agree that if you’re a moral realist, it’s useful to have a term for “preferences shared amongst most humans” as distinct from Goodness, but Human Values is a bad choice because:
It implies preferences are more consistent amongst humans than they really are
The use of “Human Values” has been too polluted by others using it in a normative sense
I really appreciate your clear-headedness in recognizing these phenomena even in people “on the same team”, i.e. people very concerned about and interested in preventing AI X-Risk.
However, I suspect that you also underrate the amount of self-deception going on here. It’s much easier to convince others if you convince yourself first. I think people in the AI Safety community self-deceive in various ways, for example by choosing not to fully think through how their beliefs are justified (e.g. not acknowledging the extent to which they are based on deference; Tsvi writes about this rather well in his recent post).
There are of course people who explicitly, consciously plan to deceive, thinking things like “it’s very important to convince people that AI Safety/policy X is important, and so we should use the most effective messaging techniques possible, even if they rely on false or misleading claims.” However, I think there’s a larger set of people who, as they realize claims A, B, and C are useful for consequentialist reasons, internally start questioning A, B, and C less, and become biased toward believing those claims themselves.
The <1% comes from a combination of:
Thinking “superintelligence”, as described by Yudkowsky et al., will not be built in the next 20 years. “AGI” means too many different things; in some sense we already have AGI, and I predict continued progress in AI development.
Thinking the kind of stronger AI we’ll see in the next 20 years is highly unlikely to kill everyone. I’m less certain about true superintelligence, but even in that case, I’m far less pessimistic than most LessWrongers.
Very rough numbers would be p(superintelligence within 20 years) = 1%, p(superintelligence kills everyone within 100 years of being built) = 5%, though it’s very hard to put numbers on such things while lacking info, so take this as gesturing at a general ballpark.
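For concreteness, here’s a minimal sketch of how those two rough numbers multiply into the headline figure (this assumes the 5% is conditional on superintelligence actually being built, and is just illustrating the arithmetic, not a full model):

```python
# Minimal sketch: combining the two rough estimates above.
# Assumes the 5% is conditional on superintelligence being built.
p_asi_within_20y = 0.01   # p(superintelligence is built within 20 years)
p_doom_given_asi = 0.05   # p(it kills everyone within 100 years of being built)

# Chance that superintelligence is built within 20 years AND then kills everyone:
p_combined = p_asi_within_20y * p_doom_given_asi
print(f"{p_combined:.2%}")  # 0.05%, well under the <1% figure
```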
I haven’t written much about (1). Some of it is intuition from working in the field and using AI a lot. (Edit: see this from Andrej Karpathy, which gestures at some of this intuition.)
Re (2), I’ve written a couple of relevant posts (post 1, post 2 - review of IABIED), though I’m somewhat dissatisfied with their level of completeness. The TL;DR is that I’m very skeptical of appeals to coherence-argument-style reasoning, which is central to most misalignment-related doom stories (relevant discussion with Raemon).
Correct. Though when writing the original comment I didn’t realize Nikola’s p(doom) within 19 years was literally >50%. My main point was that even if your p(doom) is relatively high but <50%, you can expect to be able to raise a family. Even at Nikola’s p(doom) there’s some chance he can raise children to adulthood (15% according to him), which makes it not a completely doomed pursuit if he really wanted them.
I mean, I also think it’s OK to birth people who will die soon. But indeed that wasn’t my main point.
Yeah, I think it’s very unlikely your family would die in the next 20 years (<<1%), so that’s the crux re: whether or not you can raise a family.
I think those other types of startups also benefit from expertise and a deep understanding of the relevant topics (for example, for advocacy: what are you advocating for and why, and how well do you understand the surrounding arguments and thinking...). You don’t want someone who doesn’t understand the “field” working on “field-building”.