I don’t think Gates, Musk, or Pinker should count as much more than laymen when it comes to AI risk, either.
1 and 2 are hard to succeed at without making a lot of progress on 4
It’s not obvious to me why this ought to be the case. Could you elaborate?
Being at or above the 75th-percentile mark corresponds to 2 bits of information. About 32.7 bits of information are required to specify a single person out of a population of 7 billion; even if we truncate that to 32 bits, you’d need to be in the top 25% at 16 different things to be considered “best in the world” in that one particular chunk of skill-space (assuming that the skills you choose aren’t correlated). And then you have to consider the problem density in that chunk—how likely is it, realistically speaking, that there are major problems that (a) require the intersection of 16 different domains, but (b) require only a mediocre grasp of all 16 of those domains?
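For concreteness, here is the arithmetic as a minimal sketch (the population size and the top-25% threshold are just the round numbers from the paragraph above):

```python
import math

population = 7_000_000_000
percentile_bits = math.log2(4)        # top 25% is roughly 1 in 4, i.e. 2 bits per skill
person_bits = math.log2(population)   # ~32.7 bits to single out one person

# Number of independent top-25% skills needed to "uniquely" specify someone:
skills_needed = person_bits / percentile_bits
print(f"bits per skill: {percentile_bits:.1f}")          # 2.0
print(f"bits to specify one person: {person_bits:.1f}")  # ~32.7
print(f"independent skills needed: {skills_needed:.1f}") # ~16.4 (16 after truncating to 32 bits)
```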
If you don’t know what it means, how do you know that it’s significantly different from choosing an “objective function” and why do you feel comfortable in making a judgment about whether or not the concept is useful?
Because words tend to mean things, and when you use the phrase “define a search space”, the typical meaning of those words does not bring to mind the same concept as the phrase “choose an objective function”. (And the concept it does bring to mind is not very useful, as I described in the grandparent comment.)
Now, perhaps your contention is that these two phrases ought to bring to mind the same concept. I’d argue that this is unrealistic, but fine; it serves no purpose to argue whether I think you used the right phrase when you did, in fact, clarify what you meant later on:
in a looser sense a loss function induces “search space” on network weights, insofar as it practically excludes certain regions of the error surface from the region of space any training run is ever likely to explore.
All right, I’m happy to accept this as an example of defining (or “inducing”) a search space, though I would maintain that it’s not a very obvious example (and I think you would agree, considering that you prefixed it with “in a looser sense”). But then it’s not at all obvious what your original objection to the article is! To quote your initial comment:
It only makes sense to talk about “search” in the context of a *search space*; and all extant search algorithms / learning methods involve searching through a comparatively simple space of structures, such as the space of weights on a deep neural network or the space of board-states in Go and Chess. As we move on to attack more complex domains, such as abstract mathematics, or philosophy or procedurally generated music or literature which stands comparison to the best products of human genius, the problem of even /defining/ the search space in which you intend to leverage search-based techniques becomes massively involved.
Taken at face value, this seems to be an argument that the original article overstates the importance of search-based techniques (and potentially other optimization techniques as well), because there are some problems to which search is inapplicable, owing to the lack of a well-defined search space. This is a meaningful objection to make, even though I happen to think it’s untrue (for reasons described in the grandparent comment).
But if by “lack of a well-defined search space” you actually mean “the lack of a good objective function”, then it’s not clear to me where you think the article errs. Not having a good objective function for some domains certainly presents an obstacle, but this is not an issue with search-based optimization techniques; it’s simply a consequence of the fact that you’re dealing with an ill-posed problem. Since the article makes no claims about ill-posed problems, this does not seem like a salient objection.
Defining a search space for a complex domain is equivalent to defining a subspace of BF programs or NNs which could and probably does have a highly convoluted, warped separating surface.
The task of locating points in such subspaces is what optimization algorithms (including search algorithms) are meant to address. The goal isn’t to “define” your search space in such a way that only useful solutions to the problem are included (if you could do that, you wouldn’t have a problem in the first place!); the point is to have a search space general enough to encompass all possible solutions, and then converge on useful solutions using some kind of optimization.
EDIT: There is an analogue in machine learning to the kind of problem you seemed to be gesturing at when you mentioned “more complex domains”—namely, the problem of how to choose a good objective function to optimize. It’s true that for more abstract domains, it’s harder to define a criterion (or set of criteria) that we want our optimizer to satisfy, and this is (to a first approximation) a large part of the AI alignment problem. But there’s a significant difference between choosing an objective function and “defining your search space” (whatever that means), and the latter concept doesn’t have much use as far as I can see.
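To illustrate the distinction with a toy sketch (nothing here comes from the original article; the functions and numbers are made up for illustration): the search space below is fixed once and for all, namely every real vector of a given dimension, and it is only the objective function that determines which points count as "useful". Swapping the objective changes where the optimizer converges without redefining the space at all.

```python
import random

def random_search(objective, dim=2, iters=10_000, scale=5.0):
    """Search a fixed, fully general space (all real vectors of length `dim`)
    for a point minimizing `objective`. The space never changes; only the
    objective decides what counts as a good point."""
    best_x, best_val = None, float("inf")
    for _ in range(iters):
        x = [random.uniform(-scale, scale) for _ in range(dim)]
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Two different objectives over the *same* search space:
objective_a = lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2   # prefers points near (1, -2)
objective_b = lambda x: (x[0] + 3) ** 2 + x[1] ** 2          # prefers points near (-3, 0)

print(random_search(objective_a))   # converges near (1, -2)
print(random_search(objective_b))   # converges near (-3, 0)
```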
all extant search algorithms / learning methods involve searching through a comparatively simple space of structures, such as the space of weights on a deep neural network [...] As we move on to attack more complex domains, such as abstract mathematics, or philosophy or procedurally generated music or literature which stands comparison to the best products of human genius, the problem of even /defining/ the search space in which you intend to leverage search-based techniques becomes massively involved.
Since deep neural networks are known to be Turing-complete, I don’t think it’s appropriate to characterize them as a “comparatively simple” search space (unless of course you hold that “more complex domains” such as abstract mathematics, philosophy, music, literature, etc. are actually uncomputable).
Now, this does all fit into the broader pattern of “leveraging computation”. Fair enough, I guess, but what else would you expect?
It also fits into the pattern of (as you yourself pointed out) minimizing human knowledge during the construction of these programs, allowing them to tease out the features of the problem space on their own. The claim here is that as computing power increases, domain-agnostic approaches (i.e. approaches that do not require programmers to explicitly encode human-created heuristics) will increasingly outperform domain-specific approaches (which do rely on externally encoded human knowledge).
This is a non-trivial claim! For example, it wasn’t at all obvious prior to late 2017 that traditional chess engines (whose static evaluation functions are filled with human-programmed heuristics) could be overtaken by a pure learning-based approach, and yet the AlphaZero paper came out and showed it was possible. If the larger claim is true, then that might suggest directions for further research—in particular, approaches that abstract away large parts of a problem may have more success than approaches that focus on the details of the problem structure.
you can have meta uncertainty about WHICH type of environment you’re in, which changes what strategies you should be using to mitigate the risk associated with the uncertainty.
While I agree that it’s helpful to recognize situations where it’s useful to play more defensively than normal, I don’t think “meta uncertainty” (or “Knightian uncertainty”, as it’s more typically called) is a good concept to use when doing so. This is because there is fundamentally no such thing as Knightian uncertainty; any purported examples of “Knightian uncertainty” can actually be represented just fine in the standard Bayesian expected utility framework in one of two ways: (1) by modifying your prior, or (2) by modifying your assignment of utilities.
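As a toy illustration of what I mean by (1) and (2) (a sketch with made-up payoffs and probabilities, not a formal argument): an "ambiguous" bet on a coin of unknown bias can be handled either by spreading the prior over the possible biases, or by putting the disutility of betting in poorly-understood situations into the payoffs themselves, all within ordinary expected-utility arithmetic.

```python
# "Knightian" ambiguity about a coin's bias, handled inside the standard
# Bayesian expected-utility framework (illustrative numbers only).

# (1) Modify the prior: instead of assuming P(heads) = 0.5 outright, spread
#     belief over several possible biases and average over them.
possible_biases = [0.3, 0.5, 0.7]
prior_over_biases = [1/3, 1/3, 1/3]
p_heads = sum(p * b for p, b in zip(prior_over_biases, possible_biases))  # 0.5

payoff_heads, payoff_tails = 100, -100
expected_utility = p_heads * payoff_heads + (1 - p_heads) * payoff_tails
print(expected_utility)  # 0.0 -- same arithmetic, no special "ambiguity" term

# (2) Modify the utilities: if you dislike betting in situations you don't
#     understand, put that disutility directly into the payoffs.
ambiguity_penalty = 20
expected_utility_averse = (p_heads * payoff_heads
                           + (1 - p_heads) * payoff_tails
                           - ambiguity_penalty)
print(expected_utility_averse)  # -20.0
```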
I don’t think it’s helpful to assign a separate label to something that is, in fact, not a separate thing. Although humans do exhibit ambiguity aversion in a number of scenarios, ambiguity aversion is a bias, and we shouldn’t be attempting to justify biased/irrational behavior by introducing additional concepts that are otherwise unnecessary. Nate Soares wrote a mini-sequence addressing this idea several years ago, and I really wish more people had read it (although if memory serves, it was posted during the decline of LW1.0, which may explain the lack of familiarity).
I seriously recommend that anyone unfamiliar with the sequence give it a read; it’s not long, and it’s exceptionally well-written. I already linked three of the posts above, so here’s the last one.
“I am 75% confident that hypothesis X is true—but if X really is true, I expect to gather more and more evidence in favor of X in the future, such that I expect my probability estimate of X to eventually exceed 99%. Of course, right now I am only 75% confident that X is true in the first place, so there is a 25% (subjective) chance that my probability estimate of X will decrease toward 0 instead of increasing toward 1.”
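Numerically, this is just the statement that your current estimate equals the expectation of your future estimate. A toy check with the numbers from the quote (the specific endpoint values of 99.5% and 1.5% are assumptions chosen so the arithmetic comes out exact):

```python
p_x = 0.75                  # current credence in X
p_future_if_true = 0.995    # assumed eventual credence if X turns out true
p_future_if_false = 0.015   # assumed eventual credence if X turns out false

# Expected value of the future estimate, taken from today's point of view:
expected_future = p_x * p_future_if_true + (1 - p_x) * p_future_if_false
print(expected_future)      # 0.75 -- matches the current estimate
```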
Upvoted for what seems like an honest attempt to represent arguments on both sides of an issue.
They value EA because conditions in their lives caused them to value it, and if those conditions change so be it.
I find this kind of argument to be entirely uncompelling, and stemming from a fairly basic error regarding what kind of thing morality is. (I say “kind of argument”, rather than simply “argument”, because you could replace “EA” in the quoted sentence with just about anything else, and I would find the modified version no more compelling than the original.)
There are several problems with this kind of argument, so let’s go over them in sequence. The first problem is that it’s vacuous. “People only value X because something in their lives caused them to value X” is true for any X you could suggest (provided, of course, that the X in question is valued by at least some people), and thus it fails to distinguish between values that are worth preserving and values that are not. Unless your thesis is literally “no values are better than any other values, which makes it okay for our current values to be replaced by any other set of values” (and if that is your thesis, I think it’s worth saying explicitly), the notion that we should be willing to relinquish any of our current values simply because something at some point caused us to acquire those values is an incredibly poor criterion to use.
That brings us to the second problem: even if your thesis really is that no values are better than any other, there would still remain the question of why the reader ought to accept such a thesis. You can’t justify it via some external argument, because no such external argument exists: the question of “what values should we have?” is itself a fundamentally value-laden question, and value-laden questions can only be addressed by appealing to other values. With some effort on the part of the reader, the article could (vaguely) be interpreted as making such an appeal, but even if such an interpretation is used, much of the philosophical force of the argument is lost. In particular, gone is the sense that the reader is compelled to accept that values cannot have greater meaning because the author has triumphantly explained that “values” exist only as “after-the-fact reifications” of a particular agent’s actions/judgments (and, after all, nobody cares about those).
And well it should be! I am inherently suspicious of any argument that claims people are “wrong” to value something, that does not itself rely upon other values. Often such arguments really consist of subtly hidden, value-laden assertions, which are strengthened by pretending to be something they are not (such as e.g. ironclad philosophical arguments). In the case of this article, the value-laden assertion is this:
If a particular value you hold was arrived at via a causal process that could plausibly have gone the other way (i.e. there’s a counterfactual world in which you ended up with a different value as a result of this causal process), then you shouldn’t consider that value worth preserving against value drift.
Note that this assertion is extremely value-laden! It contains a claim about what you should do, which the original article completely omits in favor of obfuscatory talk regarding the neurological processes behind “valuing”. And since (as I discussed above) any value you hold is the result of a causal process that could plausibly have gone the other way, the assertion simplifies to the following:
You shouldn’t consider any values worth preserving against value drift.
This is, again, a normative statement—and not a particularly compelling one at that. I don’t find the idea of relinquishing all my values—of becoming an agent whose utility function is 0 everywhere—at all attractive, and absent an unimaginably strong argument in favor of such, I can’t imagine such a prospect ever being attractive to me. The goal of metaethical theory is not to produce counterintuitive results (such as the assertion that nobody should value anything ever); the goal of metaethical theory is to produce a framework that explains and justifies the moral intuitions we already have. (This is what I meant when I said that the very first quoted statement stems from an error regarding what type of thing morality is: morality is not something you prove things about. Morality is simply the study of that which we choose to regard as good.)
The rest of the article is populated by sentence structures more typically found in continental philosophy works than on LessWrong, of which the most egregious is probably this one:
What instead drifts or changes are actions, although saying they drift or change is wrought because it supposes some stable viewpoint from which to observe the change, yet actions, via the preferences that cause us to choose any particular action over all others, are continuously dependent on the conditions in which they arise because what we sense (value, judge, assess) is conditional on the entire context in which we do the sensing.
As far as I can tell, the above sentence simply expresses the following sentiment:
It’s hard to say what it means for an action to “change”, since actions do not persist across time.
I don’t know what it is about some writers that makes them think every sentence they produce must meet a 50-word quota at minimum, but in my opinion the clarity of their writing would be greatly improved if they would stop doing that. That the entire article is composed of such constructions did nothing to improve my experience of reading it; in fact, it left me rather annoyed, which I think can probably be deduced from the tone of this comment.
Beyond that, I don’t have much to say, except perhaps that I think the problem of the criterion (which you only tangentially bring up in this article, but which I’ve seen you repeatedly mention elsewhere, to the point where I’m starting to suspect it’s some kind of weird hobbyhorse of yours) is nothing more than vague sophistry of the same kind many mainstream philosophers seem so fond of.
Final thoughts: I think it’s a shame to be criticizing an article that obviously had a lot of effort put into it by the author, especially so harshly, but I’ve expressed my opinion of the article’s contents as frankly as I can, and it’s simply the case that my opinion of said contents is… not good. Ultimately, I think that (a) your central claim here is mistaken, and that (b) if I’m wrong about that, obviously it would be good if you convinced me otherwise, but that your current style of writing is not very conducive to that task.
Did you read the linked article?
My naive guess would be it demotivates mediocre posters more strongly because they’re wrong more often.
A lot of the time, “mediocre posters” tend to be the source of the nitpicking. This is because writing up a nuanced objection takes time and effort, and requires much of the same skills as writing a good top-level post; whereas posting low-effort nitpicks is easy, especially if other people reward you with karma when you do so. (And empirically, I observed a great deal of poorly reasoned comments receiving upvotes on LW 1.0 towards the end of its lifespan, although I will decline to speculate publicly as to the cause of this.)
a probability distribution over… something, I am not sure what in your case, if not an external reality.
I confess to being quite confused by this statement. Probability distributions can be constructed without making any reference to an “external reality”; perhaps the purest example would simply be some kind of prior over different input sequences. At this point, I suspect you and I may be taking the phrase “external reality” to mean very different things—so if you don’t mind, could I ask you to rephrase the quoted statement after Tabooing “external reality” and all synonyms?
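Here is roughly what I mean by that "purest example" (a sketch; the parameters are arbitrary, and nothing in it refers to anything outside the observation strings themselves): a probability distribution over finite binary observation sequences, defined purely combinatorially.

```python
from itertools import product

def prior(sequence, p_continue=0.5, p_one=0.5):
    """A probability distribution over finite binary observation sequences,
    defined without reference to anything 'outside' the sequences: the
    process emits another bit with probability p_continue, and each emitted
    bit is 1 with probability p_one."""
    length = len(sequence)
    ones = sum(sequence)
    p_length = (p_continue ** length) * (1 - p_continue)
    p_content = (p_one ** ones) * ((1 - p_one) ** (length - ones))
    return p_length * p_content

# Sanity check: probabilities of all sequences up to length 3 sum toward 1
# as longer sequences are included.
total = sum(prior(list(seq)) for n in range(4) for seq in product([0, 1], repeat=n))
print(total)  # 0.9375, converging to 1 in the limit
```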
EDIT: I suppose if I’m going to ask you to Taboo “external reality”, I may as well do the same thing for “cosmic coincidence”, just to try and help bridge the gap more quickly. The original statement (for reference):
There is no external reality, and our observations are only structured due to a giant cosmic coincidence.
And here is the Tabooed version (which is, as expected, much longer):
Although there is a model in our hypothesis space with an excellent compression ratio on our past observations, we should not expect this model to continue performing well on future observations. That is, we should not expect there to be a model in our hypothesis space that outperforms the max-entropy distribution (which assigns equal probability to all possible future observation sequences), and although we currently have a model that appears to be significantly outperforming the max-entropy distribution, this is merely an artifact of our finite dataset, which we may safely expect to disappear shortly.
Taken literally, the “coincidence hypothesis” predicts that our observations ought to dissolve into a mess of random chaos, which as far as I can tell is not happening. To me, this suffices to establish the (probable) existence of some kind of fixed reality.
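To make the comparison concrete (a toy sketch; the "structured" data here is just a repeating pattern standing in for our actual observations): a model that has latched onto the structure keeps beating the max-entropy predictor on new data, which is exactly what the "coincidence hypothesis" says should stop happening.

```python
import math

# "Past" and "future" observations drawn from a structured source
# (a simple alternating pattern standing in for our actual observations).
past = [i % 2 for i in range(1000)]
future = [i % 2 for i in range(1000, 2000)]

def log_loss(predict, data):
    """Average bits of surprise per observation under a predictive model."""
    return -sum(math.log2(predict(i, x)) for i, x in enumerate(data)) / len(data)

# Max-entropy predictor: every next bit is 50/50 regardless of history.
max_entropy = lambda i, x: 0.5

# A model that has "compressed" the past: predict the alternating pattern,
# hedged slightly so no observation is ever assigned probability zero.
pattern_model = lambda i, x: 0.99 if x == i % 2 else 0.01

print(log_loss(max_entropy, future))    # 1.0 bit per observation
print(log_loss(pattern_model, future))  # ~0.015 bits per observation
```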
I haven’t said ‘bad person’ unless I’m missing something.
I mean, you haven’t called anyone a bad person, but “It’s Not The Incentives, It’s You” is a pretty damn accusatory thing to say, I’d argue. (Of course, I’m also aware that you weren’t the originator of that phrase—the author of the linked article was—but you at least endorse its use enough to repeat it in your own comments, so I think it’s worth pointing out.)
I might try and write up a reply of my own (to Zvi’s comment), but right now I’m fairly pressed for time and emotional energy, so until/unless that happens, I’m going to go ahead and endorse this response as closest to the one I would have given.
EDIT: I will note that this bit is (on my view) extremely important:
If one were to be above average but imperfect (emphasis mine)
“Above average” is, of course, a comparative term. If e.g. 95% of my colleagues in a particular field regularly submit papers with bad data, then even if I do the same, I am no worse from a moral perspective than the supermajority of the people I work with. (I’m not claiming that this is actually the case in academia, to be clear.) And if it’s true that I’m only doing what everyone else does, then it makes no sense to call me out, especially if your “call-out” is guilt-based; after all, the kinds of people most likely to respond to guilt trips are likely to be exactly the people who are doing better than average, meaning that the primary targets of your moral attack are precisely the ones who deserve it the least.
(An interesting analogy can be made here regarding speeding—most people drive 10-15 miles over the official speed limit on freeways, at least in the US. Every once in a while, somebody gets pulled over for speeding, while all the other drivers—all of whom are driving at similarly high speeds—get by unscathed. I don’t think it’s particularly controversial to claim that (a) the driver who got pulled over is usually more annoyed at being singled out than they are repentant, and (b) this kind of “intervention” has pretty much zero impact on driving behavior as a whole.)
This sentence already presumes external reality, right there in the words “cosmic coincidence,”
I’m not sure what you mean by this. The most straightforward interpretation of your words seems to imply that you think the word “coincidence”—which (in usual usage) refers simply to an improbable occurrence—presumes the existence of an external reality, but I’m not sure why that would be so.
(Unless it’s the word “cosmic” that you object to? If so, that word can be dropped without issue, I think.)
One can argue that coordinated action would be more efficient, and I’d agree. One can argue that in context, it’s not worth the trade-off to do the thing that reinforces good norms and makes things better, versus the thing that’s better for you and makes things generally worse. Sure. Not everything that would improve norms is worth doing.
But don’t pretend it doesn’t matter.
This reads as enormously uncharitable to Raemon, and I don’t actually know where you’re getting it from. As far as I can tell, not a single person in this conversation has made the claim that it “doesn’t matter”—and for good reason: such a claim would be ridiculous. That you seem willing to accuse someone else in the conversation of making such a claim (or “pretending” it, which is just as bad) doesn’t say good things about the level of conversation.
What has been claimed is that “doing the thing that reinforces good norms” is ineffective, i.e. it doesn’t actually reinforce the good norms. The claim is that without a coordinated effort, changes in behavior on an individual level have almost no effect on the behavior of the field as a whole. If this claim is true (and even if it’s false, it’s not obviously false), then there’s no point hoping to see knock-on effects from such a change—and that in turn means all that’s left is the cost-benefit calculation: is the amount of good that I would do by publishing a paper with non-fabricated data (even if I did, how would people know to pay attention to my paper and not all the other papers out there that totally did use fabricated data?), worth the time/effort/willpower it would take me to do so?
As you say: it is indeed a trade-off. Now, you might argue (perhaps rightly so!) that one individual’s personal time/effort/willpower is nowhere near as important as the effects of their decision whether to fabricate data. That they ought to be willing to expend their own blood, sweat, and tears to Do The Right Thing—at least, if they consider themselves a moral person. And in fact, you made just such an argument in your comment:
Similarly, I find it odd that one uses the idea that ‘doing the right thing is not free’ as what seems to be a justification for not doing the right thing. Yes, obviously when the right thing is free for you versus the not-right thing you should do the right thing. And of course being good is costly! Optimizing for anything is costly if you’re not counting the thing itself as a benefit.
But the whole point of some things being right is that you do them even though it’s not free, because It’s Not The Incentives, It’s You. You’re making a choice.
But this ignores the fact that every decision has an opportunity cost: if I spend vast amounts of time and effort designing and conducting a rigorous study, pre-registering my plan, controlling for all possible confounders (and then possibly getting a negative result and needing to go back to the drawing board, all while my colleague Joe Schmoe across the hall fabricates his way into Nature), this will naturally make me more tired than I would be otherwise. Perhaps it will cause me to have less patience than I normally do, become more easily frustrated at events outside of my control, be less willing to tolerate inconveniences in other areas of my life, etc. If, for example, I believed eating meat was morally wrong, I might nonetheless find it more difficult to deliberately deprive myself of meat if I were already spending a great deal of willpower every day on seeing this study through. And if I expect that to be the case, then I have to ask myself which thing I ought to prioritize: not eating meat, or doing the study properly?
This is the (somewhat derisively named) “goodness budget” Benquo mentioned upthread. But another name for it might be Moral Slack. It’s the limited amount of room we have to be less than maximally good in our lives, without being socially punished for it. It’s the privilege we’re granted, to not have to constantly ask ourselves “Should I be doing this? Am I being a bad person for doing this?” It’s—look, you wrote half the posts I just linked to. You know the concept. I don’t know why you’re not applying it here, but it seems pretty obvious to me that it applies just as well here as it does in any other aspect of life.
To be clear: you know that falsifying data is a Very Bad Thing. I know that falsifying data is a Very Bad Thing. Raemon knows that falsifying data is a Very Bad Thing. We all know that falsifying data is bad. But if that’s the way the incentives point (and that’s a very important if!), then it’s also bad to call people out for doing it. If you do that, then you’re using moral indignation as a weapon—a way to not only coerce other people into using up their willpower, but to come out of it looking good yourself.
People who manage to resist the incentives—who ignore the various siren calls they constantly hear—are worthy of extremely high praise. They are exceptionally good people—by definition, in fact, because if they weren’t exceptional, everyone else would be doing it, too. By all means, praise those people as much as you want. But that doesn’t mean that everyone who fails to do what they did is an exceptionally bad person, and lambasting them for it isn’t actually a very good way to get them to change. “It’s Not The Incentives, It’s You” puts the emphasis in the wrong place, and it degrades communication with people who might have been reachable with a more nuanced take.
Please describe a world in which there is no predictability at all, yet where agents “exist”. How they survive without being able to find food, interact, or even breathe, because there breathing means you have a body that can anticipate that breathing keeps it alive.
I can write a computer program which trains some kind of learner (perhaps a neural network; I hear those are all the rage these days). I can then hook that program up to a quantum RNG, feeding it input bits that are random in the purest sense of the term. It seems to me that my learner would then exist in a “world” where no predictability exists, where the next input bit has absolutely nothing to do with previous input bits, etc. Perhaps not coincidentally, the learner in question would find that no hypothesis (if we’re dealing with a neural network, “hypothesis” will of course refer to a particular configuration of weights) provides a predictive edge over any other, and hence has no reason to prefer or disprefer any particular hypothesis.
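Here is roughly what I have in mind, as a sketch (using a pseudo-random generator as a stand-in for the quantum RNG, and a simple frequency-based learner in place of a neural network): no matter how much training data it sees, its predictive loss never improves on the 1-bit-per-symbol floor.

```python
import math
import random

random.seed(0)  # stand-in for a quantum RNG: each bit is independent of all others

def train_and_evaluate(n_train, n_test=10_000):
    """A minimal 'learner' that estimates P(next bit = 1) from training data,
    then is scored (in bits of surprise per bit) on fresh random data."""
    train = [random.getrandbits(1) for _ in range(n_train)]
    test = [random.getrandbits(1) for _ in range(n_test)]
    p_one = (sum(train) + 1) / (len(train) + 2)   # Laplace-smoothed frequency estimate
    loss = -sum(math.log2(p_one if b == 1 else 1 - p_one) for b in test) / len(test)
    return loss

for n in [10, 1_000, 100_000]:
    print(n, round(train_and_evaluate(n), 4))   # stays pinned at ~1.0 bit per bit
```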
You may protest that this example does not count—that even though the program’s input bits are random, it is nonetheless embedded in hardware whose behavior is lawfully determined—and thus that the program’s very existence is proof of at least some predictability. But what good is this assertion to the learner? Even if it manages to deduce its own existence (which is impossible for at least some types of learners—for example, a simple feed-forward neural net cannot ever learn to reflect on its own existence no matter how long it trains), this does not help it predict the next bit of input. (In fact, if I understood your position correctly, shminux, I suspect you would argue that such a learner would do well not to start making assumptions about its own existence, since such assumptions do not provide predictive value—just as you seem to believe the existence of a “territory” does not provide predictive value.)
But to tie this back to the original topic of conversation: empirically, we are not in the position of the unfortunate learner I just described. We do not appear to be receiving random input data; our observations are highly structured in a way that strongly suggests (to me, at least) that there is something forcing them to be so. Perhaps our input bits come from a lawful external reality; that would certainly qualify as “something forcing them to be [structured]”. This “external reality” hypothesis successfully explains what would otherwise be a gigantic improbability, and I don’t think there are any competing hypotheses at this stage—unless of course you consider “there is no external reality, and our observations are only structured due to a giant cosmic coincidence” to be an alternative hypothesis worth putting forth. (As some of my comments so far might imply, I do not consider this alternative hypothesis very probable.)