A couple of guesses for why we might see this, which don’t seem to depend on property:
An obligation to act is much more freedom-constraining than a prohibition on an action. The more one weighs every possible action under an obligation to take the most ethically optimal one, the less room one has for exploration, contemplation, or pursuing one’s own selfish values. A prohibition on actions does not have this effect.
The environment we evolved in had roughly the same level of opportunity to commit harmful acts, but far less opportunity to take positive consequentialist action (and far less complicated situations to deal with). It was always possible to hurt your friends and suffer the consequences, but it was rare to have to think about the long-term consequences of every action.
The consequences of killing, stealing, and hurting people are easier to predict than the consequences of altruistic actions. Resources are finite, so sharing them can be harmful or beneficial, depending on the circumstances and on who they are shared with. Other people can defect or refuse to reciprocate. If you hurt someone, they are almost guaranteed to retaliate; if you help someone, there is no guarantee there will be a payoff for you.
It seems to construct its estimate by averaging a huge number of observations together before each update (for Dota 5v5, they say each batch is around a million observations, and I’m guessing it processes about a million batches). The surprising thing is that this works so well, and that it makes it very easy to leverage additional computational resources.
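If the thing being averaged is something like a per-observation gradient estimate, a toy numpy sketch (entirely my own illustration, with made-up numbers, not OpenAI’s code) shows why such enormous batches help: the error of the averaged estimate shrinks roughly with the square root of the batch size.

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])  # invented "true" gradient for the demo

for batch_size in (1, 1_000, 1_000_000):
    # Each observation contributes the true gradient plus heavy noise.
    samples = true_grad + rng.normal(scale=10.0, size=(batch_size, 3))
    error = np.linalg.norm(samples.mean(axis=0) - true_grad)
    print(f"batch_size={batch_size:>9}: error of averaged estimate ~ {error:.4f}")
```

With a million samples per batch, the per-observation noise mostly cancels, which is one plausible reading of why simply throwing more rollout workers at the problem scales so smoothly.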
My guess for how it deals with partial observability in a more philosophical sense is that it must be storing an implicit model of the world in some way, in order to better predict the reward it will eventually observe. I’m beginning to wonder if the distinction between partial and full observability is very useful after all. Take AlphaGo: even though it can see the whole board, there are still a whole bunch of “spaces” it can’t see fully, possibly the entire action space, the space of every possible game trajectory, or the mathematical structure of play strategies. And yet it managed to infer enough about those spaces to become good at the game.
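The standard mechanism for the “implicit model” point is a recurrent policy whose hidden state summarizes the history of observations. The sketch below is my own toy simplification (OpenAI Five reportedly uses a large LSTM; none of the names or sizes here are theirs):

```python
import numpy as np

class RecurrentPolicy:
    """Minimal recurrent policy: the hidden state plays the role of an
    implicit world model, carried across timesteps."""

    def __init__(self, obs_dim=8, hidden_dim=16, action_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.W_out = rng.normal(scale=0.1, size=(action_dim, hidden_dim))
        self.h = np.zeros(hidden_dim)  # persists between calls

    def step(self, obs):
        # The hidden state integrates the whole history of partial
        # observations, so the action can depend on more than the
        # current frame alone.
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        return int(np.argmax(self.W_out @ self.h))

policy = RecurrentPolicy()
obs_stream = np.random.default_rng(1).normal(size=(5, 8))  # 5 partial observations
actions = [policy.step(obs) for obs in obs_stream]
```

Nothing about this architecture cares whether the environment is “fully” or “partially” observable, which is part of why I suspect the distinction matters less than it seems.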
I don’t know how hard it would be to do a side-by-side FLOPS comparison of Dota 5v5 vs. AlphaGo / AlphaZero, but they seem relatively similar in terms of the computational cost required to achieve something close to “human level.” However, as many have noted, Dota is a vastly more complex game because of its continuous state, partial observability, large action space, and long time horizon. So what does it mean when it requires roughly similar orders of magnitude of compute to achieve the same level of ability as humans, using a fairly general architecture and learning algorithm?
Some responses to AlphaGo at the time were along the lines of: “Don’t worry too much about this. It looks very impressive, but the game still has a discrete action space and is fully observable, which explains why this was easy.”
I’ve been meditating since I was about 19, from before I came across rationality / effective altruism. There is quite a bit of overlap between the things I’ve been able to learn from the two schools of thought, but I think there are still a lot of very useful (possibly even necessary) things that can only be learned from meditative practices right now. This is not because rationality is inherently incapable of reaching the same things, but because within rationality it would take very strong and well-developed theories, perhaps built on large-scale empirical observations of human behavior, to come to the same conclusions. With meditation, on the other hand, a lot of these same conclusions are just “obvious.”
Most of these things have to do with subtle issues of psychology, particularly with values and morality. For example, before I began meditating, I generally believed that:
Moral principles could be determined logically from a set of axioms that were “self-evidently true,” and once I deduced those principles, I would simply follow them.
The things that seemed to make me happy, like having friends, being in love, and feeling accomplished, were not incompatible with true moral principles, and in fact were instrumentally helpful in achieving terminal moral goals.
I intrinsically valued what was moral. If it ever seemed like I valued what was not moral, I could chalk it up to temporary or easily surmountable issues, like vestigial animal instincts or lack of willpower. Basically, desires that could be easily overridden.
Pleasure, pain, and emotions were more like guidelines, things that made it possible to act quickly in certain situations. Insofar as certain forms of pleasure were “intrinsic values” (like love), they did not interfere with moral goals. They were not things that determined my behavior very strongly, and they certainly didn’t have subtle cascading effects on the entire set of my beliefs.
After I had meditated for a long time, many of these beliefs were eradicated. Right now it seems more likely that:
My values are not even consistent, let alone determined by moral principles. It’s not clear that deducing a good set of moral principles could even change my values.
My values are malleable, but not easily malleable in a direction that can be controlled by me (not without a ton of meditation, anyway).
The formalization of my values in my mind is not a good predictor of what my actions will be. A better predictor involves far more short-term mechanisms in my psyche.
The beliefs I had prior to meditating were more likely constructed so that I could report them to other people in a way that would make them more likely to value and approve of me.
The values that truly do seem hard to deconstruct are surprisingly selfish. For example, I assumed that I valued approval from other humans because it was an instrumental goal that helped me judge the quality of my actions. It now seems more likely that social approval is in fact an intrinsic goal, which is very worrying to me with regard to my ability to attain my altruistic goals.
If it turns out that meditating has given me better self-reflective capabilities, and the things I’ve observed are accurate, then this has some pretty far-reaching implications. If I’m not extremely atypical, then most people are probably very blind to their own intrinsic values. This is a worrying prospect for the long-term efficacy of effective altruism.
Hopefully this isn’t too controversial to say, but it seems to me like a lot of the main currents within EA operate more or less along the lines of my prior-to-meditating beliefs. Here I’m thinking about the type of ethics where you are encouraged to maximize your altruistic output: “earn to give,” “choose only the career that maximizes your ability to be altruistic,” “donate as much of your time and energy as you can to being altruistic,” etc. Of course EA thought is very diverse, so this doesn’t represent all of it. But given the way my values currently seem to be structured, it’s probably unrealistic that I could actually fulfill these demands, unless each altruistic act brought me an abnormally large amount of happiness that outweighed most of my other values. It’s of course possible that I’m unusually selfish or even a sociopath, but my prior on that is very low.
On the other hand, if my values really are malleable, and it is possible to influence them, then it makes sense for me to spend a lot of time deciding how that process should proceed. This is only possible because my values are inconsistent: if they were consistent, it would be against my values to change them, but once a set of values is inconsistent, it can actually make sense to try to alter them. And meditation might turn out to be one of the ways to make these kinds of changes to your own mind.
It seems like in the vast majority of conversations, we find ourselves closer to the “exposed to the Deepak Chopra version of quantum mechanics and haven’t seen the actual version yet” situation than to the “arguing with someone who is far less experienced and knowledgeable than you are on this subject” one. In the latter case, it’s easy to see why steelmanning would be counterproductive. If you’re a professor trying to communicate a difficult subject, and the student is having trouble understanding your position, it’s unhelpful to “steelman” the student (i.e., to present a logical-sounding but faulty argument in favor of what the student is saying); it’s far more helpful to try to “pass their ITT” by modeling their confusions and intuitions, and then use that to help them understand the correct argument. I can imagine Eliezer and Holden finding themselves in this situation more often than not, since they are both experts in their respective fields and have spent many years refining their reasoning skills and fine-tuning the arguments for their various positions.
But in most situations, most of us, not quite knowing how strong the epistemological ground we stand on really is, are probably using some mixture of flawed intuitions and logic to present our understanding of a topic. We might also be modeling people whom we really respect as being in a similar situation. In which case the line between steelmanning and the ITT becomes a bit blurry. If I know that both of us are using some combination of intuition (prone to bias and sometimes hard to describe), importance weighting of various facts, and different logical pathways to reach some set of conclusions, then both trying to pass each other’s ITT and steelmanning have potential utility. The former might help iron out differences in our intuitions and our harder-to-formalize disagreements, and the latter might help us actually reach more formal versions of arguments, or reasoning paths that have yet to be explored.
I do find it easy to imagine that as I progress in my understanding of and expertise in some particular topic, the benefits of steelmanning relative to the ITT decrease. But it’s not clear to me that I (or anyone, outside the areas they spend most of their time thinking about) have actually reached this point in situations where we are debating or cooperating on a problem with respected peers.
I don’t see him as arguing against steelmanning. But the opposite of steelmanning isn’t arguing against an idea directly: you’ve got to be able to steelman an opponent’s argument well in order to argue against it well, or perhaps to determine that you agree with it. In any case, I’m not sure how to read a case for locally valid argumentation steps as a case against doing this. Wouldn’t it help you understand how people arrive at their conclusions?
I would also like to have a little jingle or ringtone play every time someone passes over my comments, please implement for Karma 3.0 thanks
What’s most unappealing to me about modern, commercialized aesthetics is the degree to which the bandwidth is forced to be extremely high, something I’d call the standardization of aesthetics. When I walk down the street in the financial district of SF, there’s not much variety in people’s visual styles. Sure, everything looks really nice, but I can’t say it doesn’t get boring after a while. A lot of information is clearly being packed into people’s outfits, so I should be able to infer a huge amount about someone just by looking at them. Same thing with websites: there’s really only one website design. Can it truly be said that there is something inherently optimal about these designs? I strongly suspect not. There are other forces at play that guarantee convergence and don’t depend on optimality.
Part of it might be the extremely high cost of defection. Since aesthetics is a type of signalling mechanism, most of what Robin Hanson says applies here. It’s just usually not worth it to be an iconoclast or truly original. And at some point we start believing the signals are inherently meaningful, because they’ve been there for so long. But one need only look at the different types of beauty produced by other cultures, or at different points in human history, to see that this is not the case. The color orange in Silicon Valley might represent “innovation” or “ingenuity” (look at TensorFlow’s color scheme), but the orange robes of Buddhist monks evoke serenity, peace, and compassion (though of course the color originally depended on the dyes that were available). One can also observe that there is little variety within each culture, suggesting that the same forces pushing toward aesthetic convergence are at play there too.
The sum of the evidence suggests to me that I am getting an infinitesimal fraction of the pleasant aesthetic experiences that could feasibly be created by someone not subject to signalling constraints. This seems deeply disappointing.
It seems like this objection might be empirically testable, perhaps even with the capabilities we have right now. For example, Paul posits that AlphaZero is a special case of his amplification scheme. In his post on AlphaZero, he doesn’t mention there being an aligned “H” as part of the setup, but if we imagine there to be one, the “H” in the AlphaZero situation is really just a fixed, immutable calculation that determines the game state (win/loss/etc.). It can be performed on any board input, with no risk of the calculation being done incorrectly and no uncertainty in the result. The entire board is visible to H, and every board state can be evaluated by H. H does not need to consult A for assistance in determining the game state, and A does not suggest actions that H should take (H always takes one action). The agent A does not choose which portions of the board are visible to H. Because of this, “H” in this scenario might be better understood as an immutable property of the environment rather than as an agent that interacts with A and is influenced by A. My question is: to what degree does the stable convergence of AlphaZero depend on these properties? And can we alter the setup of AlphaZero so that some or all of them are violated? If so, then we should be able to code up a version in which H still wants to “win” but the independence between A and H is broken, and see whether this results in “weirder” or unstable behavior, as in the sketch below.
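Here is a rough sketch of the contrast I have in mind. Everything in it is invented for illustration (there is no real game or training loop behind `game_outcome`, and the coupling rule is hypothetical); the point is only the difference between an evaluator that is a fixed property of the environment and one whose judgments drift under A’s influence:

```python
def game_outcome(board):
    # Stand-in for the exact rules calculation: +1 win, -1 loss.
    # (Toy logic for a fake game where positive boards "win".)
    return 1.0 if sum(board) > 0 else -1.0

def fixed_H(board, agent_actions):
    # H as an immutable property of the environment: the evaluation
    # ignores the agent entirely, as in standard AlphaZero training.
    return game_outcome(board)

class InfluencedH:
    # H as an agent whose judgment drifts with exposure to A's play,
    # violating the independence assumption described above.
    def __init__(self, susceptibility=0.01):
        self.bias = 0.0
        self.susceptibility = susceptibility

    def __call__(self, board, agent_actions):
        # Hypothetical coupling: every action A takes nudges H's
        # future evaluations a little.
        self.bias += self.susceptibility * sum(agent_actions)
        return game_outcome(board) + self.bias
```

The experiment would then be to run the same self-play training loop once with `fixed_H` and once with an `InfluencedH` instance as the evaluation signal, and compare the stability of the two runs.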
I can’t emphasize enough how important the thing you’re mentioning here is, and I believe it points to the crux of the issue more directly than most other things that have been said so far.
We can often weakman postmodernism as making basically the same claim, but this doesn’t change the fact that a lot of people are running an algorithm in their head with the textual description “there is no outside reality, only things that happen in my mind.” This algorithm seems to produce different behaviors than the algorithm “outside reality exists and is important.” I think the first tends to produce behaviors that are a lot more dangerous than the second, even though it’s always possible to make philosophical arguments that make one seem much more likely to be “true” than the other. It’s crucial to realize that not everyone is running the perfectly steelmanned version of such algorithms. Updating our beliefs based on observations of the very processes by which we update our beliefs is very tricky to get right.
Even though it’s valid to make observations of the form “I observe that I am running a process that produces the belief X in me,” it is definitely very risky to create a social norm that says such statements are superior to statements like “X is true,” because such norms create a tendency to assign less validity to statements like “X is true.” In other words, such a norm can itself become a process that produces the belief “X is not true,” when we don’t necessarily want to move our beliefs on X just because we begin to understand how the processes work. It’s very easy to go from “X is true” to “I observe I believe X is true” to “I observe there are social and emotional influences on my beliefs” to “There are social and emotional influences on my belief in X” and finally to “X is not true,” and I can’t help but feel a mistake is being made somewhere in that chain.
I could probably write a lot more about this somewhere else, but I’m wondering if anyone else felt that this paper was kind of shallow. This comment is probably too brief to really do the feeling justice, but I’ll decompose it into two things I found disappointing:
“Intelligence” is defined in a way that leaves a lot to be desired. It isn’t defined in a way that makes it qualitatively different from technology in general (“tasks thought to require intelligence” is probably much less useful than “narrowing the set of possible futures toward those that match an agent’s preference ordering”). For this reason, the paper imagines a lot of scenarios that amount to one party being able to do one narrow task much better than another party. This is not specific enough to narrow us down to any approaches that deal with AI more generally.
As a consequence of the authors’ choice to leave their framework fuzzy, their suggestions for how to respond to the problem take on the same fuzziness. For example, their first suggestion is that policy leaders should consult with AI researchers. This reads a little like an applause light; it doesn’t offer many suggestions about how to make such consultation more likely, or about how to ensure that policy leaders are well informed enough to choose the right people to be advised by and to take their advice seriously.
Overall I’m happy that these kinds of things can be discussed by a large group of various organizations. But I think any public effort to mitigate AI risk needs to be very careful that it isn’t losing something extremely important by trying to accommodate too many uncertainties and disagreements at once.
This is partly a reply and partly an addendum to my first comment. I’ve been thinking about a duality that exists within the rationalist community to some degree and that has become a lot more visible lately, in particular with posts like this. I would refer to it as something like “The Two Polarities of Nonconformists,” though I’m sure someone could think of a better name. The way I would describe it: communities like this one are largely composed of people who feel fairly uncomfortable with the way they are situated in society, either because they are outliers on some dimension of personality, interests, or intellect, or because of the degree to which they are sensitive to the social reality around them. This leads to a bimodal distribution in which both modes are outliers with respect to people in general on one axis (namely, the way social reality is sensed), but on opposite ends of it. And these two groups differ strongly in how their values are formed, and quite possibly even in subtle ways in how reality itself is perceived.
On the one hand, you have the “proper non-conformists,” who are somewhat unplugged from the Omega / distributed network you are describing, and who I imagine will have trouble digesting a lot of your claims here. I call them proper non-conformists because they genuinely seem not to feel much of the tension from the tugs the network gives them. I think there’s a connection here with people who consider themselves “status blind,” or who tend to visualize Slack only in a very concrete, visible sense, like having a lot of wealth. They might tend to have an aversion to heavily socially conscious displays of signalling and things like that. You suspect that this might be what “psychopathy” is, and I think there might be some partial truth to that, though ultimately not in the sense of a psychopath as an evil person, or as someone completely disconnected from reality as a whole. For example, someone like this might have the ability to do altruistic things invisibly, in a way that could be very difficult for someone strongly connected to Omega.
On the other hand, you have the people who are very tightly connected to the Matrix and are highly sensitive to its inputs; that increased sensitivity is what made it likely for them to realize it exists in the first place. This group is constantly feeling the network’s tugs and can’t disconnect from it (perhaps not without Looking, anyway), so a lot of their strategies tend to involve fitting into it as well as they can, mastering its tricks, and overall playing the game extremely well. People here could learn to be highly charismatic if they chose to, but they might risk being seen as ultra-conformists. They still end up being non-conformists, both in the sense of not being neurotypical and because a lot of their practices will seem extreme if one examines them in detail. This group might have serious difficulty imagining not being inside the network, and may even be skeptical that someone could still function if they weren’t.
(I’m going to avoid trying to place anyone in particular in one or the other of these).
I think the second group will have the upper hand in group coordination problems because they have direct access to all the mechanisms, but they have the disadvantage of being prone to inadequate-equilibria problems. The first group is in the opposite situation. But in the end, I think the first group can’t really “Look” at the network from within: ideally you want to be inside it, so you have direct access, while also being able to see it for what it really is. So it’s possible that “Lookers” in the second group could accomplish much larger things than “Lookers” from the first.
If that’s correct, then it might suggest that, with Looking, attempts to move more toward polarity one might be less fruitful than attempts to move toward polarity two. And our previous inertia seemed to be going more towards polarity one.
I think I’ve been doing “mythic mode” for nearly my entire life, not because I came up with the idea explicitly and intentionally deployed it, but because it sort of happens on its own, without any effort. It doesn’t happen at every moment, but frequently enough that it’s a significant driving force.
But it used to be that I didn’t notice myself doing this at all; it was just how it felt to be a person with goals and desires, and mythic mode was just how that manifested in my subconscious. Following along in stories and fiction gave rise to a very similar feeling. In some cases, this manifested as my “trying on” different roles by imagining myself as a character in the story. To this day it still happens to a degree, except that now I notice it in real time and question whether I can subtly modify these stories. I’m also much, much more careful about how seriously I take these feelings; I could probably attribute several great missteps to relying on them too heavily.
Since this kind of story-creating and script-writing happens so fluidly, frequently, and subconsciously, it seems very difficult to know how and when it arises or is triggered, and very hard to know how changing any individual piece of it will affect things down the road.
You seem to have re-derived Jungian archetypes, with the distributed network / Omega playing the role of the collective unconscious. I think the main difference is that you posit the distributed intelligence to be able to predict people’s actions. This is probably more true the more we believe it is true: the network sort of “wants” you to be predictable, rewarding you for following predefined paths, which are by definition easier to predict. The punishment for not following these paths is the removal of your ability to predict how the network will respond. How I interpret your advice is that if we find ourselves unhappy with our state of affairs, we should look for places where paths fork, or run close enough together to make a quick jump...never straying outside a path for very long.
The main problem I see is that we need to be able to make predictions in the spaces between narratives, which according to this framework is difficult if not impossible.
Or more generally: break a larger and more difficult task into a chain of subtasks, and see if you have enough willpower to accomplish just the first piece. If you can, allow yourself a sense of accomplishment and use that as a willpower boost to continue, or break the larger task up even further. A toy model of this is sketched below.
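To make the mechanics explicit, here’s a cartoon sketch (all names and numbers are invented; this is an illustration of the strategy, not a claim about psychology). The interesting property is that with a per-piece “boost,” a chain of subtasks can succeed where a single monolithic attempt with the same starting willpower would fail:

```python
def attempt(effort, willpower, boost=1, min_piece=1):
    """Try a task (integer effort units); return remaining willpower, or None on failure."""
    if willpower >= effort:
        return willpower - effort + boost  # done in one go; accomplishment restores a little
    if effort <= min_piece:
        return None                        # too depleted for even the smallest piece
    # Split into a chain of two subtasks; success on the first
    # buys willpower for the second.
    half = effort // 2
    after_first = attempt(half, willpower, boost, min_piece)
    if after_first is None:
        return None
    return attempt(effort - half, after_first, boost, min_piece)

print(attempt(8, 6))  # succeeds via subdivision, even though willpower 6 < effort 8
```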
If this works and people are able to get themselves to do more complex and willpower heavy tasks that they wouldn’t normally be able to do, wouldn’t that be a good thing by default? Or are you worried that it would allow people with poorly aligned incentives to do more damage?
Circling seems like one of those things where both its promoters and detractors vastly overestimate its effects, whether positive or negative. A lot of the responses to this are either “it’s pretty cool” or “it’s pretty creepy.” What about “meh”? The most likely outcome is that Circling does something extremely negligible, if it does anything at all; or, if it does seem to have some benefit, it’s because of the extra hour you set aside to think about things without many distractions. In which case, a question I’d ask is: “What was inadequate about the boring technique that doesn’t have a name because it’s so obvious?” Or, if you want to make a comparison to meditation, hypnosis, or other such practices: when you stumbled across a Chesterton’s Fence, what made you go “Hey, let’s try moving this fence 50 feet to the left!”? You’ll either get (a) something that works pretty much the same as some more traditional practice, or (b) something that doesn’t work at all.
Personally, I wonder how much of this disagreement can be attributed to prematurely settling on specific fundamental positions, or to some hidden metaphysics that certain organizations have (perhaps unknowingly) committed to, such as dualism or panpsychism. One of the most salient paragraphs from Scott’s article said:
Morality wasn’t supposed to be like this. Most of the effective altruists I met were nonrealist utilitarians. They don’t believe in some objective moral law imposed by an outside Power. They just think that we should pursue our own human-parochial moral values effectively. If there was ever a recipe for a safe and milquetoast ethical system, that should be it. And yet once you start thinking about what morality is – really thinking, the kind where you try to use mathematical models and formal logic – it opens up into these dark eldritch vistas of infinities and contradictions. The effective altruists started out wanting to do good. And they did: whole nine-digit-sums worth of good, spreadsheets full of lives saved and diseases cured and disasters averted. But if you really want to understand what you’re doing – get past the point where you can catch falling apples, to the point where you have a complete theory of gravitation – you end up with something as remote from normal human tenderheartedness as the conference lunches were from normal human food.
I feel like this quote contains some extremely deep but subtly stated insight that is in alignment with some of the points you made. Somehow, even if we all start from the position that there is no universal or ultimately real morality, when we apply all of our theorizing, modeling, debate, measurement, thinking, etc., we end up making absolutist conclusions about what the “truly most important thing” is. And I wonder if this is primarily a social phenomenon: in the process of debating and organizing groups of people to accomplish things, it’s easier if we all converge to agreement on specific and easy-to-state questions.
As for Scott’s observed duality between the “suits,” who just want to do the most easily measurable good, and the “weirdos,” who want to converge on rigorous answers to the toughest philosophical questions (answers that tend to look pretty bizarre), plus the fact that these are often the same people: my guess is that this has something to do with converging to agreement on relatively formalizable questions. Those questions tend to appear in two forms, the “easy to measure” kind (how many people are dying from malaria, how poor is this group of people, etc.) and the “easy to model or theorize about” kind (what do we mean by suffering, what counts as a conscious being, etc.), so you see activity and effort diverging between those two forms.
“Easy” is meant in a relative sense, of course. Unfortunately, it seems that the kind of questions that interest you (and which I agree are of crucial importance) fall into the “relatively hard” category, and therefore are much more difficult to organize concerted efforts around.
I’m actually having a hard time deciding which superstimuli are having too strong a detrimental effect on my actions, because some superstimuli also act as willpower restorers. Take music, for example. Listening to music does not usually get mentioned as a bad habit, but it is an extremely easy stimulus to access, requires little to no attention or effort to keep using, and, at least for me, tends to amplify mind-wandering and daydreaming. On the other hand, it is a huge mood booster and increases my confidence and determination to complete a lot of other tasks during the day, so in that regard it does seem helpful. I could probably say something similar about many superstimuli, so I wonder if straight “giving up” would be a less effective strategy than trying to optimize some kind of usage schedule for each type of superstimulus.
I’m not actually seeing why this post is purely an instance of the conjunction fallacy. A lot of the details he describes are consequences of cars being autonomous, or indirect effects of that. That’s not to say there are no errors here, just that I don’t think it’s merely a list of statements A, B, C, etc. with no causal relationship.