I’m an independent researcher currently working on a sequence of posts about consciousness. You can send me anonymous feedback here: https://www.admonymous.co/rafaelharth. If it’s about a post, you can add [q] or [nq] at the end if you want me to quote or not quote it in the comment section.
Rafael Harth
One thing I’d like to say at this point is that I think you (jessicata) have shown very high levels of integrity in responding to comments. There’s been some harsh criticism of your post, and regardless of how justified it is, it takes character not to get defensive, especially given the subject matter. To me, this is also a factor in how I think about the post itself.
I feel like you can summarize most of this post in one paragraph:
It is not the case that an observation of things happening in the past automatically translates into a high probability of them continuing to happen. Solomonoff Induction actually operates over possible programs that generate our observation set (and by extension, the observable universe), and it may or may not be the case that the simplest universe is such that any given trend persists into the future. There are also no easy rules that tell you when this happens; you just have to do the hard work of comparing world models.
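For reference, here is the standard textbook form of the Solomonoff predictor. The notation is the usual one, not the post's: $U$ is a universal monotone Turing machine, $\ell(p)$ is the length of program $p$ in bits, and the sum runs over minimal programs whose output begins with the string $x$.

```latex
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)},
\qquad
M(a \mid x) = \frac{M(xa)}{M(x)}
```

On this reading, whether a trend continues is decided by which continuation $a$ receives more weight from short programs. How long the trend has already held does not settle it.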
I’m not sure the post says sufficiently many other things to justify its length.
I can’t really argue against this post insofar as it’s the description of your mental state, but it certainly doesn’t apply to me. I became way happier after trying to save the world, and I very much decided to try to save the world because of ethical considerations rather than because that’s what I happened to find fun. (And all this is still true today.)
Eliezer Yudkowsky’s portrayal of a single self-recursively improving AGI (later overturned by some applied ML researchers)
I found myself doubting this claim, so I read the post in question. As far as I can tell, it’s a reasonable summary of the fast takeoff position that many people still hold today. If all you meant to say was that there was disagreement, then fine; but saying ‘later overturned’ makes it sound like there is a consensus, not that people still have the same disagreement they had 13 years ago. (And your characterization in the paragraph I’ll quote below also gives that impression.)
In hindsight, judgements read as simplistic and naive in similar repeating ways (relying on one metric, study, or paradigm and failing to factor in mean reversion or model error there; fixating on the individual and ignoring societal interactions; assuming validity across contexts):
I think this argument doesn’t deserve anywhere near as much thought as you’ve given it. Caplan is committing a logical error, nothing else.
He probably reasoned as follows:
- If determinism is true, I am computable.
- Therefore, a large enough computer can compute what I will say.
- Since my reaction is just more physics, it should be computable as well; hence it should also be possible to tell me what I will do after hearing the result.
This is wrong because “what Caplan outputs after seeing the prediction of our physics simulator” is a system larger than the physics simulator and hence not computable by the physics simulator. Caplan’s thought experiment works as soon as you set it up so that the physics simulator is not causally entangled with Caplan.
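A toy model may make the structure of this argument concrete. It is a minimal sketch under loudly stated assumptions: Caplan's behaviour is reduced to a single boolean choice, his policy when shown a prediction is to do the opposite, and his default when shown nothing is arbitrary. All function names are hypothetical.

```python
# Toy model. Assumptions (not from the comment): Caplan's behaviour is one
# boolean, and when shown a prediction he does the opposite of it.


def caplan(shown_prediction=None):
    """Hypothetical contrarian Caplan.

    Shown a prediction, he negates it; shown nothing, he follows a fixed
    default (True here, chosen arbitrarily for the sketch).
    """
    if shown_prediction is None:
        return True
    return not shown_prediction


def simulator_without_announcing():
    # Not causally entangled with Caplan: it simply runs him and keeps quiet.
    return caplan()


def announced_predictions_that_hold():
    # An announced prediction p comes true only if caplan(p) == p.
    # Under a "negate the announcement" policy, no such p exists.
    return [p for p in (True, False) if caplan(p) == p]


if __name__ == "__main__":
    print("unannounced prediction correct:", simulator_without_announcing() == caplan())
    print("announced predictions that come true:", announced_predictions_that_hold())
```

The quiet simulator always gets Caplan right. Once the prediction is fed back to him, the thing to be predicted includes the simulator's own output, and no announcement survives.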
I don’t think fixed points have any place in this analysis. Obviously, Caplan can choose to implement a function without a fixed point, like (edit: rather ); in fact, he’s saying this in the comment you quoted. The question is why he can do this, since (by the above) he supposedly can’t.
See also my phrasing of this problem and Richard_Kennaway’s answer. I think the real problem with his quote is that it’s so badly phrased that the argument isn’t even explicit, which paradoxically makes it harder to refute. You first have to reconstruct the argument, and then it gets easier to see why it’s wrong. But I don’t think there’s anything interesting there.
- Option 2: Weak-upvote this if you give non-negligible consideration to what you think the total karma should be, but it isn’t your primary concern.
I feel like this result should have rung significant alarm bells. Bayes’ theorem is not a rule someone has come up with that has empirically worked out well. It’s a theorem. It just tells you a true equation by which to compute probabilities. Maybe if we include limits of probability (logical uncertainty/infinities/anthropics) there would be room for error, but the setting you have here doesn’t include any of these. So Bayesians can’t commit a fallacy. There is either an error in your reasoning, or you’ve found an inconsistency in ZFC.
So where’s the mistake? Well, as far as I’ve understood (and I might be wrong), all you’ve shown is that if we restrict ourselves to three priors (uniform, streaky, switchy) and observe a distribution that’s uniform, then we’ll accumulate evidence against streaky more quickly than against switchy. Which is a cool result, since the two do appear symmetrical, as you said. But it’s not a fallacy. If we set up a game where we randomize (uniform, streaky, switchy) with 1⁄3 probability each (so that the priors are justified), then generate a sequence, and then make people assign probabilities to the three options after seeing 10 samples, then the Bayesians are going to play precisely optimally here. It just happens to be the case that, whenever uniform (steady) is drawn, the probability for streaky goes down more quickly than that for switchy. So what? Where’s the fallacy?
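The fair game described above can be simulated. This is a minimal sketch, not the original post's model. It assumes each hypothesis is specified by the probability that the next toss matches the previous one (50%, 50+c%, 50−c%, as in the quoted passage). It also assumes c = 0.2 and 10 tosses per game, and every function name is hypothetical. It checks one consequence of justified priors: the average posterior a Bayesian assigns to each hypothesis matches how often that hypothesis was actually drawn. It does not try to reproduce the streaky/switchy asymmetry.

```python
import random

# Assumed bias parameter c; the quoted post leaves its value open.
C = 0.2
# P(next toss matches the previous toss) under each hypothesis.
# "steady" is the uniform (independent, fair) process.
HYPOTHESES = {"steady": 0.5, "sticky": 0.5 + C, "switchy": 0.5 - C}


def generate(hypothesis, n, rng):
    """Sample n binary tosses from the named hypothesis."""
    p_match = HYPOTHESES[hypothesis]
    seq = [rng.random() < 0.5]  # assumed: first toss is fair under every hypothesis
    for _ in range(n - 1):
        seq.append(seq[-1] if rng.random() < p_match else not seq[-1])
    return seq


def posterior(seq):
    """Posterior over the three hypotheses, starting from a uniform 1/3 prior."""
    weights = {}
    for name, p_match in HYPOTHESES.items():
        likelihood = 1.0
        for prev, cur in zip(seq, seq[1:]):
            likelihood *= p_match if cur == prev else 1.0 - p_match
        weights[name] = likelihood / 3.0
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def play_fair_game(trials=20000, n=10, seed=0):
    """Draw the true hypothesis with probability 1/3 each, show n tosses, record posteriors.

    Returns {hypothesis: (mean posterior assigned, fraction of games it was true)}.
    """
    rng = random.Random(seed)
    names = list(HYPOTHESES)
    post_sum = dict.fromkeys(names, 0.0)
    true_count = dict.fromkeys(names, 0)
    for _ in range(trials):
        truth = rng.choice(names)
        true_count[truth] += 1
        post = posterior(generate(truth, n, rng))
        for name in names:
            post_sum[name] += post[name]
    return {name: (post_sum[name] / trials, true_count[name] / trials) for name in names}


if __name__ == "__main__":
    for name, (mean_post, freq) in play_fair_game().items():
        print(f"{name:8s} mean posterior {mean_post:.3f}   drawn {freq:.3f}")
```

If the game instead always drew "steady" while the agent still held the 1/3 prior, her posteriors would no longer line up with the truth. The problem in that case is the mismatched prior, not the updating.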
First upshot: whenever she’s more confident of Switchy than Sticky, this weighted average will put more weight on the Switchy (50-c%) term than the Sticky (50+c%) term. This will her to be less than 50%-confident the streak will continue—i.e. will lead her to commit the gambler’s fallacy.
In other words, if a Bayesian agent has a prior across three distributions, then their probability estimate for the next sampled element will be systematically off if the data are in fact generated by the first distribution only. This is not a fallacy; it happens because you’ve given the agent the wrong prior! You made her equally uncertain between three hypotheses and then assumed that only one of them is true.
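The weighted average from the quote can be written out explicitly. In my own notation, $P_{\text{steady}}$, $P_{\text{sticky}}$ and $P_{\text{switchy}}$ are her posteriors after seeing data $D$, and $c$ is written as a fraction rather than in percentage points:

```latex
\begin{aligned}
P(\text{streak continues} \mid D)
  &= \tfrac12 P_{\text{steady}} + \left(\tfrac12 + c\right) P_{\text{sticky}} + \left(\tfrac12 - c\right) P_{\text{switchy}} \\
  &= \tfrac12 \left(P_{\text{steady}} + P_{\text{sticky}} + P_{\text{switchy}}\right) + c \left(P_{\text{sticky}} - P_{\text{switchy}}\right) \\
  &= \tfrac12 + c \left(P_{\text{sticky}} - P_{\text{switchy}}\right).
\end{aligned}
```

So her prediction falls below 50% exactly when $P_{\text{switchy}} > P_{\text{sticky}}$. Given her prior, that is the correct prediction. It only looks like an error when it is scored against a world where the steady process is the only one that ever occurs.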
And yeah, there are probably fewer sticky and switchy distributions out there than uniform ones, so the prior is probably wrong. But this isn’t the Bayesian’s fault. The fair way to set up the game would be to randomize which distribution is shown first, which again would lead to optimal predictions.
I don’t want to be too negative since it is still a cool result, but it’s just not a fallacy.
Baylee is a rational Bayesian. As I’ll show: when either data or memory are limited, Bayesians who begin with causal uncertainty about an (in fact independent) process—and then learn from unbiased data—will, on average, commit the gambler’s fallacy.
Same as above. The data isn’t “unbiased”; it’s uniform, which means it is very much biased relative to the prior that you’ve given the agent.
Strong upvote from me. This new technology has helped me view the existing content from a different angle.
Survey on model updates from reading this post. Figuring out to what extent this post has led people to update may inform whether future discussions are valuable.
Results: (just posting them here, doesn’t really need its own post)
The question was to rate agreement on the 1=Paul to 9=Eliezer axis before and after reading this post.
Data points: 35
Mean:
Median:
Anonymous comments:
Agreement more on need for actions than on probabilities. Would be better to first present points of agreement (that it is at least possible for non(dangerously)-general AI to change situation).
the post was incredibly confusing to me and so I haven’t really updated at all because I don’t feel like I can crisply articulate yudkowsky’s model or his differences with christiano
With the rate of posts this far up, a lot of posts may have been overlooked entirely, especially those that started at 1 or 2 karma, since they disappeared from the frontpage very quickly. It may be a good time to look through the All Posts page (which I assume most people don’t do).
(That could be true even if some people upvoted more during this week rather than less. Maybe the posts that went above 20 or even 10 karma had an advantage, but I doubt the 1-2 karma posts did.)
I gotta say, having the power to just gift someone $2 to $21 with the click of a button, at no cost to myself, was pretty neat :-)
I think the central argument of this post is grossly wrong. Sure, you can find some people who want to censor based on which opinions feel too controversial for their taste. But pretending that this is the sole motivation is a quintessential strawman. It’s assuming the dumbest possible reason for why the other person holds a certain position. It’s as if you criticized the Bible, and I assumed it was only because you believe the Quran is the literal word of God instead.
We do not censor other people more conventional-minded than ourselves. We only censor other people more-independent-minded than ourselves. Conventional-minded people censor independent-minded people. Independent-minded people do not censor conventional-minded people. The most independent-minded people do not censor anyone at all.
Bullshit. If your desire to censor something is due to an assessment of how much harm it does, then it doesn’t matter how open-minded you are. It’s not a variable that goes into the calculation.
I happen to not care that much about the object-level question anymore (at least as it pertains to LessWrong), but on a meta level, this kind of argument should be beneath LessWrong. It’s actively framing any concern for unrestricted speech as poorly motivated, making it more difficult to have the object-level discussion.
And the other reason it’s bullshit is that no sane person is against all censorship. If someone wrote a post here calling for the assassination of Eliezer Yudkowsky with his real-life address attached, we’d remove the post and ban them. Any sensible discussion is just about where to draw the line.
I would agree that this post is directionally true, in that there is generally too much censorship. I certainly agree that there’s way too much regulation. But it’s also probably directionally true to say that most people are too afraid of technology for bad reasons, and that doesn’t justify blatantly dismissing all worries about technology. We have to be more specific than that.
Any attempt to censor harmful ideas actually suppresses the invention of new ideas (and correction of incorrect ideas) instead.
Proves too much (like that we shouldn’t ban gain-of-function research).
One of the lessons I draw from this: listen to gwern.
It is fascinating to learn about the extent to which AI technologies like GPT-4 and Copilot X have been integrated into the operations of LessWrong. It is understandable that the LW team wanted to keep this information confidential in order to prevent the potential negative consequences of revealing the economic value of AI.
However, with the information now out in the open, it’s important to discuss the ethical implications of such a revelation. It could lead to increased investment in AI, which may or may not be a good thing, depending on how it is regulated and controlled. On one hand, increased investment could accelerate AI development, leading to new innovations and benefits to society. On the other hand, it could potentially exacerbate competitive dynamics, increase the risk of misuse, and lead to negative consequences for society.
Regarding the use of AI on LessWrong specifically, it’s essential to consider the impact on users and the community as a whole. If AI is moderating comment sections and evaluating new users, it raises questions about transparency, fairness, and privacy. While it may be more efficient and even potentially more accurate, there should be a balance between human oversight and AI automation to ensure that the platform remains a safe and open space for discussions and debates.
Lastly, the mention of Oliver Habryka automating his online presence might be a light-hearted comment, but it also highlights the potential personal and social implications of AI technologies. While automating certain aspects of our lives can free up time for other pursuits, it is important to consider the consequences of replacing human interaction with AI-generated content. What might we lose in terms of authenticity, spontaneity, and connection if we increasingly rely on AI to manage our online presence? It’s a topic that merits further reflection and discussion.
I consider GPT to have falsified the “all humans are extremely close together on the relevant axis” hypothesis. Vanilla GPT-3 was already sort of like a dumb human (and sometimes like a smart human). If the step from nothing to chimp were 1000x greater than the step from chimp to Einstein, then ChatGPT should, for all intents and purposes, have at least average human-level intelligence. Yet it does not, at all; this quote from jbash’s post puts it well:
You can take it step by step through a chain of simple inferences, and still have it give an obviously wrong, pattern-matched answer at the end.
Maybe the scale is true in some absolute sense—you can make a lot of excuses, like maybe GPT is based entirely on “log files” rather than thoughts or whatever. Maybe 90%+ of people who criticized the scale before did so for bad reasons. That’s all fine. But it doesn’t change the fact that the scale isn’t a useful model; in terms of performance, the step from chimp to Einstein is, in fact, hard.
The total absence of obvious output of this kind from the rest of the “AI safety” field even in 2020 causes me to regard them as having less actual ability to think in even a shallowly adversarial security mindset, than I associate with savvier science fiction authors. Go read fantasy novels about demons and telepathy, if you want a better appreciation of the convergent incentives of agents facing mindreaders than the “AI safety” field outside myself is currently giving you.
While this may be a fair criticism, I feel like someone ought to point out that the vast majority of AI safety output (at least what I see on LW) isn’t trying to do anything like “sketch a probability distribution over the dynamics of an AI project that is nearing AGI”. This includes all technical MIRI papers I’m familiar with.
Perhaps we should be doing this (though, isn’t that more for AI forecasting/strategy rather than alignment? Of course still AI safety), but then the failure isn’t “no-one has enough security mindset” but rather something like “no-one has the social courage to tackle the problems that are actually important”. (This would be more similar to EY’s critique in the Discussion on AGI interventions post.)
While reading this, a thought popped into my head that feels important enough to share:
Could being “status-blind” in the sense that Eliezer claims to be (or perhaps some other, not yet well-understood status-related property) be strongly correlated with managing to create lots of utility? (In the sense of helping the world a lot.)
Currently I consider Yudkowsky, Scott Alexander, and Nick Bostrom to be three of the most important people. After reading Superintelligence and watching a bunch of interviews, one of the first things I said about Nick Bostrom to a friend was that I felt like he legitimately has almost no status concerns (that was well before LW 2.0 launched). In Scott Alexander’s case it’s less clear, but I suspect similar things.
(Eliezer did think neural nets wouldn’t work; he explicitly said it on the Lex Fridman podcast.)
Edit (at gwern’s request): at 11:30 in the podcast, Eliezer says,
back in the day I went around saying like, I do not think that just stacking more layers of transformers is going to get you all the way to AGI, and I think that GPT-4 is past where I thought this paradigm is going to take us, and I, you know, you want to notice when that happens, you want to say like “oops, I guess I was incorrect about what happens if you keep on stacking more transformer layers”
and then Fridman asks him whether he’d say that his intuition was wrong, and Eliezer says yes.
(I’m basing this on what I feel like – unlike you, lsusr, and Eliezer, I feel this emotion very strongly.)
I agree that Justin’s answer is missing the point. I also think your description isn’t quite right. You assume that what is inappropriate is based on social norms. That does not need to be true.
For example, I am not at all angry at the success of HPMoR because I think the success is appropriate. But my blood still boils in other cases where people are successful. And success isn’t even required – I can get angry at someone even attempting to do something that I consider inappropriate.
The regulation module that decides what is or isn’t appropriate is complicated and very bizarre. There have been several instances where I felt anger at someone for being successful right before reading their stuff, and then did a complete turnaround and decided they were high status and deserved even more success. I distinctly remember this happening with Scott Alexander and SSC. I also got vaguely angry at your generic description of the guy being successful, but reading that he worked really hard on it did remedy that.
I think there might be a subset of [people who have this emotion] who base what is appropriate primarily on social norms, and that’s what you describe.
I also suspect that blindness to this emotion is disproportionately common on LessWrong because it correlates with all sorts of good things. I certainly think I would be much more successful if I had never felt it.
I think calling this an assumption is misleading. He’s written extensively about why he thinks this is true. It’s a result/output of his model.
I would agree with this if Eliezer had never properly engaged with critics, but he’s done that extensively. I don’t think there should be a norm that you have to engage with everyone, and “ok choose one point, I’ll respond to that” seems better than not engaging with it at all. (Would you have been more enraged if he hadn’t commented at all?)