Just because the defendant is actually guilty, doesn’t mean the prosecutor should be able to get away with making a tenuous case! (I wrote more about this in my memoir.)
The Evolution of Humans Was Net-Negative for Human Values
I affirm Seth’s interpretation in the grandparent. Real-time conversation is hard; if I had been writing carefully rather than speaking extemporaneously, I probably would have managed to order the clauses correctly. (“A lot of people think criticism is bad, but one of the secret-lore-of-rationality things is that criticism is actually good.”)
My Interview With Cade Metz on His Reporting About Slate Star Codex
I am struggling to find anything in Zack’s post which is not just the old wine of the “just” fallacy [...] learned more about the power and generality of ‘next token prediction’ etc than you have what they were trying to debunk.
I wouldn’t have expected you to get anything out of this post!
Okay, if you project this post into a one-dimensional “AI is scary and mysterious” vs. “AI is not scary and not mysterious” culture war subspace, then I’m certainly writing in a style that mood-affiliates with the latter. The reason I’m doing that is because the picture of what deep learning is that I got from being a Less Wrong-er felt markedly different from the picture I’m getting from reading the standard textbooks, and I’m trying to supply that diff to people who (like me-as-of-eight-months-ago, and unlike Gwern) haven’t read the standard textbooks yet.
I think this is a situation where different readers need to hear different things. I’m sure there are grad students somewhere who already know the math and could stand to think more about what its power and generality imply about the future of humanity or lack thereof. I’m not particularly well-positioned to help them. But I also think there are a lot of people on this website who have a lot of practice pontificating about the future of humanity or lack thereof, who don’t know that Simon Prince and Christopher Bishop don’t think of themselves as writing about agents. I think that’s a problem! (One which I am well-positioned to help with.) If my attempt to remediate that particular problem ends up mood-affiliating with the wrong side of a one-dimensional culture war, maybe that’s because the one-dimensional culture war is crazy and we should stop doing it.
For what notion is the first problem complicated, and the second simple?
I might be out of my depth here, but—could it be that sparse parity with noise is just objectively “harder than it sounds” (because every bit of noise inverts the answer), whereas protein folding is “easier than it sounds” (because if it weren’t, evolution wouldn’t have solved it)?
Just because the log-depth xor tree is small, doesn’t mean it needs to be easy to find, if it can hide amongst vastly many others that might have generated the same evidence … which I suppose is your point. (The “function approximation” frame encourages us to look at the boolean circuit and say, “What a simple function, shouldn’t be hard to noisily approximate”, which is not exactly the right question to be asking.)
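To make "every bit of noise inverts the answer" concrete, here's a tiny sketch; the parameters and variable names are mine, invented for illustration, not from the original discussion.

```python
import random

# Toy illustration (mine, not from the thread): a k-sparse parity over n
# input bits. The label is the XOR of k hidden "relevant" bits.
n, k = 32, 4
relevant = random.sample(range(n), k)  # hidden subset the learner must find

def sparse_parity(x):
    """Label = XOR (sum mod 2) of the k relevant bits of x."""
    return sum(x[i] for i in relevant) % 2

x = [random.randint(0, 1) for _ in range(n)]
y = sparse_parity(x)

# Flipping any single relevant bit inverts the label outright, so noise
# doesn't merely perturb the answer, it flips it entirely.
x_flipped = x.copy()
x_flipped[relevant[0]] ^= 1
assert sparse_parity(x_flipped) == 1 - y

# Meanwhile there are "n choose k" candidate subsets (~36,000 here) that the
# true one can hide among, which is where the hardness lives.
```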
This comment had apparently been deleted by the commenter (the comment display box having a “deleted because it was a little rude, sorry” deletion note in lieu of the comment itself), but the ⋮-menu in the upper-right gave me the option to undelete it, which I did because I don’t think my critics are obligated to be polite to me. (I’m surprised that post authors have that power!) I’m sorry you didn’t like the post.
“Deep Learning” Is Function Approximation
whether his charisma is more like +2SD or +5SD above the average American (concept origin: planecrash, likely doesn’t actually follow a normal distribution in reality) [bolding mine]
The concept of measuring traits in standard deviation units did not originate in someone’s roleplaying game session in 2022! Statistically literate people have been thinking in standardized units for more than a century. (If anyone has priority, it’s Karl Pearson in 1894.)
If you happened to learn about it from someone’s RPG session, that’s fine. (People can learn things from all different sources, not just from credentialed “teachers” in officially accredited “courses.”) But to the extent that you elsewhere predict changes in the trajectory of human civilization on the basis that “fewer than 500 people on earth [are] currently prepared to think [...] at a level similar to us, who read stuff on the same level” as someone’s RPG session, learning an example of how your estimate of the RPG session’s originality was a reflection of your own ignorance should make you re-think your thesis.
saddened (but unsurprised) to see few others decrying the obvious strawmen
In general, the “market” for criticism just doesn’t seem very efficient at all! You might have hoped that people would mostly agree about what constitutes a flaw, critics would compete to find flaws in order to win status, and authors would learn not to write posts with flaws in them (in order to not lose status to the critics competing to point out flaws).
I wonder which part of the criticism market is failing: is it more that people don’t agree about what constitutes a flaw, or that authors don’t have enough of an incentive to care, or something else? We seem to end up with a lot of critics who specialize in detecting a specific kind of flaw (“needs examples” guy, “reward is not the optimization target” guy, “categories aren’t arbitrary” guy, &c.), with very limited reaction from authors or imitation by other potential critics.
I mean, I agree that there are psycho-sociological similarities between religions and the AI risk movement (and indeed, I sometimes pejoratively refer to the latter as a “robot cult”), but analyzing the properties of the social group that believes that AI is an extinction risk is a separate question from whether AI in fact poses an extinction risk, which one could call Armageddon. (You could spend vast amounts of money trying to persuade people of true things, or false things; the money doesn’t care either way.)
Obviously, there’s not going to be a “proof” of things that haven’t happened yet, but there’s lots of informed speculation. Have you read, say, “The Alignment Problem from a Deep Learning Perspective”? (That may not be the best introduction for you, depending on the reasons for your skepticism, but it’s the one that happened to come to mind, which is more grounded in real AI research than previous informed speculation that had less empirical data to work from.)
Why are you working for the prosecutors?
This is a pretty reasonable question from the client’s perspective! When I was in psychiatric prison (“hospital”, they call it a “hospital”) and tried to complain to the staff about the injustice of my confinement, I was told that I could call “patient’s rights”.
I didn’t bother. If the staff wasn’t going to listen, what was the designated complaint line going to do?
Later, I found out that patient’s rights advocates apparently are supposed to be independent, and not just a meaningless formality. (Scott Alexander: “Usually the doctors hate them, which I take as a pretty good sign that they are actually independent and do their job.”)
This was not at all obvious from the inside. I can only imagine a lot of criminal defendants have a similar experience. Defense attorneys are frustrated that their clients don’t understand that they’re trying to help—but that “help” is all within the rules set by the justice system. From the perspective of a client who doesn’t think he did anything particularly wrong (whether or not the law agrees), the defense attorney is part of the system.
I think my intuition was correct to dismiss patient’s rights as useless. I’m sure they believe that they’re working to protect patients’ interests, and would have been frustrated that I didn’t appreciate that. But what I wanted was not redress of any particular mistreatment that the system recognized as mistreatment, but to be let out of psych jail—and on that count, I’m sure patient’s rights would have told me that the evidence was harmful to my case. They were working for the doctors, not for me.
I can’t address them all, but I [...] am happy to dismantle any particular argument
IQ seems like the sort of thing Feynman could be “honestly” motivatedly wrong about. The thing I’m trying to point at is that Feynman seemingly took pride in being a straight talker, in contrast to how Yudkowsky takes pride in not lying.
These are different things. Straight talkers sometimes say false or exaggerated things out of sloppiness, but they actively want listeners to know their reporting algorithm. Prudently selecting which true sentences to report in the service of a covert goal is not lying, but it’s definitely not straight talk.
Yes, that would be ridiculous. It would also be ridiculous in a broadly similar way if someone spent eight years in the prime of their life prosecuting a false advertising lawsuit against a “World’s Best” brand ice-cream for not actually being the best in the world.
But if someone did somehow make that mistake, I could see why they might end up writing a few blog posts afterwards telling the Whole Dumb Story.
You are perhaps wiser than me. (See also footnote 20.)
(I think this is the best and most important post in the sequence; I suspect that many readers who didn’t and shouldn’t bother with the previous three posts may benefit from this one.)
Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
I second the concern that using “LeastWrong” on the site grants undue legitimacy to the bad “than others” interpretation of the brand name (as contrasted to the intended “all models are wrong, but” meaning). “Best Of” is clear and doesn’t distort the brand.
Doomimir: No, it wouldn’t! Are you retarded?
Simplicia: [apologetically] Well, actually …
Doomimir: [embarrassed] I’m sorry, Simplicia Optimistovna; I shouldn’t have snapped at you like that.
[diplomatically] But I think you’ve grievously misunderstood what the KL penalty in the RLHF objective is doing. Recall that the Kullback–Leibler divergence D_KL(P||Q) represents how surprised you’d be by data drawn from distribution P when you expected it to come from distribution Q.
It’s asymmetric: it blows up when the data is very unlikely according to Q, which amounts to seeing something happen that you thought was nearly impossible, but not when the data is very unlikely according to P, which amounts to not seeing something that you thought was reasonably likely.
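A minimal numeric illustration of this asymmetry (the two distributions here are made up for the example):

```python
import math

# Two made-up distributions over three outcomes. Q treats the third outcome
# as nearly impossible; P gives it 10% probability.
P = [0.50, 0.40, 0.10]
Q = [0.50, 0.50, 1e-9]

def kl(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl(P, Q))  # ~1.75: P sometimes produces the outcome Q "ruled out"
print(kl(Q, P))  # ~0.11: Q merely fails to produce an outcome P expected
```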
We—I mean, not we, but the maniacs who are hell-bent on destroying this world—include a D_KL(π_RLHF||π_base) penalty term in the RL objective because they don’t want the updated policy to output tokens that would be vanishingly unlikely coming from the base language model.
But your specific example of threats and promises isn’t vanishingly unlikely according to the base model! Common Crawl webtext is going to contain a lot of natural language reasoning about threats and promises! It’s true, in a sense, that the function of the KL penalty term is to “stay close” to the base policy. But you need to think about what that means mechanistically; you can’t just reason that the webtext prior is somehow “safe” in a way that means staying KL-close to it is safe.
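Mechanistically, the point looks something like the following hedged sketch (my own toy notation, not anyone's actual training code): the objective rewards the policy while penalizing divergence from the base model, and the penalty only bites on tokens the base model considers nearly impossible.

```python
# Hedged sketch of a KL-penalized objective,
#     J(pi) = E[reward] - beta * D_KL(pi_RLHF || pi_base),
# where logp_rlhf / logp_base are log-probabilities of the sampled tokens
# under the two policies and beta is the penalty coefficient.

def kl_penalized_objective(reward, logp_rlhf, logp_base, beta=0.1):
    # Single-sample KL estimate for the sampled completion:
    # sum over tokens of log pi_RLHF(token) - log pi_base(token).
    # The term is large only where the RLHF policy puts much more probability
    # on a token than the base model does; tokens that are already ordinary
    # webtext (e.g. natural-language talk of threats and promises) incur
    # almost no penalty.
    kl_est = sum(lr - lb for lr, lb in zip(logp_rlhf, logp_base))
    return reward - beta * kl_est

# Toy usage with made-up numbers:
print(kl_penalized_objective(
    reward=1.0,
    logp_rlhf=[-2.0, -1.5, -3.0],
    logp_base=[-2.1, -1.6, -2.9],
))  # 1.0 - 0.1 * 0.1 = 0.99
```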
But you probably won’t understand what I’m talking about for another 70 days.