Worth noting that the reason SquirrelInHell is dead is that they committed suicide after becoming mentally unstable, likely in part due to experimentation with exotic self-modification techniques. This one in particular seems fine AFAICT, but, ya know, caveat utilitor.
Recent Progress in the Theory of Neural Networks
Tao, Kontsevich & others on HLAI in Math
Noting that the author deleted a critical comment which was somewhat rude but IMO made some reasonable points. That’s fair enough, but in conjunction with the way the site handles deletions this strikes me as bad, since there’s no way of (a) seeing which user posted the deleted comment (this might be a bug?) or (b) examining the text of deleted comments. Together this means that you can’t distinguish cases where a post attracts no criticism (an important signal) from cases where there were critical comments that were deleted, and you can’t examine deleted criticisms.
In the same vein, I humbly suggest “The entire bee movie but every time they say bee it gets faster” as a good model for what the singularity will seem like from our perspective.
Alignment Might Never Be Solved, By Humans or AI
It’s neither a hoax nor an HLAI, but a predictable consequence of prompting an LLM with questions about its sentience: it will imitate the answers a human might give when prompted, or the sort of answers an AI in a science fiction story would give.
[Question] What’s the Relationship Between “Human Values” and the Brain’s Reward System?
As John conjectured, alt-complexity is a well-known notion in algorithmic information theory, and differs from K-complexity by at most a constant. See section 4.5 of this book for a proof. So I think the stuff about how physics favors alt-complexity is a bit overstated—or at least, can only contribute a bounded amount of evidence for alt-complexity.
ETA: this result is about the complexity of finite strings, not semimeasures on potentially infinite sequences; for such semimeasures, there actually is a non-constant gap between the log-total-probability and description complexity.
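For concreteness, here’s a rough rendering of the finite-string version of that result (the coding theorem); this is my paraphrase rather than the book’s exact statement, with m denoting the universal a priori probability of a string under a universal prefix machine U:

```latex
% Coding theorem (finite strings), stated informally.
% m(x) is the total probability that a randomly chosen program outputs x:
m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|},
\qquad
K(x) \;=\; -\log_2 m(x) \;+\; O(1).
% So the "alt-complexity" -log_2 m(x) and the prefix complexity K(x)
% differ by at most an additive constant, which is why physics can only
% supply a bounded amount of evidence for alt-complexity.
```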
Interpretability seems like our clear best bet for developing a more principled understanding of how deep learning works
If our goal is developing a principled understanding of deep learning, directly trying to do that is likely to be more effective than doing interpretability in the hope that we will develop a principled understanding as a side effect. For this reason I think most alignment researchers have too little awareness of various attempts in academia to develop “grand theories” of deep learning such as the neural tangent kernel. I think the ideal use for interpretability in this quest is as a way of investigating how the existing theories break down—e.g. if we can explain 80% of a given model’s behavior with the NTK, what are the causes of the remaining 20%? I think of interpretability as basically collecting many interesting data points; this type of collection is essential, but it can be much more effective when it’s guided by a provisional theory which tells you which points are expected and which are interesting anomalies calling for a revision of the theory, which in turn guides further exploration, and so on.
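To make the “explain 80% of behavior with the NTK” framing concrete, here’s a minimal sketch of how one might compare a network against its empirical-NTK approximation. All names here (`model_fn`, the toy data, the regularizer) are my own placeholder assumptions, not anything from the comment or the NTK literature; it’s an illustration of the workflow, not a definitive implementation.

```python
# Sketch: empirical NTK of a tiny MLP, used as a provisional "theory" of the
# model via kernel regression. Comparing its predictions against the actually
# trained network's outputs shows how much behavior the linearized theory
# captures, and where the anomalies are.
import jax
import jax.numpy as jnp

def model_fn(params, x):
    # hypothetical two-layer MLP with scalar output
    h = jnp.tanh(x @ params["w1"] + params["b1"])
    return (h @ params["w2"] + params["b2"]).squeeze(-1)

def empirical_ntk(params, x1, x2):
    # NTK(x, x') = <grad_theta f(x), grad_theta f(x')>, evaluated at current params
    def flat_grad(x):
        g = jax.grad(lambda p: model_fn(p, x[None, :]).squeeze())(params)
        return jnp.concatenate([v.ravel() for v in jax.tree_util.tree_leaves(g)])
    g1 = jax.vmap(flat_grad)(x1)   # (n1, num_params)
    g2 = jax.vmap(flat_grad)(x2)   # (n2, num_params)
    return g1 @ g2.T               # (n1, n2)

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = {
    "w1": jax.random.normal(k1, (2, 32)) / jnp.sqrt(2.0),
    "b1": jnp.zeros(32),
    "w2": jax.random.normal(k2, (32, 1)) / jnp.sqrt(32.0),
    "b2": jnp.zeros(1),
}
x_train = jax.random.normal(k3, (20, 2))
y_train = jnp.sin(x_train[:, 0])          # toy targets
x_test = jax.random.normal(k4, (5, 2))

# Kernel-regression prediction under the fixed-kernel (NTK) approximation.
K_tt = empirical_ntk(params, x_train, x_train)
K_st = empirical_ntk(params, x_test, x_train)
ntk_pred = K_st @ jnp.linalg.solve(K_tt + 1e-4 * jnp.eye(20), y_train)

# The interesting step is then comparing `ntk_pred` against the outputs of the
# network after actually training it on (x_train, y_train): the residual is
# the "remaining 20%" that the fixed-kernel theory doesn't explain.
```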
NTK/GP Models of Neural Nets Can’t Learn Features
Interesting that you think the risk of us “going crazy” after getting AI in some way is roughly comparable to overall AI takeover risk. I’d be interested to hear more if you have more detailed thoughts here. On this view it also seems like it could be a great x-risk-reduction opportunity if there are any tractable strategies, given how neglected it is compared to takeover risk.
I think it would also be valuable to have someone translate “in the other direction” and take (for example) Paul Christiano’s writings and produce vivid, concrete parable-like stories based on them. I think such stories can be useful not just as persuasive tools but also epistemically, as a way of grounding the meaning of abstract definitions of the sort Paul likes to argue in terms of.
In rationalist circles, you might find out that you’re being instrumentally or epistemically irrational in the course of a debate—the norms of such a debate encourage you to rebut your opponent’s points if you think they are being unfair. In contrast, the central thesis of this book is that white people disputing their racism is a mechanism for protecting white supremacy and needs to be unlearned, along with other cornerstones of collective epistemology such as the notion of objective knowledge. So under the epistemic conditions promoted by this book, I expect “found out about being racist” to roughly translate to “was told you were racist”.
ZMD: Looking at “Silicon Valley’s Safe Space”, I don’t think it was a good article. Specifically, you wrote,
In one post, [Alexander] aligned himself with Charles Murray, who proposed a link between race and I.Q. in “The Bell Curve.” In another, he pointed out that Mr. Murray believes Black people “are genetically less intelligent than white people.”
End quote. So, the problem with this is that the specific post in which Alexander aligned himself with Murray was not talking about race. It was specifically talking about whether specific programs to alleviate poverty will actually work or not.
So on the one hand, this particular paragraph does seem like it’s misleadingly implying Scott was endorsing views on race/IQ similar to Murray’s even though, based on the quoted passages alone, there is little reason to think that. On the other hand, it’s totally true that Scott was running a strategy of bringing up or “arguing” with hereditarians with the goal of broadly promoting those views in the rationalist community, without directly being seen to endorse them. So I think it’s actually pretty legitimate for Metz to bring up incidents like this or the Xenosystems link in the blogroll. Scott was basically using a strategy of communicating his views in a plausibly deniable way by saying many little things which are more likely if he was a secret hereditarian, but any individual instance of which is not so damning. So I feel it’s total BS to then complain about how tenuous the individual instances Metz brought up are—he’s using them as examples of a larger trend, which is inevitable given the strategy Scott was using.
(This is not to say that I think Scott should be “canceled” for these views or whatever, not at all, but at this stage the threat of cancelation seems to have passed and we can at least be honest about what actually happened)
Not sure how much I believe this myself, but Jacob Cannell has an interesting take that social status isn’t a “base drive” either, but is basically a proxy for “empowerment”, influence over future states of the world. If that’s true it’s perhaps not so surprising that we’re still well-aligned, since “empowerment” is in some sense always being selected for by reality.
I always liked the interpretation of the determinant as measuring the expansion/contraction of n-dimensional volumes induced by a linear map, with the sign being negative if the orientation of space is flipped. This makes various properties intuitively clear such as non-zero determinant being equivalent to invertibility.
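As a small worked instance of that picture (my own example, not from the original comment): for a 2×2 matrix, the determinant is exactly the signed area scaling of the unit square, and the familiar properties fall out of that.

```latex
A = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}, \qquad
\det A = 3 \cdot 2 - 1 \cdot 0 = 6,
% so A maps the unit square to a parallelogram of area 6, with orientation
% preserved (positive sign). Multiplicativity is then just composition of
% volume scalings,
\det(AB) = \det(A)\,\det(B),
% and \det A = 0 means A collapses some volume to zero, i.e. A is not invertible.
```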
In the analogy, it’s only possible to build a calculator that outputs the right answer on non-13 numbers because you already understand the true nature of addition. It might be more difficult if you were confused about addition, and were trying to come up with a general theory by extrapolating from known cases—then, thinking 6 + 7 = 15 could easily send you down the wrong path. In the real world, we’re similarly confused about human preferences, mind architecture, the nature of politics, etc., but some of the information we might want to use to build a general theory is taboo. I think that some of these questions are directly relevant to AI—e.g. the nature of human preferences is relevant to building an AI to satisfy those preferences, the nature of politics could be relevant to reasoning about what the lead-up to AGI will look like, etc.
Seems to me that the ‘helpful’ works you listed contain falsehoods and wrong associations. They also contain useful information and enjoyable aspects, true—but couldn’t the same be said of lots of non-”rational” fiction? As it stands this just looks like a list of fiction that’s popular among our subculture.
At this point timelines look short enough that you likely increase your personal odds of survival more by increasing the chance that AI goes well than by speeding up timelines. Also, I don’t see why you think cryonics doesn’t make sense as an alternative option.