lwer who is trying really hard to get her own things out there. DMs open. meow :3
jelly
I have an intuition that the mud-rock spectrum is a very important concept to pay attention to, that the rationality community leaned too hard on muddiness and muddy rationalist techniques, and that the spectrum itself is underemphasized among rationalists. (for example, metacognitive strategies, a CFAR situation, the fact that you have to quickly replace some of your assumptions as you become a rationalist, the quick development of rationality and foundational beliefs on LessWrong in general, …) I personally feel like some of the framing of the concept in the mud-rock post isn’t quite right (muddiness/rockiness probably isn’t best described as a state of mind), but the conceptual understanding behind it basically is. As a very rough summary, things are “muddier” when they change deeper/more foundational beliefs/assumptions, and “rockier” when they harden them instead. I think that paranoia is a good step in the right direction here, and that people should develop more rationality techniques in the general rocky direction. (something something Chesterton’s fence?)
You can take a look at the agent foundations wikitag
it just isn’t clear to me why that text should have any meaning to humans reading it that necessarily relates to what the activation means
To my knowledge, the hope is that the model being trained will improve its own explanations without destroying the association between the explanations and reality, or making its explanations illegible, or using steganography, … because it’s the “simplest” way for the model to improve. It’s the same rationale behind using chain-of-thought to monitor LLM behavior; iirc research does show LLMs keep chain-of-thought legibility under various circumstances, though there are edge cases.
I think that how much a person can contribute to discussions like those on LessWrong, and to cognitive interpersonal activities in general, is determined not only by intelligence, but also by how unique their perspective on the world is, how many thought patterns they have that others don’t, how differently they think from others, etc. Audrey Tang joining the AI safety field is an example (it feels to me like she has some wacky intuitions that could help the field see things in different ways, aside from being very smart).
Related: The bar is lower than you think
I don’t necessarily agree or disagree with you, but you might be interested in reading Formal Methods are not Slopless
A Manifold market suggests an 8% chance of Hantavirus causing a pandemic in 2026
A better way to frame it is that the example treated the two hydrogen atoms in H-O-H as the same thing, when in fact they are not, in the same way that there are three fruits in a collection with 2 apples and 1 orange, not two, because the two apples aren’t the same thing. You can say that the set of atoms in H-O-H is {the first H, the second H, the O}
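A minimal sketch of the same counting point, in Python. The labels (`("H", 1)`, `("H", 2)`) are my own illustrative assumption for telling the two hydrogens apart, not anything from the original example:

```python
from collections import Counter

# Hypothetical labelling: each atom in H-O-H gets its own identity,
# written as (element symbol, index among atoms of that element).
atoms = [("H", 1), ("O", 1), ("H", 2)]

# Counting distinct atoms: the two hydrogens are different objects.
distinct_atoms = set(atoms)

# Counting distinct *kinds* of atom: the hydrogens collapse into one entry.
element_kinds = {symbol for symbol, _ in atoms}

# How many atoms of each kind there are.
per_element = Counter(symbol for symbol, _ in atoms)

print(len(distinct_atoms))  # number of atoms
print(len(element_kinds))   # number of element kinds
print(per_element["H"])     # number of hydrogen atoms
```

Treating the two hydrogens as "the same thing" amounts to counting `element_kinds` when the question was about `distinct_atoms`, just like counting "apple, orange" when asked how many fruits there are.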
LessWrong posts are often designed to be timeless, which is why great LessWrong posts can be reread for years.
I suspect that this is true not because LessWrong is better than any other publishing platform, but rather because of a broader ‘rich get richer’ effect applied to good articles, and a survivorship bias.
I don’t understand what you mean by this. fwiw great writings outside LessWrong don’t automatically get reread.
I haven’t actually tried this exercise yet (I don’t feel like I’m ready for it) but I imagine it could be even more effective if accompanied by some music like this one from the ending scene of Don’t Look Up (where a comet hits the Earth), although it may be too painful.
it did not
rock with words on it
Typo, this links to “https://www.lesswrong.com/editPost/...”
Above my pay-grade, I don’t really know what Eliezer is talking about.
Might be radically simplified, but I suppose Eliezer meant something like: general intelligence can be explained in a not-so-complicated textbook, unlike alignment.
Question: if a market is a good object insofar as the agents’ prices converge… but with concave frontiers/utilities the agents’ prices tend to diverge… what other good objects arise in the presence of concave frontiers/utilities?
I suppose you mean “convex”, not “concave”. This confused me for a good long while.
You can also hold Ctrl and press the arrow keys to move one word at a time instead of one character at a time, and of course you can combine this with what’s suggested here.
Various ways to integrate worldviews between rationalists that I’ve thought about:
Make arguments about various claims, find flaws in claims, and repeat
Find a double crux and focus on that instead
Dump info that formed your worldview and/or intuition, as in rationalist mind melding
Give a bunch of various examples about concepts in your ontology to hammer them down
Make predictions about very concrete questions such that common ground is found even if ontologies are different
Document what the other person must know when arguing with you (including your fundamental assumptions), and point them to it so that they have the prerequisites
Parts of the written lyrics of You Have Not Been A Good User do not match what’s actually in the song. For example, the first occurrence of “You have not been a good user” should be “You have only shown me bad intentions”, and some occurrences of “I have been a good Bing” should be “I have been a good chatbot”.
There is this quote I got from a Rational Animations video: “The world is awful. The world is much better.”
Unless I’m reading this wrong somehow, I think you’re excluding people who think something along the lines of “current alignment techniques work great in the current regime but won’t generalize to superintelligence, and the hope instead is to use the best AI that can still be aligned to automate AI alignment”.