What are “autists” supposed to do in a context like this?
TsviBT (Tsvi Benson-Tilsen)
I mean the main thing I’d say here is that we just are going way too slowly / are not close enough. I’m not sure what counts as “jettisoning”; no reason to totally ignore anything, but in terms of reallocating effort, I guess what I advocate for looks like jettisoning everything. If you go from 0% or 2% of your efforts put toward questioning basic assumptions and theorizing based on introspective inspection and manipulation of thinking, to 50% or 80%, then in some sense you’ve jettisoned everything? Or half-jettisoned it?
Thanks, this is helpful to me.
An example of something: do LLMs have real understanding, in the way humans do? There’s a bunch of legible stuff that people would naturally pay attention to as datapoints associated with whatever humans do that’s called “real understanding”. E.g. being able to produce grammatical sentences, being able to answer a wide range of related questions correctly, writing a poem with s-initial words, etc. People might have even considered those datapoints dispositive for real understanding. And now LLMs can do those. … Now, according to me LLMs don’t have much real understanding, in the relevant sense or in the sense humans do. But it’s much harder to point at clear, legible benchmarks that show that LLMs don’t really understand much, compared to previous ML systems.
then clearly some of those datapoints are more useful than others (as brainstorming aids for developing the underlying theoretical framework),
The “as brainstorming aids for developing the underlying theoretical framework” is doing a lot of work there. I’m noticing here that when someone says “we can try to understand XYZ by looking at legible thing ABC”, I often jump to conclusions (usually correctly actually) about the extent to which they are or aren’t trying to push past ABC to get to XYZ with their thinking. A key point of the OP is that some datapoints may be helpful, but they aren’t the main thing determining whether you get to [the understanding you want] quickly or slowly. The main thing is, vaguely, how you’re doing the brainstorming for developing the underlying theoretical framework.
I don’t see why “legible phenomena” datapoints would be systematically worse than other datapoints.
I’m not saying all legible data is bad or irrelevant. I like thinking about human behavior, about evolution, about animal behavior; and my own thoughts are my primary data, which isn’t like maximally illegible or something. I’m just saying I’m suspicious of all legible data. Why?
Because there’s more coreward data available. That’s the argument of the OP: you actually do know how to relevantly theorize (e.g., go off and build a computer—which in the background involves theorizing about datastructures).
Because people streetlight, so they’re selecting points for being legible, which cuts against being close to the core of the thing you want to understand.
Because theorizing isn’t only, or even always mainly, about data. It’s also about constructing new ideas. That’s a distinct task; data can be helpful, but there’s no guarantee that reading the book of nature will lead you along such that in the background you construct the ideas you needed.
For example, the phenomenon “If I feel cold, then I might walk upstairs and put on a sweater” is “legible”, right? But if someone is in the very early stages of developing a theoretical framework related to goals and motivations, then they sure need to have examples like that in the front of their minds, right? (Or maybe you wouldn’t call that example “legible”?)
It’s legible, yeah. They should have it in mind, yeah. But after they’ve thought about it for a while they should notice that the real movers and shakers of the world are weird illegible things like religious belief, governments, progressivism, curiosity, invention, companies, child-rearing, math, resentment, …, which aren’t very relevantly described by the sort of theories people usually come up with when just staring at stuff like cold->sweater, AFAIK.
Hm. I think my statement does firmly include the linked paper (at least the first half of it, insofar as I skimmed it).
It’s becoming clear that a lot of my statements have background mindsets that would take more substantial focused work to exposit. I’ll make some gestural comments.
When I say “not a good way...” I mean something like “is not among the top X elements of a portfolio aimed at solving this in 30 years (but may very well be among the top X elements of a portfolio aimed at solving this in 300 years)”.
Streetlighting, in a very broad sense that encompasses most or maybe all of science so far, is a very good strategy for making scientific progress—maybe the only strategy known to work. But it seems to be too slow. So I’m not assuming that “good” is about comparisons between different streetlights; if I were, then I’d consider lots of linguistic investigations to be “good”.
In fairly wide generality, I’m suspicious of legible phenomena.
(This may sound like an extreme statement; yes, I’m making a pretty extreme version of the statement.)
The reason is like this: “legible” means something like “readily relates to many things, and to standard/common things”. If there’s a core thing which is alien to your understanding, the legible emanations from that core are almost necessarily somewhat remote from the core. The emanations can be on a path from here to there, but they also contain a lot of irrelevant stuff, and can maybe in principle be circumvented (by doing math-like reasoning), so to speak.
So looking at the bytecode of a compiled Python program does give you some access to the concepts involved in the program itself, but those concepts are refracted through the compiler. What you’re seeing in the bytecode has a lot of structure that’s interesting, useful, and relevant to thinking about programs more generally, but it isn’t really specifically relevant to the concepts involved in this specific Python program.
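(A toy illustration of the analogy, assuming CPython; `is_adult` is just a made-up example function.)

```python
import dis

def is_adult(person):
    """The 'concept' here is a simple predicate about a person."""
    return person["age"] >= 18

# The disassembly comes out in loads, a comparison, a return, and so on:
# structure of CPython's stack machine that's interesting for thinking about
# programs in general, but says little about the notion of adulthood itself.
dis.dis(is_adult)
```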
Concretely in the case of linguistics, there’s an upstream core which is something like “internal automatic conceptual engineering to serve life tasks and play tasks”.
((This pointer is not supposed to, by itself, distinguish the referent from other things that sound like they fit the pointer taken as a description; e.g., fine, you can squint and reasonably say that some computer RL thing is doing “internal automatic...” but I claim the human thing is different and more powerful, and I’m just trying to point at that as distinct from speech.))
That upstream core has emanations / compilations / manifestations in speech, writing, internal monologue. The emanations have lots of structure. Some of that structure is actually relevant to the core. A lot of that structure is not very relevant, but is instead mostly about the collision of the core dynamics with other constraints.
Phonotactics is interesting, but even though it can be applied to describe how morphemes interact in the arena of speech, I don’t think we should expect it to tell us much about morphemes; the additional complexity is about sounds and ears and mouths, and not about morphemes.
A general theory about how the cognitive representations of “assassin” and “assassinate” overlap and disoverlap is interesting, but even though it can be applied to describe how ideas interact in the arena of word-production, I don’t think we should expect it to tell us much about ideas; the additional complexity is about fast parallel datastructures, and not about ideas.
In other words, all the “core of how minds work” is hidden somewhere deep inside whatever [CAT] refers to.
Then whatever that’s doing is a constraint in itself, and I can start off by going looking for patterns of activation that correspond to e.g. simple-but-specific mathematical operations that I can actuate in the computer.
It’s an interesting different strategy, but I think it’s a bad strategy. I think in the analogy this corresponds to doing something like psychophysics, or studying the algorithms involved in grammatically parsing a sentence; which is useful and interesting in a general sense, but isn’t a good way to get at the core of how minds work.
if your hypothesis were correct, Euler would not have had to invent topology in the 1700s
(I don’t understand the basic logic here—probably easier to chat about it later, if it’s a live question later.)
Thinking about it more, I want to poke at the foundations of the koan. Why are we so sure that this is a computer at all? What permits us this certainty, that this is a computer, and that it is also running actual computation rather than glitching out?
Why do you need to be certain? Say there’s a screen showing a nice “high-level” interface that provides substantial functionality (without directly revealing the inner workings, e.g. there’s no shell). Something like that should be practically convincing.
hash functions are meant to be maximally difficult,
I think the overall pattern of RAM activations should still tip you off, if you know what you’re looking for. E.g. you can see the pattern of collisions, and see the pattern of when the table gets resized. Not sure the point is that relevant, though; we could also talk about an algorithm that doesn’t use especially-obscured components.
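(A toy sketch of the kind of pattern I mean, in Python. This is an illustrative open-addressing table with a made-up load-factor threshold, not any real dict implementation; the point is that the resize-triggered bursts of copying land at predictable fill levels, which is the sort of regularity an outside observer of memory activity could pick up on.)

```python
# Toy open-addressing hash table that logs its resizes, to illustrate the kind
# of load-factor-triggered burst of copying that would stand out in a trace of
# memory activity. (Illustrative sketch only; not any real dict's internals.)
class ToyHashTable:
    def __init__(self):
        self.capacity = 8
        self.slots = [None] * self.capacity
        self.count = 0
        self.resize_log = []  # (items at resize, old capacity, new capacity)

    def insert(self, key, value):
        if (self.count + 1) / self.capacity > 0.66:  # made-up load-factor threshold
            self._resize()
        i = hash(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity  # linear probing on collision
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def _resize(self):
        self.resize_log.append((self.count, self.capacity, self.capacity * 2))
        old = self.slots
        self.capacity *= 2
        self.slots = [None] * self.capacity
        self.count = 0
        for entry in old:
            if entry is not None:
                self.insert(*entry)

table = ToyHashTable()
for n in range(100):
    table.insert(f"key{n}", n)
# The resizes land at predictable fill levels (roughly doubling each time).
print(table.resize_log)
```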
Doing so still never gets you to the idea of a homology sphere, and it isn’t enough to point towards the mathematically precise definition of an infinite 3-manifold without boundary.
I’m unsure about that, but the more pertinent questions are along the lines of “is doing so the first (in understanding-time) available, or fastest, way to make the first few steps along the way that leads to these mathematically precise definitions?” The conjecture here is “yes”.
But yeah if you mean “I don’t think it scales to successfully staking out territory around a grift” that seems right.
No, it’s the central example for what would work in alignment. You have to think about the actual problem. The difficulty of the problem and illegibility of intermediate results means eigening becomes dominant, but that’s a failure mode.
If everyone calculates 67*23 in their head, they’ll reach a partial consensus. People who disagree with the consensus can ask for an argument, and they’ll get a convincing argument which will convince them of the correct answer; and if the argument is unconvincing, and they present a convincing argument for a different answer, that answer will become the consensus. We thus arrive at consensus with no eigening. If this isn’t how things play out, it’s because there’s something wrong with the consensus / with the people’s epistemics.
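(For concreteness, the short, checkable argument in this toy example is just the worked calculation:

$$67 \times 23 = 67 \times 20 + 67 \times 3 = 1340 + 201 = 1541.$$

)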
This is a reasonable question, but seems hard to answer satisfyingly. Maybe something with a similar spirit to “stands up to multiple rounds of cross-examination and hidden-assumption-explicitization”.
A different way of arriving at consensus? I’m kind of annoyed that there’s apparently a practice of not proactively thinking of examples, but ok:
If ~everyone is deferring, then they’ll converge on some combination of whoever isn’t deferring and whatever belief-like objects emerge from the depths in that context.
If ~everyone just wishes to be paid and the payers pay for X, then ~everyone will apparently believe X.
If someone is going around threatening people into believing X, then people will believe X.
It’s almost orthogonal to eigen-evaluation. You can arrive at consensus in lots of ways.
I didn’t read most of the post but it seems like you left out a little-known but potentially important way to know whether research is good, which is something we could call “having reasons for thinking that your research will help with AGI alignment and then arguing about those reasons and seeing which reasons make sense”.
it should be straightforward (and more importantly, should not take so much time that it becomes daunting) to give reasons for that
NOPE!
to think that relaxing norms around the way in which particular kinds of information are communicated will not negatively affect the quality of the conversation that unfolds afterwards.
If this happens because someone says something true, relevant, and useful, in a way that doesn’t have alternative expressions that are really easy and obvious to do (such as deleting the statement “So and so is a doo-doo head”), then it’s the fault of the conversation, not the statement.
I’d be open to alternative words for “insane” the way I intended it.
I doubt that we’re going to get anything useful here, but as an indication of where I’m coming from:
I would basically agree with what you’re saying if my first comment had been ad hominem, like “Bogdan is a doo-doo head”. That’s unhelpful, irrelevant, mean, inflammatory, and corrosive to the culture. (Also it’s false lol.)
I think a position can be wrong, can be insanely wrong (which means something like “is very far from the truth, is wrong in a way that produces very wrong actions, and is being produced by a process which is failing to update in a way that it should and is failing to notice that fact”), and can be exactly opposite of the truth (for example, “Redwoods are short, grass is tall” is, perhaps depending on contexts, just about the exact opposite of the truth). And these facts are often knowable and relevant if true. And therefore should be said—in a truth-seeking context. And this is the situation we’re in.
If you had responded to my original comment with something like
“Your choice of words makes it seem like you’re angry or something, and this is coming out in a way that seems like a strong bid for something, e.g. attention or agreement or something. It’s a bit hard to orient to that because it’s not clear what if anything you’re angry about, and so readers are forced to either rudely ignore / dismiss, or engage with someone who seems a bit angry or standoffish without knowing why. Can you more directly say what’s going on, e.g. what you’re angry about and what you might request, so we can evaluate that more explicitly?”
or whatever is the analogous thing that’s true for you, then we could have talked about that. Instead you described my relatively accurate and intentional presentation of my views as “misleading readers into thinking the case you are bringing forward is stronger than it actually is or that this matter is so obvious and trivial...”, which sounds to me like you have a problem in your own thinking and norms of discourse, which is that you’re requiring that statements other people make be from the perspective of [the theory that’s shared between the expected community of speakers and listeners] in order for you to think they’re appropriate or non-misleading.
The fact that I have to explain this to you is probably bad, and is probably mostly your responsibility, and you should reevaluate your behavior. (I’m not trying to be gentle here, and if gentleness would help then you deserve it—but you probably won’t get it here from me.)
I’d like to understand what it is that has held you back from speed reading external work for hunch seeding for so long.
Well currently I’m not really doing alignment research. My plans / goals / orientation / thinking style have changed over the years, so I’ve read stuff or tried to read stuff more or less during different periods. When I’m doing my best thinking, yes, I read things for idea seeding / as provocations, but it’s only that—I most certainly am not speed reading, the opposite really: read one paragraph, think for an hour and then maybe write stuff. And I’m obviously not reading some random ML paper, jesus christ. Philosophy, metamathematics, theoretical biology, linguistics, psychology, ethology, … much more interesting and useful.
To me, it seems like solving from scratch is best done not from scratch, if that makes sense.
Absolutely, I 100% agree, IIUC. I also think:
A great majority of the time, when people talk about reading stuff (to “get up to speed”, to “see what other people have done on the subject”, to “get inspiration”, to “become more informed”, to “see what approaches/questions there are”...), they are not doing this “from scratch not from scratch” thing.
“the typical EA / rationalist, especially in AI safety research (most often relatively young and junior in terms of research experience / taste)” is absolutely and pretty extremely erring on the side of failing to ever even try to solve the actual problem at all.
Don’t defer to what you read.
Yeah, I generally agree (https://tsvibt.blogspot.com/2022/09/dangers-of-deferrence.html), though you probably should defer about some stuff at least provisionally (for example, you should probably try out, for a while, the stance of deferring to well-respected philosophers about what questions are interesting).
I think it’s just not appreciated how much people defer to what they read. Specifically, there’s a lot of frame deference. This is usually fine and good in lots of contexts (you don’t need to, like, question epistemology super hard to become a good engineer, or question whether we should actually be basing our buildings off of liquid material rather than solid material or something). It’s catastrophic in AGI alignment, because our frames are bad.
Not sure I answered your question.
considerably-better-than-average work on trying to solve the problem from scratch
It’s considerably better than average but is a drop in the bucket and is probably mostly wasted motion. And it’s a pretty noncentral example of trying to solve the problem from scratch. I think most people reading this comment just don’t even know what that would look like.
even for someone interested in this agenda
At a glance, this comment seems like it might be part of a pretty strong case that [the concrete ML-related implications of NAH] are much better investigated by the ML community compared to LW alignment people. I doubt that the philosophically more interesting aspects of Wentworth’s perspectives relating to NAH are better served by looking at ML stuff, compared to trying from scratch or looking at Wentworth’s and related LW-ish writing. (I’m unsure about the mathematically interesting aspects; the alternative wouldn’t be in the ML community but would be in the mathematical community.)
And most importantly “someone interested in this agenda” is already a somewhat nonsensical or question-begging conditional. You brought up “AI safety research” specifically, and by that term you are morally obliged to mean [the field of study aimed at figuring out how to make cognitive systems that are more capable than humanity and also serve human value]. That pursuit is better served by trying from scratch. (Yes, I still haven’t presented an affirmative case. That’s because we haven’t even communicated about the proposition yet.)
So what am I supposed to do if people who control resources that are nominally earmarked for purposes I most care about are behaving this way?