Unfortunately for Korzybski, General Semantics never really took off or achieved prominence as the new field he had set out to create. It wasn’t without some success, and it has been taught in some colleges. But overall, despite his trying to create something grounded in science and empiricism, over the years the empiricism leaked out of General Semantics and a large amount of woo and pseudoscience leaked in. This looks like a failure mode similar to what had started happening with Origin before I stopped the project.
With Origin, I introduced a bunch of rough draft concepts and tried to bake in the idea that these were rough ideas that should be iterated upon. However, because of the halo effect, those rough drafts were taken as truth without question. Instead of quickly iterating out of problematic elements, the problematic elements stuck around and became accepted parts of the canon.
Something similar seems to have happened with General Semantics: at a certain point it stopped being viewed as a science to iterate upon and began being viewed in a dogmatic, pseudoscientific way. It would eventually spin off a number of actual cults, like Scientology and Neuro-Linguistic Programming, and while the Institute of General Semantics still exists and still does things, no one seems to really be trying to achieve Korzybski’s goal of a science of human engineering. That goal would sit on a shelf for a long time until it was finally picked back up by one Eliezer Yudkowsky.
This makes me wonder to what extent we fail at this in the rationality movement. I think we’re better at it, but I’m also not sure we’re as systematic about fighting against it as we could be.
I’m hoping we’ll find out in the next post, but I would guess the answer is “yes”: General Semantics influenced science fiction writers, who in turn influenced transhumanism and the extropians, out of which SL4, Eliezer, and this whole thing grew. So even if it wasn’t known at the time, the ideas were “in the water” in a way that lets you make a strong argument that they had an impact.
Agreed. I think of this as sending a signal that at least a limited concern for safety is important. I’m sure we’ll see a bunch of papers with sections addressing this that won’t be great, but over time it stands some chance of normalizing concern for the safety and ethics of ML work in the field, such that safety work becomes more accepted as valuable. So even without a lot of guidance or strong evaluative criteria, this seems like a small win to me: at worst, it causes some papers to have extra fluff sections whose authors wrote them to pretend to care about safety rather than ignoring it completely.
A decent intuition might be to think about what exploration looks like in human children. Children under the age of 5 but old enough to move about on their own—so toddlers, not babies or “big kids”—face a lot of dangers in the modern world if they are allowed to run their natural exploration algorithm. Heck, I’m not even sure this is a modern problem, because in addition to needing to be protected from exploring electrical sockets and moving vehicles, toddlers also have to be protected from more traditional dangers they would definitely otherwise check out, like dangerous plants and animals. Of course, since toddlers grow up into powerful adult humans, this is a kind of evidence that their exploration, even with protections, is effective enough for them to become powerful enough to function in society.
Obviously there are a lot of caveats to taking this idea too seriously since I’ve ignored issues related to human development, but I think it points in the right direction of something everyday that reflects this result.
Thanks, this is a really useful summary to have. Linking back to Bostrom on info hazards is reasonable, but not great if you want people to actually read something and understand information hazards rather than bounce off something explaining the idea. Kudos!
Couple of notes on the song:
I wrote it with the Bob Dylan cover in my head more than the original.
It doesn’t scan perfectly on purpose so that some of the syllables have to be “squished” to fit the time and make the song sound “sloppy” like the original and many covers of it do.
In case it’s not obvious, it’s meant to be a “ha ha only serious” anthem for negative utilitarians.
I think of applied rationality pretty narrowly, as the skill of applying reasoning norms that maximize returns (those norms happening to have the standard name “rationality”). Of course there’s a lot to that, but I also think this framing is a poor one to train all the skills required to “win”. To use a metaphor, as requested, it’s like the skill of getting really good at reading a map to find optimal paths between points: your life will be better for it, but it also doesn’t teach you everything, like how to figure out where you are on the map now or where you might want to go.
tl;dr: read multiple things concurrently so you read them “slowly” over multiple days, weeks, months
When I was a kid, it took a long time to read a book. How could it not: I didn’t know all the words, my attention span was shorter, I was more restless, I got lost and had to reread more often, I got bored more easily, and I simply read fewer words per minute. One of the effects of this is that when I read a book I got to live with it for weeks or months as I worked through it.
I think reading like that has advantages. By living with a book for longer, the ideas it contained had more opportunity to bump up against other things in my life. I had more time to think about what I had read when I wasn’t reading. I drank in the book more deeply as I worked to grok it. And for books I read for fun, I got to spend more time enjoying them, living with the characters and author, by having it spread out over time.
As an adult it’s hard to preserve this. I read faster and read more than I did as a kid: I estimate I spend about 4 hours reading on a typical day (books, blogs, forums, etc.), not including incidental reading in the course of doing other things. Even at my relatively slow reading rate of about 200 wpm, that lets me polish off ~50k words per day, the length of a short novel.
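As a back-of-the-envelope check on those numbers (a sketch only, using the 200 wpm and 4 hours/day figures stated above):

```python
# Rough arithmetic behind the "~50k words per day" figure.
reading_rate_wpm = 200   # words per minute (stated above)
hours_per_day = 4        # typical daily reading time (stated above)

words_per_day = reading_rate_wpm * 60 * hours_per_day
print(words_per_day)     # 48000, i.e. roughly a 50k-word short novel
```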
The trick, I find, is to read slowly by reading multiple things concurrently and reading only a little bit of each every day. For books this is easy: I can just limit myself to a single chapter per day. As long as I have 4 or 5 books I’m working on at once, I can spread out the reading of each to cover about a month. Add in other things like blogs and I can spread things out more.
I think this has additional benefits over just getting to spend more time with the ideas. It lets the ideas in each book come up against each other in ways they might otherwise not. I sometimes notice patterns that I might otherwise not have because things are made simultaneously salient that otherwise would not be. And as a result I think I understand what I read better because I get the chance not just to let it sink in over days but also because I get to let it sink in with other stuff that makes my memory of it richer and more connected.
So my advice, if you’re willing to try it, is to read multiple books, blogs, etc. concurrently, only reading a bit of each one each day, and let your reading span weeks and months so you can soak in what you read more deeply rather than letting it burn bright and fast through your mind to be forgotten like a used up candle.
Welcome to LessWrong!
Given the content of your post, you might find these posts interesting:
A few months ago I found a copy of Staying OK, the sequel to I’m OK—You’re OK (the book that probably did the most to popularize transactional analysis), on the street near my home in Berkeley. Since I had previously read Games People Play and had not thought about transactional analysis much since, I scooped it up. I’ve just gotten around to reading it.
My recollection is that Games People Play is the better book (based on what I’ve read of Staying OK so far). Also, transactional analysis is kind of in the water in ways that are hard to notice, so you are probably already somewhat familiar with its ideas, though probably not explicitly in a way you could use to build new models (for example, as far as I can tell, the notions of strokes and life scripts were popularized by, if not fully originated within, transactional analysis). So if you aren’t familiar with transactional analysis, I recommend learning a bit about it. Although it’s a bit dated and we arguably have better models now, it’s still pretty useful for noticing patterns in the ways people interact with others and themselves, sort of like how the most interesting thing about Metaphors We Live By is simply pointing out the metaphors and recognizing their presence in speech, rather than whether the general theory is maximally good.
One thing that struck me while reading Staying OK is its discussion of the trackback technique. I can’t find anything detailed online about it beyond a very brief summary. It’s essentially a multi-step process for dealing with conflicts in internal dialogue, “conflict” here being a technical term referring to crossed communication in the transactional analysis model of the psyche. Or at least that’s how it’s presented. Looking at it a little closer and reading through examples in the book that are not available online, it’s really just poorly explained memory reconsolidation. To the extent it works as a method in transactional analysis therapy, it seems to work because it taps into the same mechanisms described in Unlocking the Emotional Brain.
I think this is interesting both because it shows how we’ve made progress and because it shows that transactional analysis (along with a lot of other things) was also getting at stuff that works, but less effectively, because it had weaker evidence to build on that was more confounded with other possible mechanisms. To me this counts as evidence that building theory on phenomenological evidence can work and is better than nothing, but will be supplanted by work that manages to tie in “objective” evidence.
First, thanks for posting about this even though it failed. Success is built out of failure, and it’s helpful to see it so that it’s normalized.
Second, I think part of the problem is that there are still not enough constraints on learning. As others have noted, this mostly seems to weaken the optimization pressure, making it slightly less likely to do something we don’t want, but it doesn’t actively turn it into something that does what we do want and avoids what we don’t.
Third and finally, what this most reminds me of is impact measures. Not in the specific methodology, but in the spirit of the approach. That might be an interesting approach for you to consider given that you were motivated to look for and develop this approach.
As Stuart previously recognized with the anchoring bias, it’s probably worth keeping in mind that any bias is only a “bias” against some normative backdrop. Without some standard for how reasoning was supposed to turn out, there are no biases, only the way things happened to work.
Thus things look confusing around confirmation bias: it only becomes a bias when it results in reasoning that produces a result that fails to predict reality after the fact. Otherwise it’s just correct reasoning based on priors.
Yeah, I think #1 sounds right to me, and there is nothing strange about it.
I don’t recall seeing anything addressing this directly: has there been any progress on concerns about Goodharting in debate, and more generally the risk of mesa-optimization in the debate approach? The typical risk scenario is something like this: training debate creates AIs good at convincing humans rather than at convincing humans of the truth, and once you leave the training set of questions where the truth can be reasonably determined independent of the debate mechanism, we’ll experience what amounts to a treacherous turn, because the debate training process accidentally optimized for a different target (convince humans) than the one intended (convince humans of true statements).
For myself this continues to be a concern which seems inadequately addressed and makes me nervous about the safety of debate, much less its adequacy as a safety mechanism.
Nevertheless, extensions to PAL might still be useful. Agency rents are what might allow AI agents to accumulate wealth and influence, and agency models are the best way we have to learn about the size of these rents. These findings should inform a wide range of future scenarios, perhaps barring extreme ones like Bostrom/Yudkowsky.
For myself, this is the most exciting thing in this post—the possibility of taking the principal-agent model and using it to reason about AI even if most of the existing principal-agent literature doesn’t provide results that apply. I see little here to make me think the principal-agent model wouldn’t be useful, only that it hasn’t been used in ways that are useful to AI risk scenarios yet. It seems worthwhile, for example, to pursue research on the principal-agent problem with some of the adjustments to make it better apply to AI scenarios, such as letting the agent be more powerful than the principal and adjusting the rent measure to better work with AI.
Maybe this approach won’t yield anything (as we should expect on priors, simply because most approaches to AI safety are likely not going to work), but it seems worth exploring further on the chance it can deliver valuable insights, even if, as you say, the existing literature doesn’t offer much that is directly useful to AI risk now.
An additional possibility: everything already adds up to normality, we’re just failing to notice because of how we’re framing the question (in this case, whether or not holding middling probability estimates for difficult and controversial statements is correct).
I’m not sure, but my guess is that most of the risk lies in the future, i.e. the risks are in things that might be possible to do later that aren’t possible to do now. I say this both because it doesn’t seem very dangerous right now and because I can imagine ways in which it would be dangerous, albeit as an outsider to biology, epidemiology, and genetics.
If you go through a belief update process and it feels like the wrong belief got confirmed, the fact that you feel like the wrong belief won means that there’s still some other belief in your brain disagreeing with that winner. In those kinds of situations, if I am approaching this from a stance of open exploration, I can then ask “okay, so I did this update but some part of my mind still seems to disagree with the end result; what’s the evidence behind that disagreement, and can I integrate that”?
I sometimes find that memories and the beliefs about the world that they power are “stacked” several layers deep. It’s rare to find a memory directly connected to a mistaken ground belief, and it’s more normal that 2, 3, 4, or even 5 memories are all interacting through twists and turns to produce whatever knotted and confused sense of the world I have.
This is an interesting way to frame things. I have plenty of experience with what you’re calling aspiration here, via deliberate practices over the past 5 years or so that have caused me to transform in ways I wanted to while not understanding how to get there. For example, when I started zen practice I had some vague idea of what I was there to do or get—get “enlightened”, be more present, be more capable, act more naturally, etc.—but I didn’t really understand how to do it or even what it was I was really going for. After all, if I had really understood it, I would already have been doing it. It’s only through a very slow process of experimenting, trying, being nudged in directions, and making very short moves toward nearby attractors that I’ve over time come to better understand some of these things, or understand why I was confused and what the thing I thought I wanted really was, without being skewed by my previous perceptions of it.
I think much of the problem with the kind of approach you are proposing is figuring out how to turn this into something a machine can do. That is, right now it’s understood and explained at a level that makes sense for humans, but how do we take those notions and turn them into something mathematically precise enough that we could instruct a machine to do them and then evaluate whether or not what it did was in fact what we intended. I realize you are just pointing out the idea and not claiming to have it all solved, so this is only to say that I expect much of the hard work here is figuring out what the core, natural feature of what’s going on with aspiration is such that it can be used to design an AI that can do that.