A number of the comments have pointed toward my concern with what I take to be the underlying assumption of this post: that it is possible at all to build AI, general or narrow, even when restricted to a small domain, without it being touched by humans enough that implicit, partial modeling of humans happens anyway. This is not to deny that much current AI safety work is extremely human-centric, going so far as to rely on uniquely human capabilities (at least unique among known things), and that this is itself a problem for many of the reasons you lay out, but I think it would be a mistake to think we can somehow get away from humans in building AGI.
The reality is that humans are involved in the work of building AGI: in the design and construction of the hardware it will run on, the data sets it will use, and so on. Even if we think we’ve removed the latent human-shaped patterns from our algorithms, hardware, and data, we should strongly suspect we are mistaken, because humans are tremendously bad at noticing when something they assume to be true of the world is actually only true of their understanding of it. That is, I would expect it to be more likely that humans fail to notice their latent presence in a “human-model-free” AI than that the AI is actually free of human modeling.
Thus to go down the path of building AGI without human models risks failure because we failed to deal with the AGI picking up on the latent patterns of humanity within it. This is not to say we should stick to a human-centric approach, which has the many problems you’ve described, but trying to avoid humans altogether means neglecting to make our systems robust to the kinds of human interference that can push us away from the goal of safe AI, especially unexpected and unplanned-for interference due to hidden human influence. If we instead build expecting to deal with and be robust to the influence of humans, we stand a much better chance of producing safe AI than either being human-centric or ignoring humans.
I am not sure I follow. The bots indeed do end up clustering into 4 to 5 different clusters, where each cluster represents a certain convergent view. By “keeping the affinity score”, do you mean they keep track of the past interactions, not just compare current views at each step? That would be an interesting improvement, adding memory to the model, but that would be, well, an improvement, not necessarily something you put into a toy model from the beginning. Maybe you mean something else? I’m confused.
Oh, this paragraph seems to suggest your model has a lot more going on than I got from reading this post. Maybe if I followed your links I would find more details (it sounded like they were just extra details that could be skipped)? I got the impression you found a function whose shape illustrates what you want and that was it, but this sounds like there’s a lot more going on that isn’t described in the text of this post!
It’s unclear to me that the model you construct has much relationship to the idea you’re testing, or at least any more than any other words or images you could have provided to illustrate it. I get that you see peaks and valleys in the function that you interpret as the attraction and repulsion of people toward clusters, but again I don’t see any clear grounding of this model in the idea being tested, so to me it just feels like you drew a picture to illustrate the idea in a very convoluted way. Put another way, I don’t feel like your model constrains your belief in the idea at all, because you could have come up with some function to draw any curve you wanted to show us.
Maybe I’ve missed something where you did provide a grounding, but I read back through the post a second time and still didn’t see anything definitive. If there is something like that, or if you left it out and can explain how the model is causally connected to the idea (such that something about the model says something about the idea), could you do so now?
FWIW I went into this expecting a very different sort of model, one more like a simulation using simple bots that interact in the simplified ways you describe, so we could see how they end up clustering, maybe by each bot keeping an affinity score for the others and then looking at how affinity drives cluster formation (something like the sketch below). That feels to me like it has more grounding in the idea being tested, where each bot is a simplified stand-in for a person, and their interactions and affinity scores for the other bots stand in for human interactions and human affinity for other humans. But again, maybe you have some reason for thinking the model you present has the same kind of causal connection by sharing some structural similarity; I just don’t see it, so I would appreciate it if you could clarify.
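To make that concrete, here is a minimal sketch of the kind of simulation I mean. To be clear, this is my own construction, not your model, and every parameter and update rule in it is made up purely for illustration:

```python
import random

# A rough sketch of the sort of bot model I had in mind (all names, parameters,
# and update rules invented for illustration): bots hold a scalar "view",
# pairwise interactions pull similar views together (raising mutual affinity)
# and push dissimilar views apart (lowering it), and clusters show up as groups
# of bots with similar views and high mutual affinity.

N_BOTS = 50
N_STEPS = 5000
ATTRACT = 0.1    # pull toward the midpoint when two bots roughly agree
REPEL = 0.02     # push apart when they disagree
THRESHOLD = 0.2  # view distance below which an interaction counts as agreeable

random.seed(0)
views = [random.random() for _ in range(N_BOTS)]
affinity = [[0.0] * N_BOTS for _ in range(N_BOTS)]

for _ in range(N_STEPS):
    a, b = random.sample(range(N_BOTS), 2)
    diff = views[a] - views[b]
    if abs(diff) < THRESHOLD:
        # Agreeable interaction: views converge, mutual affinity grows.
        views[a] -= ATTRACT * diff / 2
        views[b] += ATTRACT * diff / 2
        affinity[a][b] += 1
        affinity[b][a] += 1
    else:
        # Disagreeable interaction: views diverge, affinity drops.
        views[a] += REPEL * diff
        views[b] -= REPEL * diff
        affinity[a][b] -= 1
        affinity[b][a] -= 1
    # Keep views bounded so the dynamics don't run off to infinity.
    views[a] = min(1.0, max(0.0, views[a]))
    views[b] = min(1.0, max(0.0, views[b]))

# Crude readout of cluster structure: bin the final views and count occupancy.
counts = {}
for v in views:
    key = round(v, 1)
    counts[key] = counts.get(key, 0) + 1
print(sorted(counts.items()))
```

With something like this, “people sort into a handful of clusters” cashes out as a measurable property of the final views and affinity matrix, which is the kind of grounding I was hoping to find in the post.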
Although I guess there’s also the question of, why don’t we just create an archipelago of subreddits on reddit if that’s the direction we want to go? Just prepend the name of each subreddit with “LessWrong” and link them together somehow and be done with it.
I think we all know the answer, though: LW has certain standards and does a better job of keeping out certain kinds of noise than reddit does, even with active moderation. LW today attracts certain folks, deters others, and its boundaries make it a compelling garden to hang out in, even if not everyone agrees on, say, whether we should allow only flowering plants in our garden or if ferns and moss are okay.
I like the direction of having LW, the EA Forum, and the Alignment Forum be semi-connected; I would love it if the EA Forum functioned more like the Alignment Forum does in relation to LW, and I think it would be cool to see one or two additional sites branch off if that made sense. But I also don’t feel like there’s enough volume here for me to enjoy seeing us fracture too much, because there’s a lot of benefit in keeping things together and exposing folks to things they otherwise might not see, just because those things are loosely connected to what they do want to see. I enjoy stumbling on things I had no idea I would learn something from, though others are less open in this way and have different preferences.
The following quote was called out in a deleted comment, but I think there is something to discuss here that would be missed if we didn’t come back to it even though that comment was ruled off-topic.
Thus, ultimate concern was displaced to science, a concern that its methods were simply not capable of handling. And science itself was always completely honest about its limitations: science cannot say whether God exists or does not exist; whether there is an Absolute or not; why we are here, what our ultimate nature is, and so on. Of course science can find no evidence for the Absolute; nor can it find evidence disproving an Absolute. When science is honest, it is thoroughly agnostic and thoroughly quiet on those ultimate questions.
The now-deleted complaint was that this is saying something like: science is in a non-overlapping magisterium from the question of whether or not God exists. I agree that trying to claim separate magisteria is a problem and doesn’t work, so what do I see as the value of including this quote?
Mainly to highlight a point that I think is often poorly understood: that science, for all the good it does, intentionally cuts itself off from certain kinds of evidence in order to allow it to function. Maybe we can debate what the “real” science is, but I’m thinking here of the normal, run-of-the-mill thing you’d call “science” going on in universities around the world. That form of science specifically ignores lines of evidence we might call anecdotal or phenomenological and, for our purposes, sidesteps questions of epistemology by settling for a kind of epistemological pragmatism that allows science to get on with the business of science without having to resolve philosophical problems every time you want to publish a paper on fruit flies.
This choice to pragmatically ignore deep epistemological questions is a good choice for science, of course, because it lets it get things done, but it also means we cannot take results like “science finds no evidence of supernatural beings or some ever-present unifying force we could reasonably label God” as stronger evidence than it is. Yes, this is pretty strong evidence that there is no God like the kind you find in a religious text that interacts with the world, but it’s also not much evidence of anything about a God that’s more like an invisible dragon living in a garage. The thing that lets you address those sorts of questions is a bit different from what is typically done under the banner of science.
This quote does go a bit too far when it says science should be “thoroughly quiet on those ultimate questions”, because science does have something to say, but I still thought it worth including because it highlights the common overreach of science into domains it specifically rules itself out of by setting up methodological assumptions that let it function.
(To put this last point another way: think of how annoyed you’d be if, every time you told your friend you felt sad and wanted a hug, they said “I don’t know, I can’t really measure your sadness very well, and it’s just you reporting this sadness anyway, so I can’t tell if it’s worth it to give you the hug”.)
I have two thoughts on this.
One is that different spiritual traditions have their own deep, complex systems of jargon that sometimes stretch back thousands of years through multiple translations, schisms, and acts of syncretism. So when you first encounter one it can feel like it’s a lot and it’s new and why can’t these people just talk normally.
Of course, most LW readers live in a world full of jargon even before you add on the LW jargon, much of it from STEM disciplines. People outside that cluster feel much the same way about STEM jargon as the average LW reader may feel about spiritual jargon. I point this out merely because I realized, when you brought up the spiritual example, that I hadn’t given a full account of what’s different about rationalists: maybe it’s a tendency to coin new jargon even when a literature search would reveal that existing jargon already covers the concept.
Which is relevant to your point and my second thought, which is that you are right: many things we might call “new age spirituality” have the exact same jargon-coining pattern in their writing as rationalist writing does, with nearly every author striving to elevate some metaphor to the level of a word so that it can become part of a wider shared approach to ontology.
This actually seems to suggest that my story is too specific, and pointing to Eliezer’s tendency to do this as a cause is maybe unfair: it may be a tendency that exists within many people, and there may be something about the kind of people or the social incentives shared between rationalists and new age spiritualists that produces this behavior.
This archetype is easily distractible and does not cooperate with other instances of itself, so an entire community of people conforming to this archetype devolves into valuing abstraction and specialized jargon over solving problems.
Obviously there are exceptions to this, but as a first pass this seems pretty reasonable. For example, one thing I feel is going on with a lot of posts on LessWrong and posts in the rationalist diaspora is an attempt to write things the way Eliezer wrote them, specifically with a mind to creating new jargon to tag concepts.
My suspicion is that people see that Eliezer gained a lot of prestige via his writing, this is one of the things he does in his writing (name concepts with unusual names), and I suspect people make the (reasonable) assumption that if they do something similar maybe they will gain prestige from their writing targeted to other rationalists.
I don’t have a lot of evidence to back this up, other than to say I’ve caught myself having the same temptation at times, and I’ve thought a bit about this common pattern I see in rationalist writing and tried to formulate a theory of why it happens that accounts not only for why we see it here but also why I don’t see it as much in other writing communities.
I don’t. Integral Spirituality might have some of what you’re looking for, but only incidentally, since it’s really trying to do something else.
Overall it seems like you’re making a coherent enough point (meaningness is subjective), but I think the writing style makes that a bit hard to pick out. I can tell you’ve gotten some downvotes, and my guess is that it’s because, reading this, it’s hard to tell much about why you think this, or to find specific arguments a person might engage with toward this point. I’m not saying I agree or disagree with you here, merely that I find it hard to follow your reasoning because it feels like there are many gaps in it in your writing, even if there aren’t in your head.
I see you are rather new to the site, so I say this because I want to make sure you don’t end up bouncing off because you get some downvotes. People generally respond well here to writing that is proof-like: well structured, aims to show something, and is clear about its starting assumptions (including assumptions about the audience; I felt like this post was arguing a point against an invisible other view that was never made very clear to me).
Hope that helps.
“Formally Stating the AI Alignment Problem” is probably the nicest introduction, but if you want a preprint of a more formal approach to how I think this matters (with a couple of specific cases), you might like this preprint (though note that I am working on getting it through to publication: it is halfway through review with a journal, and although I’ve been too time-constrained to make the reviewers’ suggested changes yet, I suspect the final version of the paper will be more like what you are looking for).
Both, although I mostly consider the former question settled (via a form of panpsychism that I point at in this post) and the latter less about the technical details of how AI could work and more about the philosophical predictions of what will likely be true of AI (mostly because it would be true of all complex, conscious things).
Also, the “phenomenological” in the name sounded better to me than, say, “philosophical” or “continental” or something else, so don’t get too hung up on it: it’s mostly a marker to say something like “doing AI philosophy from a place that much resembles the philosophy of the folks who founded modern phenomenology”, i.e. my philosophical lineage is more Kierkegaard, Hegel, Schopenhauer, Husserl, and Sartre than Hume, Whitehead, Russell, and Wittgenstein.
I’ve been a bit busy with other things lately, but this is exactly the kind of thing I’m trying to do.
I suspect there are many more sources of risk than cosmic rays that leave us only able to approach complete safety, but this seems a reasonable argument for at least establishing that the limit exists. That way, even if we disagree over whether some more easily controlled aspect of AI design is a source of risk, we don’t get confused and think that eliminating all risk from the design suddenly gets us perfect safety.
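As a toy illustration of why the limit matters (numbers entirely made up for this example), even driving design risk to zero leaves the residual environmental risk, and that residual compounds over exposure:

```python
# Toy numbers, invented purely for illustration: even if risk attributable to
# the AI's design is driven to zero, residual environmental risk (cosmic rays,
# hardware faults, etc.) keeps overall safety strictly below 1, and the gap
# compounds over repeated periods of operation.

p_design_failure = 0.0        # suppose we eliminate all design-related risk
p_environment_failure = 1e-6  # some irreducible per-period environmental risk

p_safe_one_period = (1 - p_design_failure) * (1 - p_environment_failure)
p_safe_million_periods = p_safe_one_period ** 1_000_000

print(p_safe_one_period)       # 0.999999 -- already short of perfect safety
print(p_safe_million_periods)  # ~0.37 -- far from perfect over enough exposure
```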
This is a very outside view on these ideas. I think from the inside there’s a lot that often separates obviously bogus ideas from possibly real ones. Ideas that might pan out are generally plausible now given the evidence available, even if they cannot be proved, whereas bogus, crank ideas generally ignore what we know to claim something contradictory. This can get a bit tricky because ideas of what people consider “known” can be a little fluid, but the distinction I’m trying to draw here is between ideas that may contradict existing models but agree with what we observe and ideas that disagree with what we observe (regardless of whether they contradict existing models), the former being plausible, ahead-of-their-time ideas that might later be proven true, and the latter being clearly bogus.
(Of course sometimes, as in the case of not observing star parallax without sufficiently powerful instruments, even our observations are a limiting factor, but this does at least allow us to make specific predictions that we should expect to see something if we had more powerful instruments, and would lead us to conclude against a promising idea if we got really good observations that generated disqualifying evidence.)
What I like about this thread, and why I’m worried about people reading this post and updating toward thinking that sufficiently powerful processes are safe just because they don’t look like the things we expect to be dangerous, is that it helps make clear that Rohin seems to be making an argument that hinges on leaky or even confused abstractions. I’m not sure any of the rest of us have much better abstractions to offer that aren’t leaky, and I want to encourage what Rohin does in this post: thinking through the implications of the abstractions he’s using to draw conclusions that are specific enough to be critiqued, because through a process like this we can get a clearer idea of where we have shared confusion and then work to resolve it.
Type 1 seems to be describing what I’d call a “structure”, which is another way of talking about a pattern in a certain abstract sense. For example, consider the classic mathematician joke that topologists can’t distinguish a donut from a coffee cup because they have the same topological genus (at least, idealized donuts and coffee cups do): genus 1.
Type 2 seems to be describing what I’d call a “system”, i.e. multiple objects in relation with each other coming together to form a new object at a different level of abstraction.
Although my thinking has certainly evolved a lot since then, I wrote about an issue that required addressing this topic a couple of years ago, so you might find that interesting even if you’re not so interested in the topic I was addressing directly.
I suspect panpsychism is in this boat. We have lots of philosophical reasons to think it makes sense, but making it the consensus requires overcoming two related difficulties:
convincing ourselves that “consciousness” is less special and magical than we currently think it is
reducing consciousness to something easily observable
Part of the problem seems to be that we don’t have the ability to adequately inspect the most complex conscious systems, and until we do it will remain possible to keep claiming “yeah, but real consciousness is special and not everything has it”, because we can imagine that the simple pattern strong theories of panpsychism propose as the explanation of consciousness is insufficient to explain the specialness of humans, animals, etc.
(This is not to be confused with weak theories of panpsychism, which are woo and reasonably dismissed (based on current evidence) because they propose the existence of phenomena we have not observed, like plants, rocks, and systems being as agentic as animals, but you know, in secret, or only on another plane of existence.)
this feels like it’s moving things towards optimization and indifference.
I came here to say something like “I feel like this post sets up a false dichotomy” and I think you’ve done a better job than I would have at explicating why it feels to me optimization and indifference go together and are not really in opposition, except from within the prisons of our own minds thinking that they are in opposition.
I don’t have any literature on it, but it has had that effect on me. That is, as long as I’m meditating regularly (I average about 45 minutes a day of “serious” meditation, and another 60 minutes or so of “casual” meditation), I find that if I don’t get a full 10 hours I still often won’t have sleep attacks (in fact I now normally sleep only about 8 hours most nights), and I can sleep as little as 6 hours and still function mostly normally (though I can’t do that repeatedly, and I will almost certainly have a sleep attack on those days).