I believe there is some stuff about cryonics, although not all of it flattering, and there’s certainly a lot said about AI, although not from a rationalist perspective. There’s also a lot of interesting material on YouTube produced by people focused on sharing knowledge, which might be lumped broadly under “education”, though that label gives short shrift to what they’re doing. I don’t have any handy links, but maybe this at least gets you looking in useful directions.
I don’t think there’s anything like this for broader rationality topics outside some podcasts, but maybe something like them could be made into video documentaries (which I assume is the format you’re interested in) in the future.
Yes, specific sciences study small slivers of what we experience, and philosophy ponders the big picture, helping to spawn another sliver to study. I still don’t see how it provides answers; it just helps crystallize questions.
It sounds like a disagreement over whether, when A contains B, B is an A or not. That is: physics is contained within the realm of study we call philosophy, although carefully cordoned off from the rest of it by certain assumptions. Is physics therefore still philosophy, or is philosophy only whatever hasn’t yet been broken off into a smaller part? To my way of thinking, physics is largely philosophy of the material, so by that example we have a case where philosophy provides answers.
I know I often sound like a broken record, but I’d say this just keeps coming back to the fundamental uncertainty we have about the relationship between reality as it is and reality as we know it, and the impossibility of bringing the two into perfect, provable alignment. This is further complicated by the question of whether the things we’re dealing with, moral facts, exist at all or, if they do, exist mind-independently; a question we so far seem unlikely to resolve unless we can find a synthesis of our existing notions of morality that deconfuses us about what we were previously trying to point at with the handle “moral”.
I object rather strongly to this categorization. It feels like a misunderstanding born of having encountered analytic philosophy only in rather limited circumstances and having absorbed the notion of the “separate magisterium” that the analytic tradition developed as it broke from the rest of Western philosophy.
Many people doing philosophy, myself included, think of it more as the “mother” discipline from which we might specialize into other disciplines once we understand the ground well enough to cleave off a part of reality for the time being and work with that small part, so as to avoid constantly confronting the complete, overwhelming complexity of all of reality at once. What is philosophy today is perhaps a narrower field of study tomorrow, except, it seems, in those cases where we touch so closely on fundamental uncertainty that we cannot hope to create a useful abstraction, like physics or chemistry, that lets us manipulate some small part of the world accurately without worrying about the rest of it.
This seems fairly unlikely to me, except insofar as AI acts as a filter that forces us to refine our understanding. The examples you provide arguably didn’t make anything easier; they just made what was already there more apparent to more people. That won’t resolve the fundamental issues, though it may at least make more people aware of them (something, I’ll add, I hope to make more progress on, at least within the community of folks already doing this work, let alone a wider audience, because I continue to see, especially regarding epistemology, dangerous misunderstandings of or ignorance about key ideas that pose a threat to successfully achieving AI alignment).
Unfortunately, many philosophical problems may not have solutions of a form that allows us to construct something that definitely is what we want; rather, they may only permit us to say that something is probably not what we want, owing to the fundamental ungroundability of our beliefs. My suspicion is that you are right: the problem is even harder than anyone currently realizes, and the best we can hope for is to winnow away as much of what obviously doesn’t work as we can, while still being left with a lot of uncertainty about whether we can succeed at our safety objectives.
My interpretation of anthropic arguments is that they reason in the same way we do under the many-worlds interpretation of quantum mechanics, so I think quantum immortality falls under what you’re asking for.
To make this model a little richer and share something of how I think about it: I tend to think of the risk of any particular powerful AI the way I think of risk in deploying software.
I work in site reliability/operations, so we tend to deal with things we model as having aleatory uncertainty, like a constant risk that any particular system will fail unexpectedly for some reason (hardware failure, cosmic rays, an unexpected code-execution path, etc.), but I also know that most of the risk comes right at the beginning, when I first turn something on (power up new hardware, deploy new code, etc.). A very simple model of this is something like f(x) = e^(-x) + c, where most of the risk of failure happens right at the start and beyond that there’s little to no risk of failure, so running for months doesn’t compound into a 95% risk; almost all of the 5% risk is eaten up right at the start, because the probability density function is shaped so that nearly all of its mass sits at the beginning.
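To make that concrete, here’s a minimal sketch of the intuition in Python; the shape follows the f(x) = e^(-x) + c idea above, but every number and parameter name is made up for illustration rather than taken from real reliability data:

```python
import math

def cumulative_failure_risk(months, p_initial=0.05, decay=5.0, background=0.0005):
    """Toy model: a fixed pool of 'deployment risk' (here 5%) that is
    spent almost entirely in the first moments after turning something
    on (exponential decay), plus a tiny constant background hazard
    (hardware failure, cosmic rays, etc.)."""
    deployment_risk = p_initial * (1 - math.exp(-decay * months))
    return deployment_risk + background * months

# Nearly all of the 5% is consumed immediately; running for a year adds
# very little on top of what the first month already incurred.
print(round(cumulative_failure_risk(1), 4))   # 0.0502
print(round(cumulative_failure_risk(12), 4))  # 0.056
```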
I really appreciate you sharing a word for this distinction. I remember being in a discussion about the possibility of indefinite lifespans way back on the extropians mailing list, and one person was arguing that it was impossible due to the accumulation of aleatory risk, using life-insurance actuarial models as a starting point. Their argument was fine as far as it went, but it created a lot of confusion when it became clear there was disagreement about just where the uncertainty lay, and I recall that trying to disentangle that model confusion led to a lot of hurt feelings. I think having a term like this to separate uncertainty about the model, uncertainty due to random effects, and uncertainty about whether the model implies a certain level of uncertainty due to random effects would have helped tremendously.
A point you make that I think deserves more emphasis is the “eye of the beholder” part you use in the title.
Wireheading exists because of the particular meaning we assign to a reward. This is true whether we are the one observing the actions we might label wireheading or the one to whom it is happening (assuming we can observe our own wireheading).
For example, addicts are often well aware that they are doing something, like shooting heroin, that will directly make them feel good at the expense of other things, and they rationally choose to feel good because it’s what they want. From the inside it doesn’t feel like wireheading; it feels like getting what you want. It only looks like wireheading from the outside, when we pass judgement on an agent’s choice of values and deem those values to be out of alignment with the objective, à la goodharting. In the case of the heroin addict, they are wireheading from an evolutionary perspective (both the actual evolutionary perspective and the reification of that perspective in people judging a person to be “wasting their life on drugs”).
As I say in another comment here, this leads us to realize there is nothing so special about any particular value we might hold, so long as we consider only the value itself. The value of values, then, must lie in their relation to putting the world into particular states; but how much we value putting the world into particular states itself comes from values, and so we start to see the self-referential nature of it all, which leads to a grounding problem for values. Put another way, wireheading only exists so long as you think you can terminate your values in something true.
Right. I think we can even go a step further and say there’s nothing so special about why we might want to satisfy any particular value, whether it has the wirehead structure or not. That is, not only is wireheading in the eye of the beholder, but so is whether or not we are suffering from goodharting in general!
Well, guess what you do have to remember all the time, as if you’d uncontrollably pressed “repeat” on the memory player? You.
I really like this way of phrasing it. We don’t live in the past or the future; we live right here, right now, in this very moment, experiencing reality as it is. And the self is created by a kind of noticing of what’s happening and reifying it into a thing by remembering what we experienced just moments ago. So the more time we spend focused on anything other than what’s happening right here, right now, and the things that affect the conditions of the here and now, the more we’re distracted from and ignoring what has the most impact on our lives.
Of course that’s easier said than done! But it’s the essential wisdom about how to be awake to our lives and live them to their fullest.
I don’t know; this doesn’t jibe with my experience of abstractions.
Yes, structuring code with abstractions rather than just directly doing the thing you’re trying to do makes the code more structurally complex; yes, sometimes that’s unnecessary; and yes, more structural complexity means it’s harder to tell what any individual chunk of code does in isolation. But I think your example suggests you’re engaging with abstractions very differently than I do.
When I write code and employ abstraction, it’s usually not that I think “oh, how could I make this more clever”; it’s that I think, “geez, I’m doing the same thing over and over again here, duplicating effort; I should abstract this away so I only have to say something about what’s different rather than repeatedly doing what’s the same”. Some people might call this removing boilerplate code, and that’s sort of what’s going on, but I think of boilerplate as more a legacy of programming languages where, for toolchain reasons (basically every language prior to the so-called 4th-generation languages) or design reasons (4th-generation languages like Python that deliberately prevent you from doing certain things), you needed to write code that lacked certain kinds of abstraction (what we frequently call metaprogramming). Instead I think of this as the natural evolution of the maxim “Don’t Repeat Yourself” (DRY) toward code that is more maintainable.
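To make the kind of abstraction I mean concrete, here’s a contrived sketch in Python; the importer functions and field names are hypothetical, invented just to show the shape of the refactor:

```python
# Repeating yourself: the same filter-then-transform shape, written twice.
def import_users(rows):
    return [{"email": r["email"].lower()} for r in rows if r.get("email")]

def import_orders(rows):
    return [{"order_id": int(r["order_id"])} for r in rows if r.get("order_id")]

# Abstracted: say only what differs (the key and the transform);
# the shared shape lives in one place.
def make_importer(key, transform):
    def importer(rows):
        return [transform(r) for r in rows if r.get(key)]
    return importer

import_users = make_importer("email", lambda r: {"email": r["email"].lower()})
import_orders = make_importer("order_id", lambda r: {"order_id": int(r["order_id"])})
```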
Because when I really think about why I code with abstractions, it’s not to show off, or to be efficient with my lines of code, or even to make things pretty; it’s to write code that I can maintain and work with later. Well-designed abstractions provide clear boundaries and separation of concerns that make it easy to modify code to do new things as requirements change, and to refactor parts of the code. Combined with behavioral test-driven development, I can write tests against the expected behavior of these concerns, trust the tests to keep passing as I change the code so long as the behavior doesn’t change, and be warned when I accidentally break the behavior I wanted.
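Continuing the hypothetical importer sketch above, a behavior-level test might look something like this: it pins down what importing should do, not how make_importer happens to be implemented, so it keeps passing across refactors and fails only when the behavior itself changes.

```python
def test_import_users_keeps_only_rows_with_an_email():
    rows = [{"email": "A@B.COM"}, {"name": "no email here"}]
    # Assert on observable behavior (which rows survive and how they're
    # normalized), not on implementation details of make_importer.
    assert import_users(rows) == [{"email": "a@b.com"}]
```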
Yes, I often don’t do it perfectly, but when it works it’s beautiful. My experience is that the people who dislike it are mainly new grads who have spent all their time coding toys and haven’t had to deal much with large, complex systems; everyone else seems to understand that learning system-specific abstractions is just naturally what we must do to be kind to ourselves and to the future programmers who will work with this code. To do otherwise is to do the future a disservice.
Not really an answer, but there’s also the point that, with superintelligence, humans don’t have to do the things they otherwise could do, because we’ve built a tool so general that it eliminates the need for other tools. This is pretty appealing if, like me, you want to be free to do the things you want to do even when you don’t have to do them, rather than having to do things because you need to do them to satisfy some value.
I think this comment does a better job of explaining the notion of fairness you’re trying to point at than the other comments here.
I don’t think we disagree.
To my mind, what seems unfair about some problems is that they propose predictors that are, to the best of our knowledge, physically impossible, like a Newcomb’s-problem Omega that never makes a mistake. But these are only unfair in the sense that they depict scenarios we won’t ever encounter (perfect predictors), not in the sense that they ask us something mathematically unfair.
Other more mundane types of unfairness, like where a predictor simply demands something so specific that no general algorithm could always find a way to satisfy it, seem more fair to me because they are the sorts of things we actually encounter in the real world. If you haven’t encountered this sort of thing, just spend some time with a toddler, and you will be quickly disabused of the notion that there could not exist an agent which demands impossible things.
I think some people do, or at least try to, but my impression of the state of computer-assisted proofs and formal verification methods for programs is that they are still not very good, because the problem is incredibly complex and we’ve basically only made it to the level of having FORTRAN-level tools. That is, we’re a bit better off than when we did formal verification with assembly-level tools, where you had to specify absolutely everything in very low-level terms, but mostly in ways that just make that low-level work easier, rather than giving us many useful abstractions that let us perform formal verification without having to understand the details of (almost) everything all the time.
To continue the analogy, things will get a little more exciting as we get C- and then C++-level tools, but I think things won’t really explode, and appeal to folks who don’t desperately need formal verification, until we get to something like the Python/Ruby level of tooling.
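To give a feel for how low-level the tools still are, here’s a tiny sketch in Lean 4 (syntax varies a bit across versions, so treat this as illustrative): even the trivial fact that 0 + n = n must be proved by explicit induction, because Lean’s Nat defines addition by recursion on the second argument.

```lean
-- `n + 0 = n` holds by definition, but `0 + n = n` does not: we have to
-- spell out the induction ourselves, one constructor at a time.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```

Better tooling, in this analogy, means more of this work being handled by reusable abstractions, so that we only have to state the parts that matter.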
This does suggest something interesting, though: if someone thinks wider use of formal verification is important, especially in AI, then a straightforward approach is to improve formal verification tools to the point where they support the abstractions that will help people work with them.
I agree with this point. Looking at the things that have won over time, it eventually got to feel like it wasn’t worth bothering to submit anything, because the winners were mostly going to be folks who would have done their work anyway and who had met a certain level of prestige. In this way I do sort of feel like the prize failed: it was set up in a way that rewarded work that would have happened anyway and failed to motivate work that wouldn’t have happened otherwise. Maybe it’s only in my mind that the value of a prize like this is to increase work on the margin rather than to recognize outstanding work that would have been done regardless, but beyond the first round it’s felt like a prize of the form “here’s money for the best stuff on AI alignment in the last x months” rather than “here’s money to make AI alignment research happen that otherwise wouldn’t have”. That made me much less interested in it, to the point that I put the prize out of my mind until I saw this post reminding me of it today.
Nice. I’m surprised at the lack of comments and votes; maybe this just didn’t engage most people?
This is the sort of approach I don’t personally like very much, i.e. laying out a whole lot of steps to take, with instructions along the way. For myself I tend to prefer a “here’s the one or two things to do that capture the essence of what you’re after; you fill in the details” approach, but it seems a lot of people like to work in the way you lay out here. Having more of these kinds of tools seems useful, since some resonate with some people more than others. Besides, I think that whatever our specific methods, they all do the same thing anyway: if you’ve not seen it already, you might like my writing on the general pattern of personal growth cycles, where I talk about that. I see the general pattern expressed pretty clearly in what you lay out here, with a reasonable amount of detail about how to carry out each part of the process.