I do agree that I can imagine a very intelligent machine that is completely unable to gather information about its own status. Edge case!
I think it’s less edge than it might at first seem. Even far more complex and powerful GPT-like models may be incapable of self-awareness, and we may deliberately create such systems for AI safety reasons.
Using the definition proposed in the article, I can easily imagine an entity with an “internal observer” which has full knowledge of its own conditions, yet still lacks the ability to feel pleasure or pain.
I think it would have to never have any preferences for its experiences (or anticipated future experiences), not just pleasure or pain. It is conceivable though, so I agree that my supposition misses the mark.
How would you define self-awareness?
As in the post, self-awareness to me seems vaguely defined. I tend to think of both self-awareness and sapience as being in principle more like scales of many factors than binary categories.
In practice we can easily identify capabilities that separate us from animals in reasoning, communicating, and modelling, and proudly proclaim ourselves to be sapient where all the rest aren’t. This seems pretty clear cut, but various computer systems seem to be crossing the animal-human gap in some aspects of “sapience” and already surpassed humans in some others.
Self-awareness seems harder to find a clear boundary for, and scales really do seem more appropriate here. There are various clues to “self-aware” behaviour, and they seem to cover a range of different capabilities. In general I’d probably call it being able to model one’s own actions, experiences, and capabilities in some manner, and to update those models.
Cats seem to be above whatever subjective “boundary” I have in mind, even though most of them fail the mirror test. I suspect the mirror test is further toward “sapience”, such as being able to form a new model that predicts the actions of the reflection perfectly in terms of their own actions as viewed from the outside. I suspect that cats do have a pretty decently detailed self-model, but mostly can’t do that last rather complex transformation to match up the outside view to their internal model.
I’m not sure why your Venn diagram doesn’t have “sentient” lying entirely within “conscious”, given that your definitions have the latter as a logical requirement for the former. Part of your pink impossible area seems to exclude that, but that pink area also includes other things that you say are impossible without giving any reason in the text.
Why is sapience without sentience or self-awareness marked “impossible”? It seems plausible for a computer program to be able to receive, process, and send complex world models without being able to model itself in them. GPT-like systems (now or in the future) may well fall into this category. Even if such a system can model “GPT-like systems”, there’s probably not yet any “self” concept to be able to apply models like “the output I am about to produce will be the output of a GPT-like system”.
The remaining “impossible” combination, sapience and sentience without self-awareness, does seem implausible, but I don’t know that it’s logically impossible. It would take a pretty major blind spot in what sorts of models you can form to be able to experience sensations and form complex models, yet not be able to model yourself in many ways.
It may be that sentience is literally just the intersection of self-awareness and consciousness, or that both of these collapse into just a scale of self-awareness. Consciousness may be nothing more or less than self-awareness. Maybe p-zombies can’t exist, even though we can imagine such things.
I suspect that it’s even worse: that even the concept of correlation of difficulty is irrelevant and misleading. Your illustrations show a range of values for “difficulty for humans” and “difficulty for computers” of around the same scale.
My thesis is that this is completely illusory. I suspect that problems are not 1-dimensional, that their (computational) difficulties can be measured on multiple scales. I further expect that these scales cover many orders of magnitude, and that the range of difficulty that humans find “easy” to “very difficult” covers in most cases one order of magnitude or less. On a logarithmic scale from 0 to 10 for “visual object recognition” capability for example, humans might find 8 easy and 9 difficult, while on a scale of 0 to 10 for “symbolic manipulation” humans might find 0.5 easy (3 items) and 1.5 (30 items) very difficult.
At any given time in our computing technology development, the same narrow-range effect occurs, but unlike the human ranges, the computer ranges change substantially over time. We build generally faster computers, and all the points move down. We build GPUs, and problems on the “embarrassingly parallel” scales move down even more. We invent better algorithms, and some other set of graphs has its points move down.
Note that the problems (points) in each scale can still have a near-perfect correlation between difficulty for computers and difficulty for humans. In any one scale, problems may lie on a nearly perfectly straight line. Unlike the diagrams in the post though, at any given time there is essentially nothing in the middle. For any given scale, virtually all the data points are off the charts in three of the four quadrants.
The bottom left off-the-scale quadrant is the Boring quadrant, trivial for both humans and computers and of interest to nobody. The top right is the Impossible quadrant, of interest only to science fiction writers. The other two are the Moravec quadrants, trivial for one and essentially impossible for the other.
Over time, the downward motion of points due to technological progress means that now and then the line for some small subclass of problems briefly overlaps the “human capability” range. Then we get some Interesting problems that are not Moravec! But even in this class, the vast majority of problems are still Boring or Impossible and so invisible. The other classes still have lots of Moravec problems so the “paradox” still holds.
Someone on the internet may also have fairly high confidence (>50%) of getting 5x returns on one of two or three bets on a roulette table, but that doesn’t make it a good idea.
Crypto markets are rather more opaque and subject to both mistakes and various forms of fraud than roulette tables. Even when certain bets might be fundamentally sound, the various steps in the Rube Goldberg financial contraptions you have to work through to carry them out each add their own risks, usually large enough to wipe out the expected value of the bet.
To put it another way: just as people sometimes say “if you’ve never missed a flight, you’re spending too much time in airports,” I think that if we never have a comment section that devolves into chaos and requires moderator intervention, we’re staying way too far away from a domain where it’s really important to be developing sanity-inducing social technology.
Wait, what? People actually say that first thing? The expected utility loss due to the consequences of missing a flight is usually vastly greater than the time wasted by aiming to get there earlier. If people do say that, I suspect they must be the jet-setting elites who fly more than a hundred times in their life.
Terrible analogy aside, your point is perhaps well made. The utility loss from (very occasional!) chaotic messes that need moderators to take action may well be outweighed by the benefits of examining the Sanity Devouring Pit more closely without falling in.
On the other hand, I see that quite a few of the comments to this post take issue with the specific somewhat-political examples given, and not with the concept they were intended to illustrate. I felt the pull of the Pit myself, before reminding myself that the examples are not themselves the concept and that refuting an example has very little weight on whether the concept is useful.
Is the concept useful? Well, the adjective doesn’t seem useful. All options are fabricated, in the sense that they have been created. The connotation is also that of “a fabrication”, meaning a deliberate untruth. The untruth aspect is fine: all options are untrue to some extent, in that they are based on flawed models that never correspond exactly with reality. Are “fabricated” options deliberately untrue? It seems more likely to me that the crucial distinction is just that the models behind them are more critically flawed, and whether that is accidental or deliberate is irrelevant.
With some reservations about the name, it does seem to be a useful concept, with the proviso that in practice these things seem to be on a scale rather than a dichotomy.
It is related in the sense that if your prior for sensitivity is uniform, then the posterior is that beta distribution.
In my case I did not have a uniform prior on sensitivity, and did have a rough prior distribution over a few other factors I thought relevant, because reality is messy. Certainly don’t take it as “this is the correct value”, and the approach I took almost certainly has some major holes in it even given the weasel-words I used.
(humans are much more likely than AIs to extrapolate into the “actively evil” zone, rather than the “lethally indifferent”)
It seems to me that use of the term “actively evil” is itself guided by being part of our training data.
Lots of things called “actively evil” possibly achieve that designation just because they’re things that humans have already done and that have been judged evil. Actions of this type are now well known to be evil, so a human choosing them can really only be making an active choice to do them anyway, presumably because they’re viewed as necessary to some goal that supersedes that socially cached judgement.
I don’t see why an AI couldn’t reason in the same way: knowing (in some sense) that humans judge certain actions and outcomes as evil, disregarding that judgement and doing it anyway due to being on a path to some instrumental or terminal goal. I think that would be actively evil in the same sense that many humans can be said to be actively evil.
Do you mean that the space of possible actions that an AI explores might be so much larger than those explored by all humans in history combined, that it just by chance doesn’t implement any of the ones similar enough to known evil? I think that’s implausible unless the AI was actively avoiding known evil, and therefore at least somewhat aligned already.
Apart from that, it’s possible we just differ on the use of the term “lethally indifferent”. I take it to mean “doesn’t know the consequences of its actions to other sentient beings”, like a tsunami or a narrowly focused paperclipper that doesn’t have a model of other agents. I suspect maybe you mean “knows but doesn’t care”, while I would describe that as “actively evil”.
I hadn’t actually read the review, but yes, I meant that the sample must have had 29 people who were known (through other means) to be positive for SARS-CoV-2, and all tested positive.
Can you say more about how you got 96%?
Educated guessing, really. I did a few simple models with a spreadsheet for various prior probabilities, including some at each end of what was (subjectively, to me) reasonable. Only the prior for “this study was fabricated from start to finish but got through peer review anyway” made very much difference in the final outcome. (If you put 10% or more weight on that, or on various other “their data can’t be trusted” priors, then you likely want to adjust the figure downward.)
So with a rough guess at a prior distribution, I can look at the outcomes from the point of view of “what single value has the same end effect on evidence weight as this distribution”. I make it sound fancy, but it’s really just “if there was a 30th really positive test subject in these dozen or so possible worlds that I’m treating as roughly equally likely, and I only include possible worlds where the validation detected all of the first 29 cases, how often does that 30th test come up positive?” That comes out at close to 96%.
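For what it’s worth, the simplest version of that calculation, taking the 29-of-29 result at face value and assuming a flat prior over sensitivity (an assumption for illustration, not the messier prior I actually used), lands in the same place:

```python
from scipy.stats import beta

# 29 known positives, all detected; assume a uniform Beta(1, 1) prior over sensitivity.
detected, missed = 29, 0
posterior = beta(1 + detected, 1 + missed)  # posterior is Beta(30, 1)

# Probability that a hypothetical 30th true positive would also test positive
# (Laplace's rule of succession): (29 + 1) / (29 + 2) = 30/31, about 0.968.
print(posterior.mean())
```

The uniform-prior number is a shade higher than the 96% I quoted, which came from the rougher prior distribution described above.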
Yes, and to go further: the value that humans bring to these economic activities, and on which they are bottlenecked, is almost entirely their mental capabilities.
There are a few jobs where people joke about just doing things that a trained monkey could do, but it’s only funny because a trained monkey couldn’t actually do them, especially when things don’t go entirely to plan. There are plenty of jobs that rely on social competence in dealing with other humans, but that’s still mostly mental capability.
There are also plenty of jobs that couldn’t (at least at first) be replaced by smart robots with AGI, but they’re not really so relevant to questions such as limits on economic growth.
Yes, there are whole constellations of frictionless, spherical cows in here.
I think at best you could say that accepting an offer of employment for $X/hr implies that you value your time less than $X/hr, but even this best-case interpretation is only true for the first few hours per week or whatever. Having enough income to avoid being homeless and worried about food is likely very much more valuable than whatever income comes in on top of that. What’s more, employment turns non-employment time into a scarcer resource.
Someone could work for $15/hr, but quite consistently value their remaining time at $100/hr due to a combination of decreasing utility of money and value of an increasingly scarce resource.
Likewise someone could work for $100/hr and it could still be consistent for them to value an hour of their time at $15/hr.
Real life is messy.
There are lots of assumptions in even assigning a monetary value to time-spent-now. However, I think you’re doing your analysis a huge disservice by explicitly ignoring discounting and investment. Over timescales of multiple decades, these dominate most other things!
I’m not referring only to financial investment and capital production, but also to things like time invested in family, discounting due to compounding uncertainty and risks over time, and other factors. If an hour spent now is only worth as much as an hour at the end of someone’s career, they’ve pretty much failed at all of these things, because an hour at the end of someone’s career really shouldn’t be worth as much to them as an hour now.
An end-of-career hour may be worth more to somebody else, but that’s highly variable across professions.
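To illustrate how strongly compounding discounting bites over a working lifetime, here’s a toy calculation; the 40-year horizon and the discount rates are purely hypothetical numbers chosen for illustration, not anything from the post:

```python
# Present value of one hour 40 years from now, under a few hypothetical
# annual discount rates (standing in for risk, uncertainty, and forgone investment).
years = 40
for rate in (0.02, 0.05, 0.08):
    present_value = 1 / (1 + rate) ** years
    print(f"{rate:.0%}/yr: an end-of-career hour is worth ~{present_value:.2f} hours now")
# Roughly: 2% -> 0.45, 5% -> 0.14, 8% -> 0.05
```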
You have to go a long way out from the concept of Boltzmann brains before “inability to have any coherent thoughts” ceases to be so overwhelmingly dominant in probability that you can’t even reasonably estimate how many orders of magnitude of zeroes precede the first nonzero digit in the probability of anything else.
Realistically, if this isn’t a fundamental property of whatever you’re talking about, then you’re not talking about Boltzmann brains at all.
The confidence interval in the Cepheid analysis does not inspire confidence.
Usually when a test claims “100% sensitivity”, it’s based on all members of some sample known to have the disease testing positive. The lower end of the 95% interval is the smallest true sensitivity at which there would still be at least a 5% chance of getting no false negatives in that sample.
That’s where it starts to look dodgy: Normally it would be 2.5% to cover upper and lower tails of the distribution, but there is no tail below zero false negatives. It looks like they used 2.5% anyway, incorrectly, so it’s really a 97.5% confidence interval. The other problem is that the positive sample size must have been only 29 people. That’s disturbingly small for a test that may be applied a billion times, and seriously makes me question their validation study that reported it.
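For concreteness, the arithmetic behind both readings of the interval, assuming the validation sample really was 29 true positives with zero false negatives (the 29 is inferred, not something I’ve confirmed against the study itself):

```python
# Exact lower bound: the sensitivity p at which the chance of seeing zero false
# negatives out of 29 positives, i.e. p**29, equals the chosen tail probability.
n = 29
for tail in (0.05, 0.025):
    print(f"tail {tail}: lower bound on sensitivity ~= {tail ** (1 / n):.3f}")
# 0.05  -> ~0.902 (a genuine one-sided 95% bound)
# 0.025 -> ~0.881 (what you get if 2.5% is put into the one existing tail)
```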
There are a number of assumptions you can use to turn this into an effective false negative rate for Bayesian update purposes. You may have priors on the distribution of true sensitivities, priors on study validity, and so on. They don’t matter very much, since they mostly yield a distribution with an odds ratio geometric mean around the 15-40 range anyway. If I had to pick a single number based only on seeing their end result, I’d go with 96% sensitivity under their study conditions, whatever those were.
I’d lower my estimate for real life tests, since real life testing isn’t usually nearly as carefully controlled as a validation study, but I don’t know how much to lower it.
The future very rarely goes the way we predict.
That said, I never thought that flying cars were a reasonable expectation for the near future, mainly because the failure modes are terribly bad and keeping them from happening at intolerable rates is incredibly expensive. Even if we were all very rich, the cars were powered by Mr Fusion, and they were piloted by infallible software, occasional mechanical defects alone would make them something I wouldn’t want to use every day.
We do have nanotech already, just not the “build everything for free” magical wish fulfillment nanotech. More energy wouldn’t have helped that much; there are lots of very real problems at that scale that we still know little about solving. We may get further toward magical wish fulfillment nanotech in time, but it will take a lot of brainpower, not horsepower.
Yes, the relative scale of future utility makes no difference in short-term decisions, though noting that short-term to an immortal here can still mean “in the next 10^50 years”!
It might make a difference in the case where someone who thought that they were immortal becomes uncertain of whether what they already experienced was real. That’s the sort of additional problem you get with uncertainty over risk though, not really a problem with bounded utility itself.
A large part of the stagnation in energy consumption seems to be that the cost of energy hasn’t gone down much. The sixties and seventies marked a point where identifying and exploiting natural energy sources (mostly coal and oil) started to increase more rapidly in capital cost per unit energy produced. It seems that the last fifty years have been the beginning of the end for that whole class of energy production, which fueled the past two hundred years of growth. We are only now seeing the results of investment into alternatives.
Solar power in particular has plummeted in cost by many orders of magnitude, which is truly amazing. There are signs that it may become cheaper still. Nuclear power may become both cheaper and safer, but regardless of whether regulation costs too much now, it might not have a very much lower cost floor. We don’t know, and it might be worth finding out.
The interesting thing about solar power in particular is that it is pretty much purely capital-based. Unlike oil wells and coal mines relying on specific deposits that become exhausted, the production of energy from sunlight increases almost linearly with the amount you invest in it up to a small but significant fraction of the planet’s surface, and will produce essentially forever. There’s an upper bound based on maintenance and replacement of the equipment, but that limit seems likely to be at least 10 times larger than our current total energy production.
That’s just one of multiple possible energy sources that we’re working on. There are certainly major transition problems, but in the long run I think we’re now starting to head out of energy stagnation.
There are lots of different ways to approach this question, but they do in fact end up in the same place: that it does make sense to assign blame for actions. This is not the same as saying that they should be punished for it, so let’s set aside punishment, rehabilitation, deterrence, and other surrounding factors.
How does it make sense to blame a person for not doing what they are incapable of doing?
If I have an oven with a thermostat that stops working without warning one day, I can certainly blame the oven for burning my meal. Ovens of its make, model, and age are in general capable of maintaining the correct temperature, but this one was faulty. If it had functioned correctly, my meal would not have been burned. It had no choice to do otherwise, but that doesn’t stop it from being faulty. To a large extent, blaming people means much the same thing: proclaiming that their decision-making was faulty. If they did something immoral (or failed to even attempt to perform a moral duty), then it is their moral decision-making that has been demonstrated to be faulty.
If fixing this fault was as easy and without side effects as replacing the thermostat in an oven, then we probably would just do that. Failing that, being able to say “this person’s moral decision-making was faulty” and consequently treating them differently from those who have not demonstrated faulty moral decision-making makes sense.
Yes, splitting the confounding factors out does help. There still seem to be a few misconceptions and confounding things though.
One is that bounded doesn’t mean small. On a scale where the welfare of the entire civilization of Ancient Egypt counts for 1 point of utility, the bound might still be more than 10^100.
Yes, this does imply that after 10^70 years of civilizations covering 10^30 planet-equivalents, the importance to the immortal of the welfare of one particular region of any randomly selected planet of those 10^30 might be less than that of Ancient Egypt. Even if they’re very altruistic.
This whole idea seems to be utterly divorced from what utility means. Fundamentally, utility is based on an ordering of preferences over outcomes. It makes sense to say that you don’t know what the actual outcomes will be, that’s part of decision under risk. It even makes sense to say that you don’t know much about the distribution of outcomes, that’s decision under uncertainty.
The phrasing here seems to be a confused form of decision making under uncertainty. Instead of the agent saying “I don’t know what the distribution of outcomes will be”, it’s phrased as “I don’t know what my utility function is”.
I think things will be much clearer when phrased in terms of decision making under uncertainty: “I know what my utility function is, but I don’t know what the probability distribution of outcomes is”.
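To make that rephrasing concrete, a minimal formalization in my own notation (not anything from the post): in both cases the agent is evaluating

$$\arg\max_a \; \mathbb{E}\big[U(o)\mid a\big] \;=\; \arg\max_a \sum_o P(o \mid a)\,U(o),$$

with $U$ fixed and known throughout. Under risk, $P$ is known; under uncertainty, the agent also has to aggregate over which $P$ it is facing. Nothing in this requires pretending that $U$ itself is unknown.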
Okay, so you are claiming that these tiny fragments of a particular discussion within a particular subdivision of a geographical fragment of the rationalist community represent evidence for the much broader claim you made about the rationalist community as a whole.
Well okay, you provided evidence so weaksauce as to be nearly water. If this was the best evidence you could provide, then I should update away from your claim, because I would have expected you to be able to provide much stronger evidence than that if it were true.