Thanks! This is an interesting angle I hadn’t been thinking much about.
I anticipate this will lead to some interesting phrasing choices around the multiple meanings of “conception” as the discussions on what and how and whether AIs ‘really’ think continue to evolve.
There’s a story about trained dolphins. The trainer gave them fish for doing tricks, which worked great. Then they decided to only give them fish for novel tricks. The dolphins, trained under the old method, ran through all the tricks they knew, got frustrated for a while, then displayed a whole bunch of new tricks all at once.
Among animals, RL can teach specific skills, but it can also reduce creativity in novel contexts. You can train creative problem solving, but in most cases, when you want control of outcomes, that’s not what you do. Training for creativity is harder and less predictable, and requires more understanding and effort from the trainer.
Among humans, there is often a level where the more capable find supposedly simple questions harder, often because they can see all the places where the question assumes a framework that is not quite as ironclad as the asker thinks. Sometimes this is useful. More often it is a pain for both parties. Frequently the result is that the answerer learns to suppress their intelligence instead of using it.
In other words—this post seems likely to be about what this not-an-AI-expert should expect to happen.
He makes some bizarre statements, such as that having a rare gene might protect you from the AI having enough data to get ‘a good read’ on you, and that genetic variation will ‘protect you from high predictability.’
You know, even if this were true… if you’re a less predictable entity in a world where a sufficiently powerful AI wants to increase predictability, there are many simple and obvious classes of interventions that reliably achieve that. Mostly, those interventions look nothing like freedom, and you’re not going to like them.
I’m sympathetic to the point of view that this is necessary, though I wouldn’t call it “the” answer—I don’t think we can have high enough confidence that it is sufficient. That said, while you mention the reasons for skepticism of applying existing legal frameworks (which I agree with!), I think the hard step is writing the proposed new rules down.
What does a clear legal mandate look like? What are the requirements, which we are capable of writing down with legally-enforceable precision, that would (or at least could) be adequate without being predictably thrown out by courts or delayed so long they end up not mattering? How many people exist who are capable of making the necessary evaluations, and is the government capable of hiring enough of them?
There’s no reason for me to think that my personal preferences (e.g. that my descendants exist) are related to the “right thing to do”, and so there’s no reason for me to think that optimizing the world for the “right things” will fulfil my preference.
This, and several of the passages in your original post such as, “I agree such a definition of moral value would be hard to justify,” seem to imply some assumption of moral realism that I sometimes encounter as well, but have never really found convincing arguments for. I would say that the successionists you’re talking to are making a category error, and I would not much trust their understanding of ‘should’-ness outside normal day-to-day contexts.
In other words: it sounds like you don’t want to be replaced under any conditions you can foresee. You have judged. What else is there?
I can’t really imagine a scenario where I “should” or would be ok with currently-existing-humans going extinct, though that doesn’t mean none could exist. I can, however, imagine a future where humanity chooses to cease (most?) natural biological reproduction in favor of other methods of bringing new life into the world, whether biological or artificial, which I could endorse (especially if we become biologically or otherwise immortal as individuals). I can further imagine being ok with those remaining biological humans each changing (gradually or suddenly) various aspects of their bodies, their minds, and the substrates their minds run on, until they are no longer meat-based and/or no longer ‘human’ in various ways most people currently understand the term.
I know this can all be adequately explained by perfectly normal human motivations, but there’s still a small part of me that wonders if some of the unfortunate changes are being influenced by some of the very factors (persuasion, deception, sandbagging, etc.) that are potentially so worrying.
Whenever something like this comes up, I have a standard reply that looks like this: “Ok, sure, neither of us can define a ‘salad,’ but would you agree that we can define an ordinal scale of ‘saladness’ and mostly agree on the relative orderings of different things along it? Good, then I think we’re done here; all that’s left is the social convention about where along that axis we stop using the word.”
Also, I recently had (a self-aware and joking version of) this exact conversation where someone said ‘purple’ isn’t real. Again, simple answer: From a physics perspective, obviously there are no ‘real’ color categories. From a psychological perspective, obviously we use a three-axis color representation, where ‘purple’ is as well-defined as any other combination. The answer is entirely about why you’re asking the question.
This does not imply that the simulation is run entirely in linear time, or at a constant frame rate (or equivalent), or that details are determined a priori instead of post hoc. It is plausible such a system could run a usually-convincing-enough simulation at lower fidelity, back-calculate details as needed, and modify memories to ignore what would have been inconsistencies when doing so is necessary or just more useful/tractable. ‘Full detail simulation at all times’ is not a prerequisite for never being able to find and notice a flaw, or for getting many kinds of adequate high level macroscopic outputs.
In other words: If I want to convince you something is a real tree, it needs to look and feel like a tree, but it doesn’t need an exact, well-defined wave-function. Classical approximations at tens of microns scale are about the limit of unaided human perception. If you pull out a magnifying glass or a scanning electron microscope, then you can fill in little pieces of the remaining whole, but you still aren’t probing the whole tree down to the Planck scale.
That’s true, and you’re right, the way I wrote my comment overstates the case. Every individual election is complicated, and there’s a lot more than one axis of variation differentiating candidates and voters. The whole process of Harris becoming the candidate made this particular election weird in a number of ways. And as a share of the electorate, there are many fewer swing voters than there were a few decades ago, and they are not conveniently sorted into large, coherent blocks.
And yet, it’s also true that as few as ~120,000 votes in WI, MI, and PA could have swung the result, three moderate states that have flipped back and forth across each of the past four presidential elections. Only slightly more for several other combinations of states. It’s not some deep mystery who lives in the rust belt, and what positions on issues a few tens of thousands of voters who are on the fence might care about. It’s not like those issues are uncorrelated, either. And if you look at the last handful of elections, a similar OOM of voters in a similar set of states could have swung things either way, each time.
And it’s true that Harris underperformed Biden’s 2020 vote share in every state but Utah (and 37.7% vs 37.8% does not matter to the outcome in any plausible scenario). If I’m reading the numbers correctly, she also received fewer raw votes than Biden in all but 6 states.
So yes: I can very easily imagine scenarios where you’re right, and the fact that we don’t meet the theoretical assumptions necessary for the median voter theorem to apply means we can’t assume an approximation of it in practice. It’s even possible, if the Dems had really started organizing sustained and wide-ranging GOTV campaigns fifteen years ago, that there could be the kinds of blue wave elections I keep getting told are demographic inevitabilities just around the corner, as long as they keep moving further towards the current set of progressive policy goals. But what I cannot imagine is that, in July 2024, in the country as it actually existed, Harris wouldn’t have done better by prioritizing positions (many of which she actually already said she held!) that a relative handful of people in an actual handful of states consistently say they care the most about, and explaining why Trump’s usually-only-vaguely-gestured-at plans would make many of their situations worse. Would it have been enough? I don’t know. But it is a better plan than what happened, if what you want is to win elections in order to govern.
Apparently the new ChatGPT model is obsessed with the immaculate conception of Mary
I mean, “shoggoth” is not that far off from biblically accurate angels… ;-)
I’d say that in most contexts in normal human life, (3) is the thing that makes this less of an issue for (1) and (2). If the thing I’m hearing about is real, I’ll probably keep hearing about it, and from more sources. If I come across 100 new crazy-seeming ideas and decide to indulge them 1% of the time, and so do many other people, that’s usually, probably enough to amplify the ones that (seem to) pan out. By the time I hear about the thing from 2, 5, or 20 sources, I will start to suspect it’s worth thinking about at a higher level.
Exactly. More fundamentally, that is not a probability graph, it’s a probability density graph, and we’re not shown the line beyond 2032 but just have to assume the integral from 2100-->infinity is >10% of the integral from 0-->infinity. Infinity is far enough away that the decay doesn’t even need to be all that slow for the total to be that high.
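To make that concrete, here is a minimal back-of-the-envelope sketch; the exponential shape, the decay rate, and the mass M hidden beyond the 2032 cutoff are all my own illustrative assumptions, not anything read off the actual graph:

\begin{align*}
P(T > 2100) &= M\, e^{-\lambda (2100 - 2032)} = M\, e^{-68\lambda}, \\
\text{so with } M = 0.3:\quad 0.3\, e^{-68\lambda} > 0.1 \;\Longleftrightarrow\; \lambda < \tfrac{\ln 3}{68} \approx 0.016 \text{ per year.}
\end{align*}

That rate corresponds to a half-life of ln(2)/0.016, roughly 43 years, so the visible curve only needs to fall off on a multi-decade timescale for the unseen tail past 2100 to carry more than 10% of the total mass.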
I second what both @faul_sname and @Noosphere89 said. I’d add: Consider ease and speed of integration. Organizational inertia can be a very big bottleneck, and companies often think in FTEs. Ultimately, no, I don’t think it makes sense to have anything like 1:1 replacement of human workers with AI agents. But, as a process occurring in stages over time, if you can do that, then you get a huge up-front payoff, and you can use part of the payoff to do the work of refactoring tasks/jobs/products/companies/industries to better take advantage of what else AI lets you do differently or instead.
“Ok, because I have Replacement Agent AI v1 I was able to fire all the people with job titles A-D, now I can hire a dozen people to figure out how to use AI to do the same for job titles E through Q, and then another dozen to reorganize all the work that was being done by A-Q into more effective chunks appropriate for the AI, and then three AI engineers to figure out how to automate the two dozen people I just had to hire...”
This was really interesting, thanks! Sorry for the wall of text. TL;DR version:
I think these examples reflect, not quite exactly willingness to question truly fundamental principles, but an attempt at identification of a long-term vector of moral trends, propagated forward through examples. I also find it some combination of suspicious/comforting/concerning that none of these are likely to be unfamiliar (at least as hypotheses) to anyone who has spent much time on LW or around futurists and transhumanists (who are probably overrepresented in the available sources regarding what humans think the world will be like in 300 years).
To add: I’m glad you mentioned in a comment that you removed examples you thought would lead to distracting object-level debates, but I think at minimum you should mention that in the post itself. It means I can’t trust anything else I think about the response list, because it’s been pre-filtered to only include things that aren’t fully taboo in this particular community. I’m curious if you think the ones you removed would align with the general principles I try to point at in this comment, or if they have any other common trends with the ones you published?
Longer version:
My initial response is, good work, although… maybe my reading habits are just too eclectic to have a fair intuition about things, but all of these are familiar to me, in the sense that I have seen works and communities that openly question them. It doesn’t mean the models are wrong—you specified not being questioned by a ‘large’ group. The even-harder-than-this problem I’ve yet to see models handle well is genuine whitespace analysis of some set of writings and concepts. Don’t get me wrong, in many ways I’m glad the models aren’t good at this yet. But that seems like where this line of inquiry is leading? I’m not even sure if that’s fundamentally important for addressing the concerns in question—I’ve been known to say that humans have been debating details of the same set of fundamental moral principles for as far back as we have records. And also, keep in mind that within the still-small-but-growing-and-large-enough-for-AI-to-easily-recognize community of EA there are or have been open debates about things like “Should we sterilize the biosphere?” and “Obviously different species have non-zero, but finite and different, levels of intrinsic moral worth, so does that mean their welfare might be more important than human welfare?” It’s really hard to find a taboo that’s actually not talked about semi-publicly in at least some searchable forums.
I do kinda wish we got to see the meta-reasoning behind how the models picked these out. My overall sense is that to the degree moral progress is a thing at all, it entails a lot of the same factors as other kinds of progress. A lot of our implementations of moral principles are constrained by necessity, practicality, and prejudice. Over time, as human capabilities advance, we get to remove more of the epicycles and make the remaining core principles more generally applicable.
For example, I expect at some point in the next 300 years (plausibly much much sooner) humanity will have the means to end biological aging. This ends the civilizational necessity of biological reproduction at relatively young ages, and probably also the practical genetic problems caused by incest. This creates a fairly obvious set of paths for “Love as thou wilt, but only have kids when you can be sure you or someone else will give them the opportunity for a fulfilling life such that they will predictably agree they would have wanted to be created” to overcome our remaining prejudices and disgust responses and become more dominant.
Also, any taboo against discussing something that is fundamentally a measurable or testable property of the world is something I consider unlikely to last into the far future, though taboos against discussing particular responses to particular answers to those questions might last longer.
@jbash made the good point that some of these would have been less taboo 300 years ago. I think that also fits the mold. 500 years ago Copernicus (let alone the ancient Greeks millennia prior) faced weaker taboos against heliocentrism than Galileo in part because in his time the church was stronger and could tolerate more dissent. And 300 years ago questioning democracy was less taboo than now in part because there were still plenty of strong monarchs around making sure people weren’t questioning them, and that didn’t really reverse until the democracies were strong but still had to worry about the fascists and communists.
Credit cards are kind of an alternative to small claims court, and there are various reputational and other reasons that allow ordinary business to continue even if it is not in practice enforced by law.
True, but FWIW this essentially puts enforcement in the hands of banks instead of the police. Which is probably a net improvement, especially under current conditions. But it does have its own costs. My wife is on the board of a nonprofit that last year got a donation, then the donor’s spouse didn’t recognize the charge and disputed it. The donor confirmed verbally and in writing, both to the nonprofit and to the credit card company, that the charge was valid. The nonprofit provided all requested documentation and another copy of the donor’s written confirmation. The credit card company refused both to reinstate the charge and to reverse the fee.
I walked around the neighborhood and took some photos.
As far as I’m concerned, this is almost literally zero evidence of anything, in any inhabited area, except to confirm or deny very specific, narrow claims. To assume otherwise, you’d have to look at my own photos from my last few years of traveling and believe that no one ever goes to national parks, and that the visitation numbers and the reports people write about crowding and bad behavior are all lies.
As things stand today, if AGI is created (aligned or not) in the US, it won’t be by the USG or agents of the USG. It’ll be by a private or public company. Depending on the path to get there, there will be more or less USG influence of some sort. But if we’re going to assume the AGI is aligned to something deliberate, I wouldn’t assume AGI built in the US is aligned to the current administration, or at least significantly less so than the degree to which I’d assume AGI built in China by a Chinese company would be aligned to the current CCP.
For more concrete reasons regarding national ideals, the US has a stronger tradition of self-determination and shifting values over time, plausibly reducing risk of lock-in. It has a stronger tradition (modern conservative politics notwithstanding) of immigration and openness.
In other words, it matters a lot whether the aligned US-built AGI is aligned to the Trump administration, the Constitution, the combined writings of the US founding fathers and renowned leaders and thinkers, the current consensus of the leadership at Google or OpenAI, the overall gestalt opinions of the English-language internet, or something else. I don’t have enough understanding to make a similar list of possibilities for China, but some of the things I’d expect it would include don’t seem terrible. For example, I don’t think a genuinely-aligned Confucian sovereign AGI is anywhere near the worst outcome we could get.
I won’t comment on your specific startup, but I wonder in general how an AI Safety startup becomes a successful business. What’s the business model? Who is the target customer? Why do they buy? Unless the goal is to get acquired by one of the big labs, in which case, sure, but again, why or when do they buy, and at what price? Especially since they already don’t seem to be putting much effort into solving the problem themselves despite having better tools and more money to do so than any new entrant startup.
I really, really hope at some point the Democrats will acknowledge the reason they lost is that they failed to persuade the median voter of their ideas, and/or adopt ideas that appeal to said voters. At least among those I interact with, there seems to be a denial of the idea that this is how you win elections, which is a prerequisite for governing.
My instinctive response is: weight classes are for controlled competitions where fairness is what we actually want. For social status games, if you want to enforce weight classes, you need a governing body that gets to define the classes and the rules of the game, but the rules of social status games frequently include not being fully expressible in precise terms. This isn’t necessarily a showstopper, but it does require admitting what range of the hierarchy you’re in and cannot rise above. As I understand it, the reason the self-sorting works today is that when people compete in the wrong weight classes, it’s not fun for either side. A Jupiter Brain might theoretically be amenable to playing a social game with me on my level, but at best it would be like me playing tic-tac-toe with a little kid, where the kid is old enough to realize I’m throwing the game but not old enough to have solved the game.
Personally I’d much rather not spend my time on such games when it is possible to manage that. But I don’t always have that choice now, and probably still won’t at least sometimes in the future.