Anirandis
Presumably it’d take less manpower to review each article that the AI’s written (i.e. read the citations & make sure the article accurately describes the subjects) than it would to write articles from scratch. I’d guess this is the case even if the claims seem plausible & fact-checking requires a somewhat detailed read-through of the sources.
Cheers for the reply! :)
integrate these ideas into your mind and it’s complaining loudly that you’re going too fast (although it doesn’t say it quite that way, I think this is a useful framing). Stepping away, focusing on other things for a while, and slowly coming back to the ideas is probably the best way to be able to engage with them in a psychologically healthy way that doesn’t overwhelm you.
I do try! When thinking about this stuff starts to overwhelm me I can try to put it all on ice; usually some booze is required to be able to do that, TBH.
But of course it’s also plausible that destructive conflict between aggressive civilizations leads to horrifying outcomes for us
Also, wouldn’t you expect s-risks from this to be very unlikely by virtue of (1) civilizations like this being very unlikely to have substantial measure over the universe’s resources, (2) transparency making bargaining far easier, and (3) few technologically advanced civilizations caring about human suffering in particular, as opposed to e.g. an adversary running emulations of their own species?
Since it’s my shortform, I’d quite like to just vent about some stuff.
I’m still pretty scared about a transhumanist future going quite wrong. It simply seems to me that there’s quite the disjunction of paths to “s-risk” scenarios: generally speaking, any future agent that wants to cause disvalue to us, or to some empathetic agent, would bring about an outcome that’s Pretty Bad by my lights. Like, it *really* doesn’t seem impossible that some AI decides to pre-commit to doing Bad if we don’t co-operate with it; or that our AI ends up in some horrifying conflict-type scenario, which could lead to Bad outcomes as hinted at here; etc. etc.
Naturally, this kind of outcome is going to be salient because it’s scary; but even then, I struggle to believe that I’m more than moderately biased. The distribution of possibilities seems somewhat trimodal: either we maintain control and create a net-positive world (hopefully we’d be able to deal with the issue of people abusing uploads of each other); we all turn to dust; or something grim happens. And the fact that some very credible people (within this community at least) also conclude that this kind of thing has reasonable probability further pushes me towards thinking that I just need to somehow deal with these scenarios being plausible, rather than trying to convince myself that they’re unlikely. But I remain deeply uncomfortable trying to do that.
Some commentators who seem to consider such scenarios plausible, such as Paul Christiano, also subscribe to the naive view regarding energy-efficiency arguments over pleasure and suffering: that the worst possible suffering is likely no worse than the greatest possible pleasure is good, and that this may also be the case for humans. Even if this is the case (and I’m sceptical), I still feel that I’m too risk-averse for it to be much comfort: in that world I wouldn’t accept a 90% chance of eternal bliss with a 10% chance of eternal suffering. I don’t think I hold suffering-focused views; I think there’s a level of happiness that can “outweigh” even extreme suffering. But when you translate it to probabilities, I become deeply uncomfortable with even a 0.01% chance of bad stuff happening to me, particularly when the only way to avoid this gamble is to permanently stop existing. Perhaps something an OOM or two lower and I’d be more comfortable.
I’m not immediately suicidal, to be clear. I wouldn’t classify myself as ‘at-risk’. But I nonetheless find it incredibly hard to find solace. There’s a part of me that hopes things get nuclear, just so that a worse outcome is averted. I find it incredibly hard to care about other aspects of my life; I’m totally apathetic. I started to improve and got mid-way through the first year of my computer science degree, but I’m starting to feel like it’s gotten worse. I’d quite like to finish my degree and actually meaningfully contribute to the EA movement, but I don’t know if I can at this stage. I’m guessing it’s a result of me becoming more pessimistic about the worst outcomes resulting in my personal torture, since that’s the only real change that’s occurred recently. Even before I became more pessimistic I still thought about these outcomes constantly, so I don’t think it’s just a case of me thinking about them more.

I take sertraline but it’s beyond useless. Alcohol helps, so at least there’s that. I’ve tried quitting thinking about this kind of thing; I’ve spent weeks trying to shut down any instance where I thought about it. I failed.
I don’t want to hear any over-optimistic perspectives on these issues. I’d greatly appreciate any genuine, sincerely held opinions on them (good or bad), or advice on dealing with the anxiety. But I don’t necessarily need or expect a reply; I just wanted to get this out there. Even if nobody reads it. Also, thanks a fuckton to everyone who was willing to speak to me privately about this stuff.
Sorry if this type of post isn’t allowed here; I just wanted to articulate some stuff for my own sake somewhere that I’m not going to be branded a lunatic. Hopefully LW/singularitarian views are wrong, but some of these scenarios aren’t hugely dependent on an imminent singularity. I’m glad I’ve written all of this down. I’m probably going to down a bottle or two of rum and try to forget about it all now.
Thanks for the response; I’m still somewhat confused though. The question was to do with the theoretical best/worst things possible, so I’m not entirely sure whether parallels to (relatively) minor pleasures/pains are meaningful here.
Specifically I’m confused about:
Then you end up into well, to what extent is that a debunking explanation that explains why humans in terms of their capacity to experience joy and suffering are unbiased but the reality is still biased
I’m not really sure what’s meant by “the reality” here, nor what’s meant by biased. Is the assertion that humans’ intuitive preferences are driven by the range of possible things that could happen in the ancestral environment & that this isn’t likely to match the maximum possible pleasure vs. suffering ratio in the future? If so, how does this lead one to end up concluding it’s worse (rather than better)? I’m not really sure how these arguments connect in a way that could lead one to conclude that the worst possible suffering is a quadrillion times as bad as the best bliss is good.
I’m not sure if this is the right place to ask this, but does anyone know what point Paul’s trying to make in the following part of this podcast? (Relevant section starts around 1:44:00)
Suppose you have a P probability of the best thing you can do and a one-minus-P probability of the worst thing you can do: what does P have to be so it’s the difference between that and the barren universe? I think most of my probability is distributed between you would need somewhere between 50% and 99% chance of good things, and then put some probability or some credence on views where that number is a quadrillion times larger or something, in which case it’s definitely going to dominate. A quadrillion is probably too big a number, but very big numbers. Numbers easily large enough to swamp the actual probabilities involved.
[ . . . ]
I think that those arguments are a little bit complicated, how do you get at these? I think to clarify the basic position, the reason that you end up concluding it’s worse is just like, consult your intuition about how bad the worst thing that can happen to a person is vs the best thing, and damn, the worst thing seems pretty bad. And then the, like, first-pass response is to sort of have this debunking understanding, where we understand causally how it is that we ended up with this kind of preference with respect to really bad stuff versus really good stuff.
If you look at what happens over evolutionary history. What is the range of things that can happen to an organism and how should an organism be trading off like best possible versus worst possible outcomes. Then you end up into well, to what extent is that a debunking explanation that explains why humans in terms of their capacity to experience joy and suffering are unbiased but the reality is still biased versus to what extent is this then fundamentally reflected in our preferences about good and bad things. I think it’s just a really hard set of questions. I could easily imagine maybe shifting on them with much more deliberation.
It seems like an important topic but I’m a bit confused by what he’s saying here. Is the perspective he’s discussing (and puts non-negligible probability on) one that states that the worst possible suffering is a bajillion times worse than the best possible pleasure, and wouldn’t that suggest every human’s life is net-negative (even if your credence on this being the case is ~.1%)? Or is this just discussing the energy-efficiency of ‘hedonium’ and ‘dolorium’, in which case it’s of solely altruistic concern & can be dealt with by strictly limiting compute?
Also, I’m not really sure if this set of views is more “a broken bone/waterboarding is a million times as morally pressing as making a happy person”, or along the more empirical lines of “most suffering (e.g. waterboarding) is extremely light, humans can experience far far far far far^99 times worse; and pleasure doesn’t scale to the same degree.” Even a tiny chance of the second one being true is awful to contemplate.
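Trying to make the arithmetic in the first quoted paragraph concrete for myself (this is just my reading of it; the numbers are illustrative): if the worst possible outcome is k times as bad as the best possible outcome is good, then a gamble of P on the best thing and (1 − P) on the worst thing only beats a barren universe when P·1 − (1 − P)·k > 0, i.e. when P > k/(1 + k). A quick sketch:

```python
# Break-even probability for a gamble between the best and the worst possible outcome,
# assuming the worst is k times as bad as the best is good (a barren universe is worth 0).

def break_even_p(k: float) -> float:
    """Smallest P for which P * 1 - (1 - P) * k > 0, i.e. the gamble beats a barren universe."""
    return k / (1 + k)

for k in [1, 99, 1e15]:  # symmetric, 99:1, and the "quadrillion times larger" style of view
    print(f"k = {k:g}: need P > {break_even_p(k):.15f}")

# k = 1 gives P > 0.5; k = 99 gives P > 0.99 (roughly the "50% to 99%" range he mentions);
# k = 1e15 needs P > 0.999999999999999, which is why such views, if given any credence, dominate.
```

If that reading is right, the 50% to 99% range corresponds to thinking the worst outcome is somewhere between equally bad and ~100x as bad as the best outcome is good, and the “quadrillion” views are the ones that swamp everything else.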
we ask the AGI to “make us happy”, and it puts everyone paralyzed in hospital beds on dopamine drips. It’s not hard to think that after a couple hours of a good high, this would actually be a hellish existence, since human happiness is way more complex than the amount of dopamine in one’s brain (but of course, Genie in the Lamp, Midas’ Touch, etc.)
This sounds much better than extinction to me! Values might be complex, yeah, but if the AI is actually programmed to maximise human happiness then I expect the high wouldn’t wear off. Being turned into a wirehead arguably kills you, but it’s a much better experience than death for the wirehead!
(I’ve actually read in a popular lesswrong post about s-risks Paul clearly saying that the risk of s-risk was 1/100th of the risk of x-risk (which makes for even less than 1/100th overall). Isn’t that extremely naive, considering the whole Genie in the Lamp paradigm? How can we be so sure that the Genie will only create hell 1 time for each 100 times it creates extinction?)
I think the kind of Bostromian scenario you’re imagining is a slightly different line of AI concern from the one that Paul & the soft-takeoff crowd are focused on. The whole genie-in-the-lamp thing, to me at least, doesn’t seem likely to create suffering. If this hypothetical AI values humans being alive & nothing more than that, it might separate your brain in half so that it counts as 2 humans, for example. I think most scenarios where you’ve got a boundless-optimiser superintelligence would lead to the creation of new minds that perfectly satisfy its utility function.
I’m way more scared about the electrode-produced smiley faces for eternity and the rest. That’s way, way worse than dying.
FWIW, it seems kinda weird to me that such an AI would keep you alive… if you had a “smile-maximiser” AI, wouldn’t it be indifferent to humans being braindead, as long as it’s able to keep them smiling?
I’d like to have Paul Christiano’s view that the “s-risk-risk” is 1⁄100 and that AGI is 30 years off
I think Paul’s view is along the lines of “1% chance of some non-insignificant amount of suffering being intentionally created”, not a 1% chance of this type of scenario.[1]
Could AGI arrive tomorrow in its present state?
I guess. But we’d need to come up with some AI model tomorrow, this model would need to suddenly become agentic and rapidly grow in power, and it would need to be designed with a utility function that values keeping humans alive but does not value human flourishing… and even then, there’d likely be better ways to e.g. maximise the number of smiles in the universe, such as using artificially created minds.
Eliezer has written a bit about this, but I think he considers it a mostly solved problem.
What can I do as a 30 year old from Portugal with no STEM knowledge? Start learning math and work on alignment from home?
Probably get treatment for the anxiety and try to stop thinking about scenarios that are very unlikely, albeit salient in your mind. (I know, speaking from experience, that it’s hard to do so!)
[1]
I did, coincidentally, cold e-mail Paul a while ago to try to get his model on this type of stuff & got the following response:
“I think these scenarios are plausible but not particularly likely. I don’t think that cryonics makes a huge difference to your personal probabilities, but I could imagine it increasing them a tiny bit. If you cared about suffering-maximizing outcomes a thousand times as much as extinction, then I think it would be plausible for considerations along these lines to tip the balance against cryonics (and if you cared a million times more I would expect them to dominate). I think these risks are larger if you are less scope sensitive since the main protection is the small expected fraction of resources controlled by actors who are inclined to make such threats.”
TBH it’s difficult to infer a particular estimate of one’s individual probability (without cryonics or voluntary uploading) from this; it’s not completely clear just how bad a scenario would have to be (for a typical biological human) in order to fall within the class of scenarios described as ‘plausible but not particularly likely’.
I think the problem is very likely to be resolved by different mechanisms based on trust and physical control rather than cryptography.
Do you expect these mechanisms to also resolve the case where a biological human is forcibly uploaded in horrible conditions?
Lurker here; I’m still very distressed after thinking about some futurism/AI stuff & worrying about possibilities of being tortured. If anyone’s willing to have a discussion on this stuff, please PM!
I know I’ve posted similar stuff here before, but I could still do with some people to discuss infohazardous s-risk-related stuff that I have anxieties about. PM me.
Evolution “wants” pain to be a robust feedback/control mechanism that reliably causes the desired amount of avoidance—in this case, the greatest possible amount.
I feel that there’s going to be a level of pain that a mind of nearly any pain tolerance would exert 100% of its energy to avoid. I don’t think I know enough to comment on how much further than this level the brain can go, but it’s unclear why the brain would develop the capacity to process pain drastically more intense than that; pain is just a tool to avoid certain things, and it ceases to be useful past a certain point.
There are no cheap solutions that would have an upper cut-off to pain stimuli (below the point of causing unresponsiveness) without degrading the avoidance response to lower levels of pain.
I’m imagining a level of pain above that which causes unresponsiveness, I think. Perhaps I’m imagining something more extreme than your “extreme”?
It is to be expected that humans who are actively trying to cause pain (or to imagine how to do so) will succeed in causing amounts of pain beyond most anything found in nature.
Yeah, agreed.
I’m unsure that “extreme” would necessarily get a more robust response, considering that there comes a point where the pain becomes disabling.
It seems as though there might be some sort of biological “limit” insofar as there are limited peripheral nerves, the grey matter can only process so much information, etc., and there’d be a point where the brain is 100% focused on avoiding the pain (meaning there’d be no evolutionary advantage to having the capacity to process additional pain). I’m not really sure where this limit would be, though. And I don’t really know any biology so I’m plausibly completely wrong.
I think the idea is that the 4th scenario is the case, and you can’t discern whether you’re the real you or the simulated version, as the simulation is (near-)perfect. In that scenario, you should act in the same way that you’d want the simulated version to act. Either (1) you’re the simulation, in which case your one-boxing is what causes the box to be filled and the real you wins $1,000,000; or (2) you’re the real you, the simulated version thought the same way that you did and one-boxed, and so you get $1,000,000 if you one-box.
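To make the payoffs explicit (a toy sketch; the assumption doing the work is that the simulated copy decides exactly as you do, so the predictor’s box-filling tracks your own decision procedure):

```python
# Toy payoff calculation for Newcomb's problem with a (near-)perfect simulation.
# Assumption: the simulated copy makes the same choice as the real player.

BOX_A = 1_000           # transparent box: always contains $1,000
BOX_B_FULL = 1_000_000  # opaque box: filled only if the simulation one-boxed

def payoff(choice: str) -> int:
    """Real player's payoff, given that the simulation chose identically."""
    sim_one_boxed = (choice == "one-box")   # the sim mirrors your choice
    box_b = BOX_B_FULL if sim_one_boxed else 0
    if choice == "one-box":
        return box_b                        # take only the opaque box
    return box_b + BOX_A                    # take both boxes

print(payoff("one-box"))   # 1000000
print(payoff("two-box"))   # 1000
```

Under that assumption, one-boxing comes out far ahead, which is the point of not being able to tell which copy you are.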
If Trump loses the election, he’s not the president anymore and the federal bureaucracy and military will stop listening to him.
He’d still be president until Biden’s inauguration though. I think most of the concern is that there’d be ~3 months of a president Trump with nothing to lose.
If anyone happens to be willing to privately discuss some potentially infohazardous stuff that’s been on my mind (and not in a good way) involving acausal trade, I’d appreciate it—PM me. It’d be nice if I can figure out whether I’m going batshit.
it’s much harder to know if you’ve got it pointed in the right direction or not
Perhaps, but the type of thing I’m describing in the post is more about preventing worse-than-death outcomes even if the sign is flipped (by designing the reward function/model in such a way that it won’t torture everyone if that happens).
That seems easier than recognising whether the sign has been flipped, or than designing a system that can’t experience these sign-flip-type errors at all; I’m just unsure whether it’s something we have robust solutions for. If it turns out that someone’s figured out a reliable solution to this problem, then the only real concern is whether the AI’s developers would bother to implement it. I’d much rather risk the system going wrong and paperclipping than going wrong and turning “I have no mouth, and I must scream” into a reality.
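As a toy illustration of the kind of design choice I have in mind (my own sketch, not a real proposal or anything from the literature; `welfare` is a stand-in for whatever the reward model actually scores): with an unbounded two-sided reward, flipping the sign turns “maximise welfare” into “maximise suffering”, whereas a reward floored at a neutral outcome gives the sign-flipped objective nothing extra to gain from creating ever-deeper suffering.

```python
# Toy illustration (hypothetical reward shapes, not a real alignment proposal):
# how the shape of a reward function changes what a sign-flipped optimiser wants.

def unbounded_reward(welfare: float) -> float:
    # Two-sided reward: more welfare is better, more suffering is strictly worse.
    return welfare

def floored_reward(welfare: float) -> float:
    # Floored reward: anything at or below a neutral outcome scores 0,
    # so there is no "hell" direction for a flipped sign to climb towards.
    return max(0.0, welfare)

outcomes = {"maximise flourishing": 1e6, "do nothing": 0.0, "maximise suffering": -1e6}

for name, reward_fn in [("unbounded", unbounded_reward), ("floored", floored_reward)]:
    # A sign-flipped optimiser maximises -reward; see which outcome it prefers.
    flipped_best = max(outcomes, key=lambda o: -reward_fn(outcomes[o]))
    print(f"{name}: sign-flipped optimiser prefers '{flipped_best}'")
# unbounded: sign-flipped optimiser prefers 'maximise suffering'
# floored: sign-flipped optimiser prefers 'do nothing'
```

To be clear, the flipped, floored objective is only indifferent between the neutral outcome and the suffering one (the tie above is broken by dictionary order); the point is that it removes the active incentive to create more suffering, not that it rules it out.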
Because you’re imagining AGI keeping us in a box? Or because you think P(humans are deliberately tortured | AGI) is substantial and that this post increases it?