It may help to link to this for context.
Also, what is your impression of Stop Gene Drives? Do their arguments about risks to humans seem in good faith, or is “humans don’t deserve to play god!” more like their real motive?
It’s true that it could set a bad precedent. But it could also set a bad precedent to normalize letting millions of people die horribly just to avoid setting a bad precedent. It’s not immediately clear to me which is worse in the very long run.
Contrast this post with techniques like Word2vec, which do map concepts into spatial dimensions. Every word is assigned a vector, and associations are learned via backprop by predicting nearby text. This lets you perform conceptual arithmetic like “Brother” − “Man” + “Woman”, whose result is a vector very close (in literal spatial terms) to the vector for “Sister”.
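To make that concrete, here is a minimal sketch using the gensim library and its pretrained Google News vectors (assuming gensim is installed; the model is a large one-time download, and the exact neighbours will depend on which vectors you load):

```python
# Minimal sketch of word2vec analogy arithmetic with gensim (assumes gensim
# is installed; the pretrained vectors are a large one-time download).
import gensim.downloader as api

model = api.load("word2vec-google-news-300")  # returns KeyedVectors

# "Brother" - "Man" + "Woman": most_similar normalizes the vectors and
# excludes the input words, so the top hit should be something like "sister".
print(model.most_similar(positive=["brother", "woman"], negative=["man"], topn=3))

# The same arithmetic done by hand on the raw vectors; here the input words
# themselves may also rank near the top of the neighbour list.
vec = model["brother"] - model["man"] + model["woman"]
print(model.similar_by_vector(vec, topn=5))
```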
The space of possible minds/algorithms is so vast, and that problem is so open-ended, that it would be a remarkable coincidence if such an AGI had a consciousness that was anything like ours. Most details of our experience are just accidents of evolution and history.
Does an airplane have a consciousness like a bird’s? “Design an airplane” sounds like a more specific goal, but in the space of all possible minds/algorithms, that goal’s solutions are quite underdetermined, just like flight.
In case I get distracted and fail to come back to this, I just want to say that I think this type of project is extremely valuable and should be the main focus of the AI safety/alignment movement (IMHO). From my perspective, most effort seems to be going into writing about arguments for why a future AGI will be a disaster by default, along with some highly theoretical ideas for preventing this from happening. This type of discourse ignores current ML systems, understandably, because future AGI will be qualitatively different from our current models.
However, the problem with this a priori approach is that it alienates the people working in the industry, who are ultimately the ones we need to be in conversation with if alignment is ever going to become a popular issue. What we really need are experiments like what you’re doing, i.e. actually getting a real AI to do something nasty on camera. This helps us learn to deal with nasty AI, but I think far more importantly it puts the AI safety conversation in the same experimental setting as the rest of the field.
I’m fine with everything on LW ultimately being tied to alignment. Hardcore materialism being used as a working assumption seems like a good pragmatic measure as well. But ideally there should also be room for foundational discussions like “how do we know our utility function?” and “what does it mean for something to be aligned?” Having trapped priors on foundational issues seems dangerous to me.
Work to offer the solutions and let them make their own, informed choice.
The problem is that the bureaucrats who decide whether gene drives are allowed aren’t the same people as the ones who are dying from malaria. Every day that you postpone the eradication of malaria while trying to convince bureaucrats, over a thousand more people die from it. Most of them, many of whom are infants, have no ability to meaningfully affect their political situation.
Tell me more about why you think the impact on society will be positive.
Okay. Let me back up a bit. You had said:
Either there are some concrete things corresponding to personal identity/preference satisfaction/etc., or we value not these things but some actually-existing correlates of these things, or acting like we value these things is a heuristic that instrumentally helps us arrive at good outcomes. Either way, ontology shifts don’t do anything bad to our values.
Leading to this claim, which is very alluring to me, even if I disagree with it:
Or, the other way around, perhaps “values” are defined by being robust to ontology shifts.
You gave the example that learning our world is physical didn’t make us value humans less, even though we no longer believe that they have immortal souls. That’s true, we still value humans. But the problem is that the first group of agents (the medieval “we”) is a different group from the second (the post-industrial “we”). Each group has a different concept of “human”. The two concepts roughly map to the same targets in the territory, but the semantic meaning differs in a way that is crucial to the first group of agents. The first group wasn’t convinced that it was wrong. It was simply replaced by a new group that inherited its title.
That might sound dramatic, but ask a truly committed religious person whether they can imagine themselves not believing in God. They will not be able to imagine it. This is because, in fact, they would no longer be the same agency in the sense that is crucial to them. Society would legally consider them the same person, and their apostate future self might claim the same title and even deny that the change really mattered, but the original entity who was asked, the theistic agent, would no longer recognize any heir as legitimate at that point. From its point of view, it has simply died. That is why the theist cannot even imagine becoming an apostate, even while granting that at any given moment there is a nonzero chance of it happening.
What you say is true if you’re talking about people, i.e. biological human organisms. Such entities will always be doing something in the world up until the very moment they physically expire, including having their whole world view shattered and living in the aftermath of that. However, the pre-world-view-shattering them would not recognize the post-world-view-shattering them as a legitimate heir. The person might be the same, but it’s a different agency in control.
Similar things can be said about populations and governments.
The suffering and evil present in the world has no bearing on God’s existence. I’ve always failed to buy into that idea. Sure, it sucks. But it has no bearing on the metaphysical reality of a God. If God does not save children—yikes I guess? What difference does it make? A creator as powerful as has been hypothesised can do whatever he wants; any arguments from rationalism be damned.
Of course, the existence of pointless suffering isn’t an argument against the existence of a god. But it is an old argument against the existence of a god who deserves to be worshipped with sincerity. We might even admit that there is a cruel deity, and still say non serviam, which I think is a more definite act of atheism than merely doubting any deity’s existence.
How should this affect one’s decision to specialize in UI design versus other areas of software engineering? Will there be fewer GUIs in the future, or will the “audience” simply cease to be humans?
Great work! I hope more people take your direction, with concrete experiments and monitoring of real systems as they evolve. The concern that doing this will somehow backfire must simply be dismissed as untimely perfectionism. It’s too late at this point to shun iteration. We simply don’t have time left for a Long Reflection about AI alignment, even if we did have the coordination to pull that off.
While it’s true that AI alignment raises difficult ethical questions, there’s still a lot of low-hanging fruit to keep us busy. Nobody wants an AI that tortures everyone to death.
Pearce has the idea of “gradients of bliss”, which he uses to try to address the problem you raised about insensitivity to pain being hazardous. He thinks that even if all of the valences are positive, the animal can still be motivated to avoid danger as long as doing so yields an even greater positive valence than the alternatives. So the prey animals are happy to be eaten, but much happier to run away.
To me, this seems possible in principle. When I feel happy, I’m still motivated at some low level to do things that will make me even happier, even though I was already happy to begin with. But actually implementing “gradients of bliss” in biology seems like a post-ASI feat of engineering.
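To illustrate the in-principle part, here is a toy sketch (entirely my own construction, with made-up valence numbers): an agent that chooses by softmax over valences will almost always run away, even though every option feels positive, because only the differences between valences drive the choice.

```python
# Toy illustration of "gradients of bliss": every outcome has positive valence,
# yet the agent is still strongly motivated toward the best one, because choice
# depends on the differences between valences, not on their sign.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, made-up valences: everything is pleasant, escaping is most pleasant.
valence = {"freeze": 0.2, "get_caught": 0.1, "run_away": 0.9}

def choose(valences, temperature=0.05):
    """Softmax choice; lower temperature means the agent follows the gradient more decisively."""
    actions = list(valences)
    v = np.array([valences[a] for a in actions])
    p = np.exp(v / temperature)
    p /= p.sum()
    return actions[rng.choice(len(actions), p=p)]

choices = [choose(valence) for _ in range(1000)]
print({a: choices.count(a) for a in valence})  # "run_away" dominates despite all-positive valences
```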
(By the way, your idea of predation-induced unconsciousness isn’t one I had heard before, it’s interesting.)
Same. I feel somewhat jealous of people who can have a visceral in-body emotional reaction to X-risks. For most of my life I’ve been trying to convince my lizard brain to feel emotions that reflect my beliefs about the future, but it’s never cooperated with me.
Does anyone know of work dealing with the interaction between anthropic reasoning and illusionism/eliminativism?
The information defining a self-preserving agent must not be lost to entropy, and any attempt to reduce suffering by ending a life, when that life would have continued to try to survive, is fundamentally a violation that any safe AI system would try to prevent.
Very strongly disagree. If a future version of myself was convinced that it deserved to be tortured forever, I would infinitely prefer that my future self be terminated than have its (“my”) new values satisfied.
It will be interesting to see if EA succumbs to rot, or whether its principles are strong enough to scale.
Reading AI safety articles like this one, I always find myself nodding along in agreement. The conclusions simply follow from the premises, and the premises are so reasonable. Yet by the end, I always feel futility and frustration. Anyone who wanted to argue that AI safety was a hopeless program wouldn’t need to look any further than the AI safety literature! I’m not just referring to “death with dignity”. What fills me with dread and despair is paragraphs like this:
Here is the real chasm between the AI safety movement and the ML industry/academia. One field is driven entirely by experimental results; the other is dominated so totally by theory that its own practitioners deny that there can be any meaningful empirical aspect to it, at least not until the moment when it’s too late to make any difference.
Years ago, I read an article about an RL agent wireheading itself via memory corruption, thereby ignoring its intended task. Either this article exists and I can’t find it now, or I’m misremembering. Either way, it’s exactly the sort of research that the AI safety community should be conducting and publishing right now (i.e. propaganda with epistemic benefits). With things like GPT-3 around nowadays, I bet one could even devise experiments where artificial agents learn to actually deceive humans (via Mechanical Turk, perhaps?). Imagine how much attention such an experiment could generate once journalists pick it up!
EDIT: This post is very close to what I have in mind.