No77e

Karma: 136

No77e 6 Jun 2022 19:49 UTC
9 points
on: AGI Ruin: A List of Lethalities
The first thing generally, or CEV specifically, is unworkable because the complexity of what needs to be aligned or meta-aligned for our Real Actual Values is far out of reach for our FIRST TRY at AGI. Yes I mean specifically that the dataset, meta-learning algorithm, and what needs to be learned, is far out of reach for our first try. It’s not just non-hand-codable, it is unteachable on-the-first-try because the thing you are trying to teach is too weird and complicated.

Why is CEV so difficult? And if CEV is impossible to learn first try, why not shoot for something less ambitious? Value is fragile, OK, but aren’t there easier utopias?

Many humans would be able to distinguish utopia from dystopia if they saw them, and humanity’s only advantage over an AI is that the brain has “evolution presets”.

Humans are relatively dumb, so why can’t even a relatively dumb AI learn the same ability to distinguish utopias from dystopias?

To anyone reading: don’t interpret these questions as disagreement. Someone who doesn’t understand a mathematical proof, for example, might express disagreement with it while knowing full well that they haven’t found a mistake in it and are simply confused.

No77e 7 Jun 2022 11:50 UTC
27 points
on: AGI Safety FAQ / all-dumb-questions-allowed thread
Should an “ask dumb questions about AGI safety” thread be recurring? Surely people will keep coming up with new questions in the years to come, and the same dynamics outlined in the OP will repeat. This post could remain the go-to page, but it would become enormous; on the other hand, recurring posts would partly lose the FAQ function. Perhaps both recurring posts and a FAQ post?

No77e 7 Jun 2022 15:21 UTC
2 points
in reply to: No77e’s comment on: AGI Ruin: A List of Lethalities
Why not shoot for something less ambitious?
I’ll give myself a provisional answer. I’m not sure if it satisfies me, but it’s enough to make me pause: Anything short of CEV might leave open an unacceptably high chance of fates worse than death.

No77e 9 Jun 2022 11:09 UTC
3 points
in reply to: AnnaSalamon’s comment on: Comment reply: my low-quality thoughts on why CFAR didn’t get farther with a “real/efficacious art of rationality”
One is thinking about how to build aligned intelligence in a machine, the other is thinking about how to build aligned intelligence in humans and groups of humans.
Is this true, though? Teaching rationality improves people’s capabilities but doesn’t necessarily align them. People are not AIs, but their morality doesn’t necessarily converge under reflection either.

And even if the argument is “people are already aligned with people”, you are still working on capabilities when dealing with people and on alignment when dealing with AIs.

Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.

No77e 11 Jun 2022 7:39 UTC
1 point
in reply to: AnnaSalamon’s comment on: Comment reply: my low-quality thoughts on why CFAR didn’t get farther with a “real/efficacious art of rationality”
Ah, I see your point now, and it makes sense. If I had to summarize it (and reword it in a way that appeals to my intuition), I’d say that the choice of seeking the truth is not just about “this helps me,” but about “this is what I want/ought to do/choose”. Not just about capabilities. I don’t think I disagree at this point, although perhaps I should think about it more.
I suspected my question would be met with an answer at least a few inferential steps away from where I started: my model seemed like the most natural one, so I expected that someone who routinely thinks about this topic had updated away from it rather than never considered it.
Regarding the last paragraph: I already believed your line “increasing a person’s ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct.” It didn’t seem to bear on the argument here, because it looks like you get alignment for free by improving capabilities (if you reason with my previous model; otherwise, it looks like your truth-alignment efforts somehow spill over to other values, which is still getting something for free due to how humans are built, I’d guess).

Also… now that I think about it, what Harry was doing with Draco in HPMOR looks a lot like aligning rather than improving capabilities, and there were good spill-over effects (which were almost the whole point in that case perhaps).

No77e 31 Oct 2022 19:24 UTC
4 points
on: publishing alignment research and infohazards
This looks like something that would also be useful for alignment orgs, if they want to organize their research in silos, as Yudkowsky often suggests (and if they haven’t already implemented systems like this one).

No77e 2 Nov 2022 10:11 UTC
4 points
on: AI as a Civilizational Risk Part 4/6: Bioweapons and Philosophy of Modification
Can someone explain to me why Pasha’s posts are downvoted so much? I don’t think they are great, but this level of negative karma seems disproportionate to me.

No77e 2 Nov 2022 10:47 UTC
9 points
on: All AGI Safety questions welcome (especially basic ones) [~monthly thread]
Why is research into decision theories relevant to alignment?

No77e 2 Nov 2022 14:02 UTC
3 points
in reply to: plex’s comment on: All AGI Safety questions welcome (especially basic ones) [~monthly thread]
Thanks for the answer. It clarifies a little bit, but I still feel like I don’t fully grasp its relevance to alignment. I have the impression that there’s more to the story than just that?

Is checking that a state of the world is not dystopian easier than constructing a non-dystopian state?

No77e 27 Dec 2022 20:57 UTC
5 points
3 comments, 1 min read

Is recursive self-alignment possible?

No77e 3 Jan 2023 9:15 UTC
5 points
5 comments, 1 min read

No77e 3 Jan 2023 9:29 UTC
6 points
on: Is recursive self-alignment possible?
I publish posts like this one to clarify my doubts about alignment. I don’t pay attention to whether I’m beating a dead horse or if there’s previous literature about my questions or ideas. Do you think this is an OK practice? One pro is that people like me learn faster, and one con is that it may pollute the site with lower-quality posts.

No77e 3 Jan 2023 9:31 UTC
3 points
on: Is recursive self-alignment possible?
I use Eliezer Yudkowsky in my example because it makes the most sense. Don’t read anything else into it, please.

What’s wrong with the paperclips scenario?

No77e 7 Jan 2023 17:58 UTC
31 points
11 comments, 1 min read

No77e 7 Jan 2023 18:01 UTC
1 point
on: What’s wrong with the paperclips scenario?
The last Twitter reply links to a talk from MIRI which I haven’t watched. I wouldn’t be surprised if MIRI also used this metaphor in the past, but I can’t recall examples off the top of my head right now.

No77e 7 Jan 2023 18:45 UTC
2 points
in reply to: DragonGod’s comment on: What’s wrong with the paperclips scenario?
Do you mean that no one will create exactly a paperclip maximizer, or that no one will create any agent of that kind, i.e. one with goals such as “collect stamps” or “generate images”? I think Eliezer meant to object to that whole class of examples rather than only the specific one, but I’m not sure.

No77e 7 Jan 2023 18:51 UTC
5 points
in reply to: Jeremy Gillen’s comment on: What’s wrong with the paperclips scenario?
Yes, this makes a lot of sense, thank you.

No77e 7 Jan 2023 20:16 UTC
6 points
in reply to: DragonGod’s comment on: What’s wrong with the paperclips scenario?
I agree with you here, although something like “predict the next token” seems more and more likely. I’m not sure, though, whether this is in the same class of goals as paperclip maximizing in this context, or whether the kind of failure it could lead to would be similar.

Could evolution produce something truly aligned with its own optimization standards? What would an answer to this mean for AI alignment?

No77e 8 Jan 2023 11:04 UTC
3 points
4 comments, 1 min read

No77e 8 Jan 2023 20:01 UTC
3 points
on: Open & Welcome Thread—January 2023
For some reason I don’t get e-mail notifications when someone replies to my posts or comments. My e-mail is verified and I’ve set all notifications to “immediately”. Here’s what my e-mail settings look like:
