Even if you manage to truly forget about the disease, there must exist a mind “somewhere in the universe” that is exactly the same as yours except without knowledge of the disease. This seems quite unlikely to me, because by the time you decide to erase the memory, having the disease has already interacted causally with the rest of your mind a great deal. What you’d really need to do is undo all the consequences of those interactions, which seems much harder. You’d really need to transform your mind into another one that you somehow know is present “somewhere in the multiverse”, which also seems really hard to know.
What’s wrong with the paperclips scenario?
Why is research into decision theories relevant to alignment?
The first thing generally, or CEV specifically, is unworkable because the complexity of what needs to be aligned or meta-aligned for our Real Actual Values is far out of reach for our FIRST TRY at AGI. Yes I mean specifically that the dataset, meta-learning algorithm, and what needs to be learned, is far out of reach for our first try. It’s not just non-hand-codable, it is unteachable on-the-first-try because the thing you are trying to teach is too weird and complicated.
Why is CEV so difficult? And if CEV is impossible to learn first try, why not shoot for something less ambitious? Value is fragile, OK, but aren’t there easier utopias?
Many humans would be able to distinguish utopia from dystopia if they saw them, and humanity’s only advantage over an AI is that the brain has “evolution presets”.
Humans are relatively dumb, so why can’t even a relatively dumb AI learn the same ability to distinguish utopias from dystopias?
To anyone reading: don’t interpret these questions as disagreement. If someone doesn’t, for example, understand a mathematical proof, they might express disagreement with the proof while knowing full well that they haven’t discovered a mistake in it and that they are simply confused.
From a purely utilitarian standpoint, I’m inclined to think that the cost of delaying is dwarfed by the number of future lives saved by getting a better outcome, assuming that delaying does increase the chance of a better future.
That said, once we know there’s “no chance” of extinction risk, I don’t think delaying would likely yield better future outcomes. On the contrary, I suspect that achieving the coordination necessary to delay likely means giving up freedoms in ways that reduce the value of the median future and increase the chance of something like totalitarian lock-in, which lowers the value of the average future overall.
I think you’re correct that the “other existential risks exist” consideration also has to be balanced in the calculation, although I don’t expect it to be clear-cut.
[Question] If digital goods in virtual worlds increase GDP, do we actually become richer?
The comment about tool-AI vs agent-AI is just ignorant (or incredibly dismissive) of mesa-optimizers and the fact that being asked to predict what an agent would do immediately instantiates such an agent inside the tool-AI. It’s obvious that a tool-AI is safer than an explicitly agentic one, but not for arbitrary levels of intelligence.
This seems way too confident to me given the level of generality of your statement. And to be clear, my view is that this could easily happen in LLMs based on transformers, but what about other architectures? If you just talk about how a generic “tool-AI” would or would not behave, it seems to me that you are operating on a level of abstraction far too high to be able to make such specific statements with confidence.
I’m going to re-ask all my questions that I don’t think have received a satisfactory answer. Some of them are probably basic, others maybe less so:
I agree with you here, although something like “predict the next token” seems more and more likely. I’m not sure, though, whether this is in the same class of goals as paperclip maximizing in this context, or whether the kind of failure it could lead to would be similar.
I publish posts like this one to clarify my doubts about alignment. I don’t pay attention to whether I’m beating a dead horse or if there’s previous literature about my questions or ideas. Do you think this is an OK practice? One pro is that people like me learn faster, and one con is that it may pollute the site with lower-quality posts.
Is checking that a state of the world is not dystopian easier than constructing a non-dystopian state?
Is recursive self-alignment possible?
Yes, this makes a lot of sense, thank you.
One way in which “spending a whole lot of time working with a system / idea / domain, and getting to know it and understand it and manipulate it better and better over the course of time” could be solved automatically is just by having a truly huge context window. Example of an experiment: teach a particular branch of math to an LLM that has never seen that branch of math.
Maybe humans just have the equivalent of a sort of huge context window spanning selected stuff from their entire lifetimes, and so this kind of learning is possible for them.
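To make the experiment above concrete, here is a minimal sketch in Python. It assumes a hypothetical `query_llm(prompt)` helper wrapping whatever long-context model is under test, and placeholder file names (`teaching_material.txt`, `problems.txt`) for the unseen branch of math and its held-out exercises; none of these come from the original comment.

```python
# Sketch of the "teach an unseen branch of math via a huge context window" experiment.
# query_llm(prompt) is a hypothetical helper wrapping the long-context model under test.

def load(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

def run_experiment(query_llm):
    material = load("teaching_material.txt")      # e.g. a textbook on the unseen branch of math
    problems = load("problems.txt").splitlines()  # held-out exercises from that branch

    for problem in problems:
        # Baseline: the model answers with no exposure to the teaching material.
        baseline = query_llm(f"Solve the following problem:\n{problem}")

        # In-context condition: the whole textbook sits inside the (huge) context window.
        in_context = query_llm(
            f"Here is some teaching material:\n{material}\n\n"
            f"Using only what is taught above, solve the following problem:\n{problem}"
        )

        print("PROBLEM:", problem)
        print("BASELINE:", baseline)
        print("IN-CONTEXT:", in_context)
```

Comparing the two conditions would give a rough measure of how much of that “getting to know a domain over time” can be replaced by sheer context length.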
Can someone explain to me why Pasha’s posts are downvoted so much? I don’t think they are great, but this level of negative karma seems disproportionate to me.
This looks like something that would be useful also for alignment orgs, if they want to organize their research in siloes, as Yudkowsky often suggests (if they haven’t already implemented systems like this one).
Could evolution produce something truly aligned with its own optimization standards? What would an answer to this mean for AI alignment?
For some reason I don’t get e-mail notifications when someone replies to my posts or comments. My e-mail is verified and I’ve set all notifications to “immediately”. Here’s what my e-mail settings look like:
I use Eliezer Yudkowsky in my example because it makes the most sense. Don’t read anything else into it, please.
Should an “ask dumb questions about AGI safety” thread be recurring? Surely people will continue to come up with more questions in the years to come, and the same dynamics outlined in the OP will repeat. Perhaps this post could continue to be the go-to page, but it would become enormous; on the other hand, recurring posts would somewhat lose the FAQ function. Perhaps both recurring posts and a FAQ post?