No77e

Karma: 136

No77e 25 Apr 2024 12:52 UTC
7 points
3
in reply to: EGI’s comment on: The first future and the best future
From a purely utilitarian standpoint, I’m inclined to think that the cost of delaying is dwarfed by the number of future lives saved by getting a better outcome, assuming that delaying does increase the chance of a better future.
That said, once we know there’s “no chance” of extinction risk, I don’t think delaying would likely yield better future outcomes. On the contrary, I suspect that getting the coordination necessary to delay means we’re likely giving up freedoms in ways that may reduce the value of the median future and increase the chance of outcomes like totalitarian lock-in, which decreases the value of the average future overall.

I think you’re correct that the “other existential risks exist” consideration also has to be balanced in the calculation, although I don’t expect it to be clear-cut.

No77e 24 Apr 2024 20:54 UTC
9 points
7
on: Magic by forgetting
Even if you manage to truly forget about the disease, there must exist a mind “somewhere in the universe” that is exactly the same as yours except for lacking knowledge of the disease. This seems quite unlikely to me, because by the time you decide to erase the memory, your having the disease has already interacted causally with the rest of your mind a great deal. What you’d really need to do is undo all the consequences of those interactions, which seems much harder. You’d need to transform your mind into another one that you somehow know is present “somewhere in the multiverse”, and knowing that also seems really hard.

[Question] If digital goods in virtual worlds increase GDP, do we actually become richer?

No77e 19 Apr 2024 10:06 UTC

6 points

10 comments, 1 min read

No77e 17 Apr 2024 8:21 UTC
1 point
0
on: Superexponential Conceptspace, and Simple Words
I deliberately left out a key qualification in that (slightly edited) statement, because I couldn’t explain it until today.
I might be missing something crucial because I don’t understand why this addition is necessary. Why do we have to specify “simple” boundaries on top of saying that we have to draw them around concentrations of unusually high probability density? Like, aren’t probability densities in Thingspace already naturally shaped in such a way that if you draw a boundary around them, it’s automatically simple? I don’t see how you run the risk of drawing weird, noncontiguous boundaries if you just follow the probability densities.

No77e 26 Mar 2024 20:14 UTC
4 points
0
in reply to: Steven Byrnes’s comment on: Modern Transformers are AGI, and Human-Level
One way in which “spending a whole lot of time working with a system / idea / domain, and getting to know it and understand it and manipulate it better and better over the course of time” could be handled automatically is simply by having a truly huge context window. An example experiment: teach a particular branch of math to an LLM that has never seen that branch of math.
Maybe humans just have the equivalent of a huge context window spanning selected material from their entire lifetimes, and that is what makes this kind of learning possible for them.

No77e 30 Jul 2023 13:12 UTC
1 point
0
in reply to: Chipmonk’s comment on: Self-driving car bets
You mention eight cities here. Do they count for the bet?

No77e 13 Mar 2023 10:27 UTC
1 point
0
in reply to: No77e’s comment on: No77e’s Shortform
The Waluigi effect also seems bad for s-risk: “Optimize for pleasure, …” → “Optimize for suffering, …”.

No77e 13 Mar 2023 8:26 UTC
1 point
0
on: No77e’s Shortform
If LLM simulacra resemble humans but are misaligned, that doesn’t bode well for S-risk chances.

No77e 13 Mar 2023 8:25 UTC
1 point
0
on: No77e’s Shortform
An optimistic way to frame inner alignment is that gradient descent already hits a very narrow target in goal-space, and we just need one last push.

A pessimistic way to frame inner misalignment is that gradient descent already hits a very narrow target in goal-space, and therefore S-risk could be large.

No77e 9 Mar 2023 15:57 UTC
1 point
2
in reply to: No77e’s comment on: No77e’s Shortform
We should run Paul Christiano’s debate game with alignment researchers instead of ML systems.

No77e 9 Mar 2023 15:51 UTC
1 point
0
on: No77e’s Shortform
This community has developed a bunch of good tools for helping resolve disagreements, such as double cruxing. It’s a waste that they weren’t systematically deployed in the MIRI conversations. Those conversations could have been more productive, and we could have walked away with a succinct and precise understanding of where the disagreements are and why.

No77e 5 Mar 2023 17:58 UTC
1 point
0
on: Is recursive self-alignment possible?
Another thing one might wonder is whether performing iterated amplification with constant input from an aligned human (as “H” in the original iterated amplification paper) would result in a powerful aligned system, provided that the system remains corrigible during the training process.

No77e 4 Mar 2023 10:59 UTC
6 points
3
in reply to: Razied’s comment on: Robin Hanson’s latest AI risk position statement
The comment about tool-AI vs agent-AI is just ignorant (or incredibly dismissive) of mesa-optimizers and the fact that being asked to predict what an agent would do immediately instantiates such an agent inside the tool-AI. It’s obvious that a tool-AI is safer than an explicitly agentic one, but not for arbitrary levels of intelligence.
This seems way too confident to me given how general your statement is. To be clear, my view is that this could easily happen in transformer-based LLMs, but what about other architectures? If you’re just talking about how a generic “tool-AI” would or would not behave, it seems to me that you’re operating at a level of abstraction far too high to make such specific statements with confidence.

No77e 26 Feb 2023 18:17 UTC
1 point
0
on: No77e’s Shortform
Trying to write a reward function, or a loss function, that captures human values seems hopeless.

But if you have some interpretability techniques that let you find human values in some simulacrum of a large language model, maybe that’s less hopeless.

The difference between constructing something and recognizing it, or between proving and checking, or between producing and criticizing, and so on...

No77e 18 Feb 2023 15:41 UTC
0 points
0
on: No77e’s Shortform
Why shouldn’t this work? What’s the epistemic failure mode being pointed at here?

No77e 18 Feb 2023 13:40 UTC
1 point
0
on: Should we cry “wolf”?
While you can “cry wolf” in ways that may be useful, you can also state your detailed understanding of each specific situation as it arises and explain how it plays into the broader AI risk context.

No77e 18 Feb 2023 13:00 UTC
1 point
0
on: On Board Vision, Hollow Words, and the End of the World
As impressive as ChatGPT is on some axes, you shouldn’t rely too hard on it for certain things because it’s bad at what I’m going to call “board vision” (a term I’m borrowing from chess).
How confident are you that you cannot find some agent within ChatGPT with excellent board vision through more clever prompting than what you’ve experimented with?

No77e 18 Feb 2023 12:45 UTC
1 point
0
on: No77e’s Shortform
As a failure mode of specification gaming, agents might modify their own goals.

As a convergent instrumental goal, agents want to prevent their goals from being modified.

I think I know how to resolve this apparent contradiction, but I’d like to see other people’s opinions about it.

No77e’s Shortform

No77e 18 Feb 2023 12:45 UTC

3 points

8 comments, 1 min read

No77e 12 Jan 2023 10:10 UTC
6 points
0
on: All AGI Safety questions welcome (especially basic ones) [~monthly thread]
I’m going to re-ask all of my questions that I don’t think have received a satisfactory answer. Some of them are probably basic; others, maybe less so:
