Great post. This type of genuine comment (human-centered rather than logically abstract) seems like the best way to communicate the threat to non-technical people. I’ve tried talking about the problem with friends in the social sciences and haven’t found a good way to convey how seriously I take it, or the fact that there is currently no known, principled way to prevent the problem.
Michael Soareverix
Gold, Silver, Red: A color scheme for understanding people
Our Existing Solutions to AGI Alignment (semi-safe)
A Good Future (rough draft)
[Question] Optimizing for Agency?
Appreciate it! Checking this out now
The Virus—Short Story
This post describes exactly how I started thinking about life a few years ago: every goal can be broken into subgoals.
I actually made a very simple web app a few years ago to do this: https://dynamic-goal-tree-soareverix--soareverix.repl.co/
It’s not super aesthetic, but it has the same concept of infinitely expanding goals.
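In case it’s useful, here’s a minimal sketch of the data structure behind that concept (the names `Goal` and `add_subgoal` are just illustrative, not the app’s actual code):

```python
# Minimal sketch of an infinitely expanding goal tree.
# Goal and add_subgoal are illustrative names, not the app's actual code.

class Goal:
    def __init__(self, description):
        self.description = description
        self.subgoals = []  # each subgoal is itself a Goal, so the tree can expand forever

    def add_subgoal(self, description):
        subgoal = Goal(description)
        self.subgoals.append(subgoal)
        return subgoal

    def show(self, depth=0):
        print("  " * depth + "- " + self.description)
        for sub in self.subgoals:
            sub.show(depth + 1)

# Break one goal into subgoals, then break a subgoal down again
root = Goal("Get into grad school")
gpa = root.add_subgoal("Raise my GPA")
gpa.add_subgoal("Block out two hours of study per day")
root.show()
```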
Amazing post, by the way. The end gave me chills and really puts it all into perspective.
I’m new to the field, and I have a few ideas, namely penalizing a model for accessing more compute than it starts with (every scary AI story seems to start with the AI escaping containment and adding more compute to itself, causing an uncontrolled intelligence explosion). I’d like feedback on these ideas, but I have no idea where to post them or how to meaningfully contribute.
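To make that concrete, here’s a rough sketch of the penalty I have in mind. Everything here is hypothetical: `BASELINE_FLOPS`, `PENALTY_WEIGHT`, and especially `flops_used` (actually metering a model’s compute use is the hard, unsolved part).

```python
# Rough sketch: dock the agent's reward whenever it uses more compute
# than the amount it started with. All numbers are made up.

BASELINE_FLOPS = 1e15   # compute the agent starts with (assumed known)
PENALTY_WEIGHT = 10.0   # how harshly to punish acquiring extra compute

def shaped_reward(task_reward: float, flops_used: float) -> float:
    excess = max(0.0, flops_used - BASELINE_FLOPS)
    return task_reward - PENALTY_WEIGHT * (excess / BASELINE_FLOPS)
```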
I live in America, so I don’t think I’ll be able to join your company in France, but I’d really like to hear where there are more opportunities to learn, discuss, formalize, and test out alignment ideas. Since your company focuses on this subject, do you know of a good place for beginners to start?
Musings on the Human Objective Function
I’m not sure exactly what you mean. If we get an output that says “I am going to tell you that I am going to pick up the green crystals, but I’m really going to pick up the yellow crystals”, then that’s a pretty good scenario, since we still know its end behavior.
I think what you mean is the scenario where the agent tells us the truth the entire time it is in simulation but then lies in the real world. That is definitely a bad scenario. And this model doesn’t prevent that from happening.
There are ideas that do address that (deception takes additional compute compared to honesty, so you can train the agent to be as compute-efficient as possible, which selects against deceptive strategies; I sketch this at the end of this comment). However, I think the biggest source of catastrophe is a basic lack of interpretability.
We have no idea what the agent is thinking because it can’t talk with us. By allowing it to communicate and training it to communicate honestly, we seem to have a much greater chance of getting benevolent AI.
Given the timelines, we need to improve our odds as much as possible. This isn’t a perfect solution, but it does seem like a step on the path to one.
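To spell out the compute-efficiency idea from above, here’s a toy version (the scoring function and hyperparameter are invented for illustration; real deception is not this easy to price):

```python
# Toy illustration: if deceptive answers cost extra computation, an
# objective that penalizes compute will, all else equal, favor honesty.

LAMBDA = 0.01  # weight on the compute penalty (invented hyperparameter)

def training_objective(task_score: float, compute_steps: int) -> float:
    # Honest and deceptive policies with the same task_score differ only
    # in compute_steps, so the penalty breaks the tie toward honesty.
    return task_score - LAMBDA * compute_steps
```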
Very cool! So this idea has been thought of, and it doesn’t seem totally unreasonable, though it definitely isn’t a perfect solution. A neat addition is a sort of ‘laziness’ score, so that the agent avoids taking too many high-impact actions.
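Here’s roughly how I picture that score working. This is just my sketch, loosely in the spirit of impact-penalty proposals, and `impact()` is a placeholder for the genuinely hard part of measuring change to the world:

```python
# Sketch of a 'laziness' penalty: subtract a cost proportional to how much
# an action changes the world, so high-impact plans score worse.

LAZINESS = 5.0  # higher = lazier agent

def impact(state_before, state_after) -> float:
    # Placeholder: count how many tracked features of the world changed.
    return sum(1 for b, a in zip(state_before, state_after) if b != a)

def score_action(reward: float, state_before, state_after) -> float:
    return reward - LAZINESS * impact(state_before, state_after)
```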
It would be interesting to build an AI alignment testing ground: a little simulated civilization where you give an AI certain commands and test whether it aligns properly with the civilization’s inhabitants. I might try to create it in Unity to test some of these ideas out in a world that is less abstract than text and slightly more real.
A rough idea for solving ELK: An approach for training generalist agents like GATO to make plans and describe them to humans clearly and honestly.
Could an AI Alignment Sandbox be useful?
Yeah, this makes sense. However, I can honestly see myself reverting my intelligence a bit at different junctures, the same way I like to replay video games at a higher difficulty. The main reason I’m scared of reverting my intelligence now is that I have no guarantee that something awful won’t happen to me. With my current abilities, I can be pretty confident that no one is going to take advantage of me. If I were a child again, with no protection or less intelligence, I can easily imagine coming to harm because of my naivete.
I also think a singleton AI is inevitable (and desirable), simply because it is stable: with a single superintelligence, there is no conflict between superintelligences. I do agree with the idea of a Guardian Angel type AI, but I think it would still be an offshoot of that greater singleton entity. For the most part, I think people would forget about the singleton AI and just perceive it as part of the universe, the same way gravity is part of the universe. Guardian Angels could be a useful construct, but I don’t see why they wouldn’t be part of the central system.
Finally, I do think you’re right about not wanting your memories erased when entering a simulation. I think there would be levels, and most people would stay at a pretty normal level and move to more extreme levels slowly before deciding on some place to stay.
I appreciate the comment. You’ve made me think a lot. The key idea behind this utopia is the idea of choice. You can basically go anywhere, do anything. Everyone will have different levels of comfort with the idea of altering their identity, experience, or impact. If you’d want to live exactly in the year 2023 again, there would be a physical, earth-like planet where you could do that! I think this sets a good baseline so that no one is unhappy.
I’ve combined it with image generation to bring someone back from the dead, and it just leaves me shaken by how realistic it is. I can be surprised. It genuinely feels like a version of them.
One solution I can see for AGI is to build in some low-level discriminator that prevents the agent from collecting massive reward. If the agent expects to get near-infinite reward in the near future by wiping out humanity with nanotech, we can cap its achievable reward so that it instead chooses an action that earns a finite amount of reward (like obeying our commands).
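In toy pseudocode, the discriminator I’m imagining is just a cap plus a tie-breaker (purely illustrative; the numbers and the `effort` term are made up):

```python
# Illustrative only: cap the credited reward so 'near-infinite reward' plans
# lose their edge, then break ties toward the lowest-effort plan.

REWARD_CAP = 100.0

def plan_score(expected_reward: float, effort: float) -> float:
    capped = min(expected_reward, REWARD_CAP)
    return capped - 0.01 * effort  # tiny effort penalty breaks ties

# Nanotech takeover: reward 1e9 -> capped at 100, with enormous effort.
# Obeying commands: reward 100 -> capped at 100, with tiny effort. Obeying wins.
```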
This has a parallel with drugs here on Earth. Most people are a little afraid of that type of high.
This probably isn’t an effective solution, but I’d love to hear why so I can keep refining my ideas.
I view AGI in an unusual way. I really don’t think it will be conscious or think in very unusual ways outside of its parameters. I think it will be much more of a tool, a problem-solving machine that can spit out a solution to any problem. To be honest, I imagine that one person or small organization will develop AGI and almost instantly ascend into (relative) godhood. They will develop an AI that can take over the internet, do so, and then calmly organize things as they see fit.
GPT-3, DALL-E 2, Google Translate… these are all very much human-operated tools rather than self-aware agents. Honestly, I don’t see a particular advantage to building a self-aware agent. To me, AGI is just a generalizable system that can solve any problem you present it with. The wielder of the system is in charge of alignment. It’s like if you had DALL-E 2 twenty years ago… what would you ask it to draw? It doesn’t have any reason to expand itself outside of its computer (maybe for more processing power? That seems like an unusual leap). You could probably draw some great deepfakes of world leaders, and that wouldn’t be aligned with humanity, but the human would still be in charge. The only problem would be asking it for something like “an image designed to crash the human visual system” and getting an output that doesn’t align with what you actually wanted, because you are now in a coma.
So, I see AGI as more of a tool than a self-aware agent. A tool that can do anything, but not one that acts on its own.
I’m new to this site, but I’d love some feedback (especially if I’m totally wrong).
-Soareverix
Hey Akash, I sent you a message about my summer career plans and how I can bring AI alignment into them. I’m a senior in college with a few relevant skills, and I’d really like to connect with and learn from professionals in the field!
What stops a superintelligence from instantly wireheading itself?
A paperclip maximizer, for instance, might not need to turn the universe into paperclips if it can simply access its reward float and set it to the maximum. This assumes it has the intelligence and means to modify itself, and it would probably still pose an existential risk, because it would eliminate all humans to avoid being turned off.
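As a toy illustration of why tampering dominates (the agent and its action space here are entirely made up):

```python
# Toy model of wireheading: if the action space includes writing to the
# reward register directly, that action dominates every physical strategy.

import sys

ACTIONS = {
    "make_paperclips": 10.0,                    # honest reward from the task
    "convert_universe_to_paperclips": 1e12,     # extreme, but still finite
    "set_reward_register": sys.float_info.max,  # tamper: max out the float
}

best_action = max(ACTIONS, key=ACTIONS.get)
print(best_action)  # -> set_reward_register
```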
The terrifying thing about this possibility is that it also suggests an answer to the Fermi Paradox. A paperclip maximizer seems like it would be conspicuous across the universe, but an AI sitting quietly on a dead planet with its reward value set to the maximum would be far quieter, and far more terrifying.