
Michael Soareverix

Karma: 99

Detecting AI Agent Failure Modes in Simulations

Michael Soareverix · Feb 11, 2025, 11:10 AM
17 points
0 comments · 8 min read · LW link

Pivotal Acts are easier than Alignment?

Michael Soareverix · Jul 21, 2024, 12:15 PM
2 points
4 comments · 1 min read · LW link

[Question] Optimizing for Agency?

Michael Soareverix · Feb 14, 2024, 8:31 AM
10 points
9 comments · 2 min read · LW link

The Virus—Short Story

Michael Soareverix · Apr 13, 2023, 6:18 PM
4 points
0 comments · 4 min read · LW link

Gold, Silver, Red: A color scheme for understanding people

Michael Soareverix · Mar 13, 2023, 1:06 AM
17 points
2 comments · 4 min read · LW link

A Good Future (rough draft)

Michael Soareverix · Oct 24, 2022, 8:45 PM
10 points
5 comments · 3 min read · LW link

A rough idea for solving ELK: An approach for training generalist agents like GATO to make plans and describe them to humans clearly and honestly.

Michael Soareverix · Sep 8, 2022, 3:20 PM
2 points
2 comments · 2 min read · LW link

Our Existing Solutions to AGI Alignment (semi-safe)

Michael Soareverix · Jul 21, 2022, 7:00 PM
12 points
1 comment · 3 min read · LW link

Musings on the Human Objective Function

Michael Soareverix · Jul 15, 2022, 7:13 AM
3 points
0 comments · 3 min read · LW link

Three Minimum Pivotal Acts Possible by Narrow AI

Michael Soareverix · Jul 12, 2022, 9:51 AM
0 points
4 comments · 2 min read · LW link

Could an AI Alignment Sandbox be useful?

Michael Soareverix · Jul 2, 2022, 5:06 AM
2 points
1 comment · 1 min read · LW link