
Michael Soareverix

Karma: 99

Detecting AI Agent Failure Modes in Simulations

Michael Soareverix · Feb 11, 2025, 11:10 AM
17 points
0 comments · 8 min read · LW link

Pivotal Acts are easier than Alignment?

Michael Soareverix · Jul 21, 2024, 12:15 PM
2 points
4 comments · 1 min read · LW link

[Question] Optimizing for Agency?

Michael Soareverix · Feb 14, 2024, 8:31 AM
10 points
9 comments · 2 min read · LW link

The Virus—Short Story

Michael Soareverix · Apr 13, 2023, 6:18 PM
4 points
0 comments · 4 min read · LW link

Gold, Silver, Red: A color scheme for understanding people

Michael Soareverix · Mar 13, 2023, 1:06 AM
17 points
2 comments · 4 min read · LW link

A Good Future (rough draft)

Michael Soareverix · Oct 24, 2022, 8:45 PM
10 points
5 comments · 3 min read · LW link

A rough idea for solving ELK: An approach for training generalist agents like GATO to make plans and describe them to humans clearly and honestly.

Michael Soareverix · Sep 8, 2022, 3:20 PM
2 points
2 comments · 2 min read · LW link

Our Existing Solutions to AGI Alignment (semi-safe)

Michael Soareverix · Jul 21, 2022, 7:00 PM
12 points
1 comment · 3 min read · LW link

Musings on the Human Objective Function

Michael Soareverix · Jul 15, 2022, 7:13 AM
3 points
0 comments · 3 min read · LW link

Three Minimum Pivotal Acts Possible by Narrow AI

Michael Soareverix · Jul 12, 2022, 9:51 AM
0 points
4 comments · 2 min read · LW link

Could an AI Alignment Sandbox be useful?

Michael Soareverix · Jul 2, 2022, 5:06 AM
2 points
1 comment · 1 min read · LW link