VojtaKovarik

Karma: 685

My original background is in mathematics (analysis, topology, Banach spaces) and game theory (imperfect information games). Nowadays, I do AI alignment research (mostly systemic risks, sometimes pondering “consequentialist reasoning”).

[Question] What is the purpose and application of AI Debate?

VojtaKovarik · 4 Apr 2024 0:38 UTC
13 points
9 comments · 1 min read · LW link

Extinction Risks from AI: Invisible to Science?

21 Feb 2024 18:07 UTC
24 points
7 comments · 1 min read · LW link
(arxiv.org)

Extinction-level Goodhart’s Law as a Property of the Environment

21 Feb 2024 17:56 UTC
23 points
0 comments · 10 min read · LW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

21 Feb 2024 17:54 UTC
18 points
0 comments · 9 min read · LW link

Which Model Properties are Necessary for Evaluating an Argument?

21 Feb 2024 17:52 UTC
17 points
2 comments · 7 min read · LW link

Weak vs Quantitative Extinction-level Goodhart’s Law

21 Feb 2024 17:38 UTC
17 points
1 comment · 2 min read · LW link

VojtaKovarik’s Shortform

VojtaKovarik · 4 Feb 2024 20:57 UTC
5 points
5 comments · 1 min read · LW link

My Alignment “Plan”: Avoid Strong Optimisation and Align Economy

VojtaKovarik · 31 Jan 2024 17:03 UTC
24 points
9 comments · 7 min read · LW link

Control vs Selection: Civilisation is best at control, but navigating AGI requires selection

VojtaKovarik · 30 Jan 2024 19:06 UTC
7 points
1 comment · 1 min read · LW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik · 21 Jul 2023 21:03 UTC
12 points
18 comments · 3 min read · LW link

Recursive Middle Manager Hell: AI Edition

VojtaKovarik · 4 May 2023 20:08 UTC
30 points
11 comments · 2 min read · LW link

OpenAI could help X-risk by wagering itself

VojtaKovarik · 20 Apr 2023 14:51 UTC
31 points
16 comments · 1 min read · LW link

Legitimising AI Red-Teaming by Public

VojtaKovarik · 19 Apr 2023 14:05 UTC
10 points
7 comments · 3 min read · LW link

[Question] How do you align your emotions through updates and existential uncertainty?

VojtaKovarik · 17 Apr 2023 20:46 UTC
4 points
10 comments · 1 min read · LW link

Formalizing Objections against Surrogate Goals

VojtaKovarik · 2 Sep 2021 16:24 UTC
13 points
23 comments · 1 min read · LW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
28 points
3 comments · 8 min read · LW link

Values Form a Shifting Landscape (and why you might care)

VojtaKovarik · 5 Dec 2020 23:56 UTC
28 points
6 comments · 4 min read · LW link

AI Problems Shared by Non-AI Systems

VojtaKovarik · 5 Dec 2020 22:15 UTC
7 points
2 comments · 4 min read · LW link

AI Unsafety via Non-Zero-Sum Debate

VojtaKovarik · 3 Jul 2020 22:03 UTC
25 points
10 comments · 5 min read · LW link