VojtaKovarik

Karma: 685

My original background is in mathematics (analysis, topology, Banach spaces) and game theory (imperfect information games). Nowadays, I do AI alignment research (mostly systemic risks, sometimes pondering “consequentialist reasoning”).

[Question] What is the purpose and application of AI Debate?

VojtaKovarik · 4 Apr 2024 0:38 UTC
13 points
9 comments · 1 min read · LW link

Extinction Risks from AI: Invisible to Science?

21 Feb 2024 18:07 UTC
24 points
7 comments · 1 min read · LW link
(arxiv.org)

Extinction-level Goodhart’s Law as a Property of the Environment

21 Feb 2024 17:56 UTC
23 points
0 comments · 10 min read · LW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

21 Feb 2024 17:54 UTC
18 points
0 comments · 9 min read · LW link

Which Model Properties are Necessary for Evaluating an Argument?

21 Feb 2024 17:52 UTC
17 points
2 comments · 7 min read · LW link

Weak vs Quantitative Extinction-level Goodhart’s Law

21 Feb 2024 17:38 UTC
17 points
1 comment · 2 min read · LW link

VojtaKovarik’s Shortform

VojtaKovarik · 4 Feb 2024 20:57 UTC
5 points
5 comments · 1 min read · LW link

My Alignment “Plan”: Avoid Strong Optimisation and Align Economy

VojtaKovarik · 31 Jan 2024 17:03 UTC
24 points
9 comments · 7 min read · LW link

Control vs Selection: Civilisation is best at control, but navigating AGI requires selection

VojtaKovarik · 30 Jan 2024 19:06 UTC
7 points
1 comment · 1 min read · LW link

AI Awareness through Interaction with Blatantly Alien Models

VojtaKovarik · 28 Jul 2023 8:41 UTC
7 points
5 comments · 3 min read · LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik · 21 Jul 2023 21:03 UTC
12 points
18 comments · 3 min read · LW link

Recursive Middle Manager Hell: AI Edition

VojtaKovarik · 4 May 2023 20:08 UTC
30 points
11 comments · 2 min read · LW link

OpenAI could help X-risk by wagering itself

VojtaKovarik · 20 Apr 2023 14:51 UTC
31 points
16 comments · 1 min read · LW link

Legitimising AI Red-Teaming by Public

VojtaKovarik · 19 Apr 2023 14:05 UTC
10 points
7 comments · 3 min read · LW link

[Question] How do you align your emotions through updates and existential uncertainty?

VojtaKovarik · 17 Apr 2023 20:46 UTC
4 points
10 comments · 1 min read · LW link

Formalizing Objections against Surrogate Goals

VojtaKovarik · 2 Sep 2021 16:24 UTC
13 points
23 comments · 1 min read · LW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
28 points
3 comments · 8 min read · LW link

Values Form a Shifting Landscape (and why you might care)

VojtaKovarik · 5 Dec 2020 23:56 UTC
28 points
6 comments · 4 min read · LW link

AI Problems Shared by Non-AI Systems

VojtaKovarik · 5 Dec 2020 22:15 UTC
7 points
2 comments · 4 min read · LW link

AI Unsafety via Non-Zero-Sum Debate

VojtaKovarik · 3 Jul 2020 22:03 UTC
25 points
10 comments · 5 min read · LW link