VojtaKovarik

Karma: 684

My original background is in mathematics (analysis, topology, Banach spaces) and game theory (imperfect information games). Nowadays, I do AI alignment research (mostly systemic risks, sometimes pondering “consequentialist reasoning”).

AI Safety Debate and Its Applications

VojtaKovarik 23 Jul 2019 22:31 UTC

38 points

5 comments, 12 min read, LW link

New paper: (When) is Truth-telling Favored in AI debate?

VojtaKovarik 26 Dec 2019 19:59 UTC

32 points

7 comments, 5 min read, LW link

(medium.com)

OpenAI could help X-risk by wagering itself

VojtaKovarik 20 Apr 2023 14:51 UTC

31 points

16 comments, 1 min read, LW link

AI Services as a Research Paradigm

VojtaKovarik 20 Apr 2020 13:00 UTC

30 points

12 comments, 4 min read, LW link

(docs.google.com)

Recursive Middle Manager Hell: AI Edition

VojtaKovarik 4 May 2023 20:08 UTC

30 points

11 comments, 2 min read, LW link

Values Form a Shifting Landscape (and why you might care)

VojtaKovarik 5 Dec 2020 23:56 UTC

28 points

6 comments, 4 min read, LW link

Risk Map of AI Systems

VojtaKovarik and Jan_Kulveit

15 Dec 2020 9:16 UTC

28 points

3 comments, 8 min read, LW link

AI Unsafety via Non-Zero-Sum Debate

VojtaKovarik 3 Jul 2020 22:03 UTC

25 points

10 comments, 5 min read, LW link

Extinction Risks from AI: Invisible to Science?

VojtaKovarik, Chris van Merwijk and Ida Mattsson

21 Feb 2024 18:07 UTC

24 points

7 comments, 1 min read, LW link

(arxiv.org)

My Alignment “Plan”: Avoid Strong Optimisation and Align Economy

VojtaKovarik 31 Jan 2024 17:03 UTC

24 points

9 comments, 7 min read, LW link

Extinction-level Goodhart’s Law as a Property of the Environment

VojtaKovarik and Ida Mattsson

21 Feb 2024 17:56 UTC

23 points

0 comments, 10 min read, LW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

VojtaKovarik and Ida Mattsson

21 Feb 2024 17:54 UTC

18 points

0 comments, 9 min read, LW link

Which Model Properties are Necessary for Evaluating an Argument?

VojtaKovarik and Ida Mattsson

21 Feb 2024 17:52 UTC

17 points

2 comments, 7 min read, LW link

Weak vs Quantitative Extinction-level Goodhart’s Law

VojtaKovarik and Ida Mattsson

21 Feb 2024 17:38 UTC

17 points

1 comment, 2 min read, LW link

Deconfuse Yourself about Agency

VojtaKovarik 23 Aug 2019 0:21 UTC

15 points

9 comments, 5 min read, LW link

Formalizing Objections against Surrogate Goals

VojtaKovarik 2 Sep 2021 16:24 UTC

13 points

23 comments, 1 min read, LW link

[Question] What is the purpose and application of AI Debate?

VojtaKovarik 4 Apr 2024 0:38 UTC

13 points

9 comments, 1 min read, LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik 21 Jul 2023 21:03 UTC

12 points

18 comments, 3 min read, LW link

Legitimising AI Red-Teaming by Public

VojtaKovarik 19 Apr 2023 14:05 UTC

10 points

7 comments, 3 min read, LW link

Redefining Fast Takeoff

VojtaKovarik 23 Aug 2019 2:15 UTC

10 points

1 comment, 1 min read, LW link