Tomáš Gavenčiak
Karma: 401
A researcher in CS theory, AI safety and other stuff.
Shallow review of technical AI safety, 2025
by technicalities, Tomáš Gavenčiak, Stephen McAleese, peligrietzer, Stag, jordinne, ozziegooen, Violet Hour and lenz
17 Dec 2025 18:18 UTC · 187 points · 9 comments · 47 min read · LW link
Sample Interesting First
by Tomáš Gavenčiak
18 Oct 2025 20:09 UTC · 8 points · 2 comments · 3 min read · LW link
How LLM Beliefs Change During Chain-of-Thought Reasoning
by Filip Sondej, Petr Kašpárek, alex-kazda and Tomáš Gavenčiak
16 Jun 2025 16:18 UTC · 32 points · 3 comments · 5 min read · LW link
Apply now to Human-Aligned AI Summer School 2025
by VojtaKovarik, Tomáš Gavenčiak and Jan_Kulveit
6 Jun 2025 19:31 UTC · 28 points · 1 comment · 2 min read · LW link (humanaligned.ai)
Measuring Beliefs of Language Models During Chain-of-Thought Reasoning
by Baram Sosis and Tomáš Gavenčiak
18 Apr 2025 22:56 UTC · 12 points · 0 comments · 13 min read · LW link
Announcing Human-aligned AI Summer School
by Jan_Kulveit and Tomáš Gavenčiak
22 May 2024 8:55 UTC · 51 points · 0 comments · 1 min read · LW link (humanaligned.ai)
InterLab – a toolkit for experiments with multi-agent interactions
by Tomáš Gavenčiak, Ada Böhm and Jan_Kulveit
22 Jan 2024 18:23 UTC · 69 points · 0 comments · 8 min read · LW link (acsresearch.org)
Sparsity and interpretability?
by Ada Böhm, RobertKirk and Tomáš Gavenčiak
1 Jun 2020 13:25 UTC · 41 points · 3 comments · 7 min read · LW link
How can Interpretability help Alignment?
by RobertKirk and Tomáš Gavenčiak
23 May 2020 16:16 UTC · 37 points · 3 comments · 9 min read · LW link
What is Interpretability?
by RobertKirk, Tomáš Gavenčiak and Ada Böhm
17 Mar 2020 20:23 UTC · 39 points · 1 comment · 11 min read · LW link