Tomáš Gavenčiak
Karma: 401
A researcher in CS theory, AI safety and other stuff.
Shallow review of technical AI safety, 2025
by technicalities, Tomáš Gavenčiak, Stephen McAleese, peligrietzer, Stag, jordinne, ozziegooen, Violet Hour and lenz
17 Dec 2025 18:18 UTC · 187 points · 9 comments · 47 min read · LW link
Sample Interesting First
by Tomáš Gavenčiak
18 Oct 2025 20:09 UTC · 8 points · 2 comments · 3 min read · LW link
How LLM Beliefs Change During Chain-of-Thought Reasoning
by Filip Sondej, Petr Kašpárek, alex-kazda and Tomáš Gavenčiak
16 Jun 2025 16:18 UTC · 32 points · 3 comments · 5 min read · LW link
Apply now to Human-Aligned AI Summer School 2025
by VojtaKovarik, Tomáš Gavenčiak and Jan_Kulveit
6 Jun 2025 19:31 UTC · 28 points · 1 comment · 2 min read · LW link (humanaligned.ai)
Measuring Beliefs of Language Models During Chain-of-Thought Reasoning
by Baram Sosis and Tomáš Gavenčiak
18 Apr 2025 22:56 UTC · 12 points · 0 comments · 13 min read · LW link
Announcing Human-aligned AI Summer School
by Jan_Kulveit and Tomáš Gavenčiak
22 May 2024 8:55 UTC · 51 points · 0 comments · 1 min read · LW link (humanaligned.ai)
InterLab – a toolkit for experiments with multi-agent interactions
by Tomáš Gavenčiak, Ada Böhm and Jan_Kulveit
22 Jan 2024 18:23 UTC · 69 points · 0 comments · 8 min read · LW link (acsresearch.org)
Sparsity and interpretability?
by Ada Böhm, RobertKirk and Tomáš Gavenčiak
1 Jun 2020 13:25 UTC · 41 points · 3 comments · 7 min read · LW link
How can Interpretability help Alignment?
by RobertKirk and Tomáš Gavenčiak
23 May 2020 16:16 UTC · 37 points · 3 comments · 9 min read · LW link
What is Interpretability?
by RobertKirk, Tomáš Gavenčiak and Ada Böhm
17 Mar 2020 20:23 UTC · 39 points · 1 comment · 11 min read · LW link