Govind Pimpale

Karma: 309

Forecasting Frontier Language Model Agent Capabilities

24 Feb 2025 16:51 UTC
35 points
0 comments, 5 min read, LW link
(www.apolloresearch.ai)

Do models know when they are being evaluated?

17 Feb 2025 23:13 UTC
59 points
8 comments, 12 min read, LW link

Current safety training techniques do not fully transfer to the agent setting

3 Nov 2024 19:24 UTC
158 points
9 comments, 5 min read, LW link

~80 Interesting Questions about Foundation Model Agent Safety

28 Oct 2024 16:37 UTC
48 points
4 comments, 15 min read, LW link

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

22 Jul 2024 16:17 UTC
69 points
0 comments, 16 min read, LW link