smallsilo

Karma: 150

AI safety communications at FAR.AI

Previously at AISafety.info

Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations

4 Jul 2025 0:07 UTC
13 points
1 comment, 4 min read, LW link
(far.ai)

Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion

5 Jun 2025 23:07 UTC
22 points
2 comments, 5 min read, LW link
(far.ai)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

7 Feb 2025 3:57 UTC
37 points
0 comments, 10 min read, LW link

AISafety.info Distillation Hackathon

smallsilo, 1 Oct 2023 18:54 UTC
2 points
0 comments, 1 min read, LW link

Join AISafety.info’s Distillation Hackathon (Oct 6-9th)

smallsilo, 1 Oct 2023 18:43 UTC
21 points
0 comments, 2 min read, LW link
(forum.effectivealtruism.org)

GPT-powered EA/LW weekly summary

smallsilo, 25 Aug 2023 18:19 UTC
18 points
1 comment, 11 min read, LW link
(forum.effectivealtruism.org)

AISafety.info’s Writing & Editing Hackathon

smallsilo, 5 Aug 2023 17:14 UTC
2 points
0 comments, 1 min read, LW link

Join AISafety.info’s Writing & Editing Hackathon (Aug 25-28) (Prizes to be won!)

smallsilo, 5 Aug 2023 14:08 UTC
19 points
3 comments, 1 min read, LW link
(forum.effectivealtruism.org)

All AGI Safety questions welcome (especially basic ones) [July 2023]

smallsilo, 20 Jul 2023 20:20 UTC
38 points
41 comments, 2 min read, LW link
(forum.effectivealtruism.org)