
Ana Kapros

Karma: 13

Feature-Based Analysis of Safety-Relevant Multi-Agent Behavior

21 Apr 2025 18:12 UTC
9 points
0 comments, 5 min read

Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts

Ana Kapros, 12 Feb 2025 19:12 UTC
7 points
0 comments, 5 min read