Ana Kapros
Karma: 13
Feature-Based Analysis of Safety-Relevant Multi-Agent Behavior
Maria Kapros, Ana Kapros and Perusha Moodley
21 Apr 2025 18:12 UTC
9 points, 0 comments, 5 min read
Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts
Ana Kapros
12 Feb 2025 19:12 UTC
7 points, 0 comments, 5 min read