
scasper (Stephen Casper)

Karma: 1,304

https://stephencasper.com/

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
35 points
2 comments · 2 min read · LW link
(arxiv.org)

The 6D effect: When companies take risks, one email can be very powerful.

scasper · 4 Nov 2023 20:08 UTC
235 points
40 comments · 3 min read · LW link

Announcing the CNN Interpretability Competition

scasper · 26 Sep 2023 16:21 UTC
15 points
0 comments · 4 min read · LW link

Open Problems and Fundamental Limitations of RLHF

scasper · 31 Jul 2023 15:31 UTC
62 points
6 comments · 2 min read · LW link
(arxiv.org)

A Short Memo on AI Interpretability Rainbows

scasper · 27 Jul 2023 23:05 UTC
18 points
0 comments · 2 min read · LW link

Examples of Prompts that Make GPT-4 Output Falsehoods

22 Jul 2023 20:21 UTC
21 points
5 comments · 6 min read · LW link