mikes

Karma: 224

Breaking Circuit Breakers

mikes and tbenthompson

14 Jul 2024 18:57 UTC

53 points

13 comments, 1 min read

(confirmlabs.org)

Fluent dreaming for language models (AI interpretability method)

tbenthompson, mikes and Zygi Straznickas

6 Feb 2024 6:02 UTC

46 points

5 comments, 1 min read

(arxiv.org)

Takeaways from the NeurIPS 2023 Trojan Detection Competition

mikes

13 Jan 2024 12:35 UTC

20 points

2 comments, 1 min read

(confirmlabs.org)

[Question] The literature on aluminum adjuvants is very suspicious. Small IQ tax is plausible—can any experts help me estimate it?

mikes

4 Jul 2023 9:33 UTC

61 points

39 comments, 3 min read