mikes

Karma: 136

Fluent dreaming for language models (AI interpretability method)

6 Feb 2024 6:02 UTC
39 points
4 comments · 1 min read · LW link
(arxiv.org)

Takeaways from the NeurIPS 2023 Trojan Detection Competition

mikes · 13 Jan 2024 12:35 UTC
20 points
2 comments · 1 min read · LW link
(confirmlabs.org)