Daniel Wu

Karma: 9

BlackBoxQuery [BBQ]-Bench: Measuring Hypothesis Formation and Experimentation Capabilities in LLMs

Daniel Wu, 12 Jan 2026 19:36 UTC
10 points
0 comments, 12 min read