viemccoy

Karma: 64

The Weighted Perplexity Benchmark: Tokenizer-Normalized Evaluation for Language Model Comparison

7 Jul 2025 21:43 UTC
21 points
0 comments · 7 min read · LW link
(www.morpheus.systems)

Schizobench: Documenting Magical-Thinking Behavior in Claude 4 Opus

viemccoy · 23 May 2025 1:31 UTC
23 points
0 comments · 1 min read · LW link
(metanomicon.ink)

Defense Against The Super-Worms

viemccoy · 20 Mar 2025 7:24 UTC
24 points
1 comment · 2 min read · LW link