RSS

vedant-badoni

Karma: 33

Ex­plor­ing Re­in­force­ment Learn­ing Effects on Chain-of-Thought Legibility

6 Jan 2026 3:04 UTC
41 points
3 comments21 min readLW link