Andy Arditi comments on Finding “misaligned persona” features in open-weight models