Thanks for reading. You’re right, I’ll actually delete it until I can generate slightly better graphs and I’m more sure of what it’s showing.
FWIW, green means it’s steered towards being honest, red towards being dishonest, and grey has no steering. The triangle marks where thinking stops. But yeah, I needed clearer graphs and to try it on another model.