Charlie Steiner comments on Environments for Measuring Deception, Resource Acquisition, and Ethical Violations

Charlie Steiner 8 Apr 2023 2:58 UTC
Nice AI ethics dataset. It would be a shame if someone were to… fine-tune an LLM to perform well at it after some scratchpad reasoning, thus making an interesting advance in natural language ethical reasoning that might be useful for more general AI alignment if we expect transformative AI to look like 80% LLM and 20% other stuff bolted on, but might fail to generalize to alternative or successor systems that do more direct reasoning about the world.

Sorry, that sentence really got away from me.