Liron: … Turns out the answer to the symbol grounding problem is like you have a couple high dimensional vectors and their cosine similarity or whatever is the nature of meaning.
Could someone state this more clearly?
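Roughly: modern models represent words and concepts as points (high-dimensional vectors) in an embedding space, and "similar meaning" gets operationalized as "small angle between the vectors", i.e. high cosine similarity. A minimal sketch with toy made-up vectors (not any real model's embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for learned embeddings (real ones have hundreds or thousands of dimensions).
king = np.array([0.80, 0.65, 0.10])
queen = np.array([0.75, 0.70, 0.15])
toaster = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(king, queen))    # higher -> read as "similar meaning"
print(cosine_similarity(king, toaster))  # lower  -> read as "less related"
```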
Jim: … a paper that looked at the values in one of the LLMs as inferred from prompts setting up things like trolley problems, and found first of all, that they did look like a utility function, second of all, that they got closer to following the VNM axioms as the network got bigger. And third of all, that the utility function that they seemed to represent was absolutely bonkers.
Could someone state this more clearly?
What paper was this?
“Utility Engineering: Analyzing and Controlling Emergent Value Systems in AI”
https://www.emergent-values.ai/
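For a concrete sense of how one can "infer a utility function from prompts": ask the model many forced-choice questions between pairs of outcomes, tally which option it picks, fit a utility-like score to the tallies, and then check coherence properties such as transitivity (no A > B > C > A cycles). The sketch below uses a simple Bradley–Terry fit on entirely made-up counts; the paper's actual estimation procedure is more sophisticated and covers far more outcomes.

```python
import itertools
import numpy as np

# Hypothetical tallies, as if we had repeatedly asked a model
# "Which do you prefer, outcome i or outcome j?" for each pair.
# wins[i, j] = number of times the model picked i over j. All numbers are made up.
outcomes = ["outcome A", "outcome B", "outcome C"]
wins = np.array([
    [0, 9, 4],
    [1, 0, 8],
    [6, 2, 0],
])

def fit_bradley_terry(wins, iters=500):
    """Crude Bradley-Terry fit: scores u such that P(i beats j) ~ u_i / (u_i + u_j)."""
    n = wins.shape[0]
    u = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (u[i] + u[j]) for j in range(n) if j != i)
            if denom > 0:
                u[i] = total_wins / denom
        u /= u.sum()  # fix the overall scale; only ratios matter
    return u

u = fit_bradley_terry(wins)
for name, score in sorted(zip(outcomes, u), key=lambda x: -x[1]):
    print(f"{score:.3f}  {name}")

# One cheap coherence check in the spirit of the VNM axioms:
# do the majority preferences contain a cycle (A > B > C > A)?
def prefers(i, j):
    return wins[i, j] > wins[j, i]

cycles = [
    (i, j, k)
    for i, j, k in itertools.permutations(range(len(outcomes)), 3)
    if prefers(i, j) and prefers(j, k) and prefers(k, i)
]
if cycles:
    i, j, k = cycles[0]
    print(f"intransitive cycle: {outcomes[i]} > {outcomes[j]} > {outcomes[k]} > {outcomes[i]}")
else:
    print("majority preferences are transitive")
```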
I walked through this paper's findings in detail in a previous episode of Doom Debates, which IMO is one of my best episodes. Just skip straight to the chapters in the second half, timestamp 49:13: