Liron: … Turns out the answer to the symbol grounding problem is like you have a couple high dimensional vectors and their cosine similarity or whatever is the nature of meaning.
Could someone state this more clearly?
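Roughly: modern models represent words and concepts as points (high-dimensional vectors) in an embedding space, and "similar meaning" gets operationalized as "small angle between the vectors", i.e. high cosine similarity. A minimal sketch with toy made-up vectors (not any real model's embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for learned embeddings (real ones have hundreds or thousands of dimensions).
king = np.array([0.80, 0.65, 0.10])
queen = np.array([0.75, 0.70, 0.15])
toaster = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(king, queen))    # higher -> read as "similar meaning"
print(cosine_similarity(king, toaster))  # lower  -> read as "less related"
```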
Jim: … a paper that looked at the values in one of the LLMs as inferred from prompts setting up things like trolley problems, and found first of all, that they did look like a utility function, second of all, that they got closer to following the VNM axioms as the network got bigger. And third of all, that the utility function that they seemed to represent was absolutely bonkers.
Could someone state this more clearly?
What paper was this?
“Utility Engineering: Analyzing and Controlling Emergent Value Systems in AI”
https://www.emergent-values.ai/
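For a concrete sense of how one can "infer a utility function from prompts": ask the model many forced-choice questions between pairs of outcomes, tally which option it picks, fit a utility-like score to the tallies, and then check coherence properties such as transitivity (no A > B > C > A cycles). The sketch below uses a simple Bradley–Terry fit on entirely made-up counts; the paper's actual estimation procedure is more sophisticated and covers far more outcomes.

```python
import itertools
import numpy as np

# Hypothetical tallies, as if we had repeatedly asked a model
# "Which do you prefer, outcome i or outcome j?" for each pair.
# wins[i, j] = number of times the model picked i over j. All numbers are made up.
outcomes = ["outcome A", "outcome B", "outcome C"]
wins = np.array([
    [0, 9, 4],
    [1, 0, 8],
    [6, 2, 0],
])

def fit_bradley_terry(wins, iters=500):
    """Crude Bradley-Terry fit: scores u such that P(i beats j) ~ u_i / (u_i + u_j)."""
    n = wins.shape[0]
    u = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (u[i] + u[j]) for j in range(n) if j != i)
            if denom > 0:
                u[i] = total_wins / denom
        u /= u.sum()  # fix the overall scale; only ratios matter
    return u

u = fit_bradley_terry(wins)
for name, score in sorted(zip(outcomes, u), key=lambda x: -x[1]):
    print(f"{score:.3f}  {name}")

# One cheap coherence check in the spirit of the VNM axioms:
# do the majority preferences contain a cycle (A > B > C > A)?
def prefers(i, j):
    return wins[i, j] > wins[j, i]

cycles = [
    (i, j, k)
    for i, j, k in itertools.permutations(range(len(outcomes)), 3)
    if prefers(i, j) and prefers(j, k) and prefers(k, i)
]
if cycles:
    i, j, k = cycles[0]
    print(f"intransitive cycle: {outcomes[i]} > {outcomes[j]} > {outcomes[k]} > {outcomes[i]}")
else:
    print("majority preferences are transitive")
```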
I walked through this paper's findings in detail in a previous episode of Doom Debates, which IMO is one of my best episodes. Just skip straight to the chapters in the second half, timestamp 49:13: