I think a key element here is (the distribution of) whether the AI is motivated (i.e. acts as if motivated) to want to be controlled by us, or not — i.e. something like corrigibility or value learning, and whether we can arrange that. Is the workshop expected to cover that question?
Thank you for your feedback. The workshop sets aside a lot of time for free discussion, so the AI's motivation to be controlled may well come up there. Under the current proposals for talks, the focus is more on the environment in which the agent operates than on the agent itself. That said, discussions about the "level of trust" or "level of cooperation" needed to actually keep control are absolutely within the theme.
On a more personal level, unless I have very strong reasons to believe in an agent's honesty, I would not feel safe in a situation where my control depends on the agent's cooperation. As such, I would like to understand what control looks like in different situations before surrendering any control capabilities to the agent.
Whether or not we decide to put an agent in a situation that we can’t keep under control if the agent doesn’t wish to be controlled is an interesting topic—but not on the agenda for now. If you’d like to participate and run a session on that topic, you’re more than welcome to apply!