you don’t believe there need to be independent, modular value systems that give their own reward signals for different things (your steering subsystem and thought generator and assessor subsystem are working in tandem to produce a singular reward signal)
If I’m deciding between sitting on the couch vs going to the gym, at the end of the day, my brain needs to do one thing versus another. The different considerations need to be weighed against each other to produce a final answer somehow, right? A “singular reward signal” is one solution to that problem. I haven’t heard any other solution that makes sense to me.
That said, we could view a “will lead to food?” Thought Assessor as an “independent, modular value system” of sorts, and likewise with the other Thought Assessors. (I’m not sure that’s a helpful view; I think it’s also misleading in some ways.)
(I would call a Thought Assessor a kind of “value function”, in the RL sense. You also talk about “value systems” and “value generators”, and I’m not sure what those mean.)
What things would you critique about this view?
Similar to the above: if we’re building a behavior controller, we need to decide whether or not to switch behaviors at any given time, and that requires holistic consideration of each behavior’s impact on every aspect of the organism’s well-being. See § 6.5.3, where I suggest that even the run-and-tumble algorithm of a bacterium might plausibly combine food, toxins, temperature, etc., into a single metric of how-am-I-doing-right-now, whose time-derivative in turn determines the probability of tumbling. (To be clear, I don’t know much about bacteria; this is theoretical speculation.) Can you think of a way for a motile bacterium to simultaneously avoid toxins and seek out food that doesn’t involve combining toxin-measurement and food-measurement into a single overall environmental-quality metric? I can’t.
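To make the run-and-tumble idea concrete, here is a minimal sketch. It is speculative and not a model of real bacterial chemistry: the specific weights and the sigmoid are made-up assumptions, chosen only to illustrate the structure of "combine everything into one quality metric, then let its time-derivative set the tumble probability."

```python
import math

def quality(food, toxin, temp_error):
    # Hypothetical single "how am I doing right now?" metric:
    # more food is good; toxins and deviation from the preferred
    # temperature are bad. The weights are arbitrary assumptions.
    return food - 2.0 * toxin - 0.5 * abs(temp_error)

def tumble_probability(prev_quality, curr_quality, gain=2.0):
    # If the combined quality is improving (positive time-derivative),
    # tumble less, i.e. keep running in the current direction;
    # if it's worsening, tumble more. Sigmoid keeps it in (0, 1).
    dq = curr_quality - prev_quality
    return 1.0 / (1.0 + math.exp(gain * dq))
```

The point of the sketch is that the controller never needs separate "avoid toxin" and "seek food" decision channels: both sensors feed one scalar, and a single derivative-of-quality rule drives behavior.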
If you want your AGI to split its time among several drives, I don’t think that’s incompatible with a singular reward signal. You could set up the reward function to have diminishing returns on satisfying each drive, for example. Like, if my reward is log(eating) + log(social status), I’ll almost certainly wind up spending time on each, I think.
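A tiny toy model can illustrate why log rewards lead to time-splitting. This is my own illustrative sketch, not anything from the original discussion: a greedy agent spends each timestep on whichever drive currently offers the larger marginal reward. Because log has diminishing returns, neglecting either drive makes its marginal reward grow, so the agent ends up alternating rather than fixating on one drive.

```python
import math

def allocate(steps):
    # Each drive starts at 1.0 so log() is defined.
    eating, status = 1.0, 1.0
    counts = {"eating": 0, "status": 0}
    for _ in range(steps):
        # Marginal reward of one more unit of each drive under
        # reward = log(eating) + log(status).
        gain_eat = math.log(eating + 1) - math.log(eating)
        gain_status = math.log(status + 1) - math.log(status)
        if gain_eat >= gain_status:
            eating += 1
            counts["eating"] += 1
        else:
            status += 1
            counts["status"] += 1
    return counts
```

Running `allocate(100)` splits the steps evenly between the two drives, even though the agent is maximizing one scalar reward at every step.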