The Observer gets invested in the macro-stories of the evolution/civilization it observes and would consider the end of any story a loss. Just like you'd get annoyed if a show you're watching on Netflix got cancelled after one season, and it's no consolation that there are a bunch of other shows on Netflix that also got cancelled after one season. The Observer wants to see all stories unfold fully; it's not going to let squiggle maximizers cancel them.
And regarding the naming, yeah I just couldn’t come up with anything better. Watcher? I’m open to suggestions lol.
Unfortunately, now you have to solve the fractal-story problem. Is the universe one story, or does each galaxy have its own? Each planet? Continent? Human? Each person's subpersonal goals/plotlines? Each cell?
I see where you’re coming from, but I think any term in anything anyone writes about alignment can be picked apart ad infinitum. This can be useful to an extent, but beyond a certain point talking about meanings and definitions becomes implementation-specific. Alignment is an engineering problem first and a philosophical problem second.
For example, if RLHF is used to achieve alignment, the meaning of “story” will get solidified through thousands of examples and interactions. The AI will get reinforced to not care about cells or individuals, to care about ecosystems and civilizations, and to not care as much about the story-of-the-universe-as-a-whole.
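To make that concrete, here's a toy sketch of the mechanism. Everything in it is hypothetical: the features, the preference pairs, and the linear reward model are invented for illustration, not taken from any real alignment setup. It shows how a reward model trained on preference comparisons, rather than on an explicit definition, ends up fixing the granularity of “story”:

```python
import math

# Toy featurization: each candidate AI judgment is scored on how much
# concern it expresses at three hypothetical granularity levels:
# (cell-level concern, civilization-level concern, universe-level concern).
PREFERENCE_PAIRS = [
    # (preferred, rejected): labelers consistently prefer civilization-scale concern
    ((0.1, 0.9, 0.2), (0.9, 0.1, 0.1)),  # civilization focus beats cell focus
    ((0.0, 0.8, 0.3), (0.1, 0.2, 0.9)),  # ...and beats whole-universe focus
    ((0.2, 0.7, 0.1), (0.8, 0.2, 0.2)),
    ((0.1, 0.9, 0.1), (0.2, 0.3, 0.8)),
]

def reward(w, x):
    """Linear reward model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs, lr=0.5, epochs=200):
    """Fit the weights with the standard Bradley-Terry preference loss,
    i.e. maximize log sigmoid(reward(preferred) - reward(rejected))."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(w, preferred) - reward(w, rejected)
            grad = -1.0 / (1.0 + math.exp(margin))  # d(-log sigmoid(margin))/d(margin)
            for i in range(len(w)):
                w[i] -= lr * grad * (preferred[i] - rejected[i])
    return w

w = train(PREFERENCE_PAIRS)
print("learned weights (cell, civilization, universe):",
      [round(x, 2) for x in w])
# The civilization-level weight dominates: the preference data, not an
# explicit philosophical definition, is what pins down the meaning of "story".
```

The point isn't the specific numbers; it's that the learned weights concentrate at the civilization level purely because that's what the (hypothetical) labelers consistently preferred, which is the sense in which a fuzzy term gets solidified by examples.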
If a different alignment method is used, the meaning of “story” will be conveyed differently. If the overall idea is sound and has no obvious failure modes beyond pinning down definitions (and “story” seems orders of magnitude simpler to define than “human happiness” or “free will”), I'd consider that a huge success and a candidate for the community to focus real alignment implementation efforts on.