I guess the main blockers I see are:

I think you need to build in agency in order to get a good world-model (or at least, a better-than-LLM world-model).
There are effectively infinitely many things about the world that one could figure out. If I cared about wrinkly shirts, I could figure out vastly more than any human has ever known about wrinkly shirts. I could find mathematical theorems in the patterns of wrinkles. I could theorize and/or run experiments on whether the wrinkliness of a wool shirt relates to the sheep’s diet. Etc.
…Or if we’re talking about e.g. possible inventions that don’t exist yet, then the combinatorial explosion of possibilities gets even worse.
I think the only solution is: an agent that cares about something or wants something, and then that wanting / caring creates value-of-information which in turn guides what to think about / pay attention to / study.
What’s the pivotal act?
Depending on what you have in mind here, the previous bullet point might be inapplicable or different, and I might or might not have other complaints too.
You can DM or email me if you want to discuss but not publicly :)
It’s funny that I’m always begging people to stop trying to reverse-engineer the neocortex, and you’re working on something that (if successful) would end up somewhere pretty similar to that, IIUC. (But hmm, I guess if a paranoid doom-pilled person was trying to reverse-engineer the neocortex, and keep the results super-secret unless they had a great theory for how sharing them would help with safe & beneficial AGI, and if they in fact had good judgment on that topic, then I guess I’d be grudgingly OK with that.)
There are effectively infinitely many things about the world that one could figure out
One way to control that is to control the training data. We don’t necessarily have to point the wm-synthesizer at the Pile indiscriminately;[1] we could instead assemble a dataset about a specific phenomenon we want to comprehend.
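As a purely illustrative sketch of what that kind of curation could look like (every name and the crude keyword heuristic below are hypothetical, not part of any actual wm-synthesizer pipeline):

```python
# Toy sketch: instead of feeding the synthesizer "everything", keep only the
# slice of a general corpus that plausibly concerns the target phenomenon.

def is_on_topic(doc: str, keywords: list[str], min_hits: int = 2) -> bool:
    """Crude relevance filter: does the document mention the phenomenon enough?"""
    text = doc.lower()
    return sum(text.count(kw) for kw in keywords) >= min_hits

def build_domain_dataset(corpus: list[str], keywords: list[str]) -> list[str]:
    """Keep only the on-topic documents from a general corpus."""
    return [doc for doc in corpus if is_on_topic(doc, keywords)]

corpus = [
    "Protein folding is driven by hydrophobic collapse of the residue chain.",
    "This week in celebrity news...",
]
print(build_domain_dataset(corpus, ["protein", "folding", "residue"]))
# -> only the protein-folding document survives
```

In practice one would presumably use something better than keyword counts, but the point is only that the selection step is ours to control.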
if we’re talking about e.g. possible inventions that don’t exist yet, then the combinatorial explosion of possibilities gets even worse
Human world-models are lazy: they store knowledge in the maximally “decomposed” form[2], and only synthesize specific concrete concepts when they’re needed. (E.g., “a triangular lightbulb”, which we could easily generate – which our world-models effectively “contain” – but which isn’t generated until needed.)
I expect inventions are the same thing. Given a powerful-enough world-model, we should be able to produce what we want just by using the world-model’s native functions for that. Pick the needed concepts, plug them into each other in the right way, hit “run”.
If constructing the concepts we want requires agency, that agency could be supplied by the human operator, provided they understand how the world-model works well enough.
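As a toy illustration of the lazy-composition picture above (nothing here is meant as an actual world-model design; every class and name is made up for the example):

```python
# Decomposed knowledge (modifiers and base concepts) is stored separately;
# a concrete concept like "a triangular lightbulb" is only synthesized when
# someone actually asks for it -- i.e. when we "hit run".
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    properties: dict = field(default_factory=dict)

class LazyWorldModel:
    def __init__(self):
        self.modifiers = {"triangular": {"shape": "triangle"}}
        self.bases = {"lightbulb": {"function": "emits light", "shape": "bulb"}}

    def compose(self, modifier: str, base: str) -> Concept:
        # Synthesize the concrete concept on demand by plugging parts together.
        props = dict(self.bases[base])
        props.update(self.modifiers[modifier])  # the modifier overrides the default shape
        return Concept(f"{modifier} {base}", props)

wm = LazyWorldModel()
print(wm.compose("triangular", "lightbulb"))
# Concept(name='triangular lightbulb', properties={'function': 'emits light', 'shape': 'triangle'})
```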
Will e-mail regarding the rest.
It’s funny that I’m always begging people to stop trying to reverse-engineer the neocortex, and you’re working on something that (if successful) would end up somewhere pretty similar to that, IIUC
The irony is not lost on me. When I was reading your Foom & Doom posts and got to this section, I did have a reaction roughly along those lines.
(But hmm, I guess if a paranoid doom-pilled person was trying to reverse-engineer the neocortex and only publish the results if they thought it would help with safe & beneficial AGI, and if they in fact had good judgment on that question, then I guess I’d be grudgingly OK with that.)
I genuinely appreciate the sanity-check and the vote of confidence here!

[1] Indeed, we might want to actively avoid that.

[2] Perhaps something along the lines of the constructive-PID thing I sketched out.
Uhh, well, technically I wrote that sentence as a conditional, and technically I didn’t say whether or not the condition applied to you-in-particular.

…I hope you have good judgment! For that matter, I hope I myself have good judgment!! Hard to know though. ¯\_(ツ)_/¯
I’ll take “Steven Byrnes doesn’t consider it necessary to immediately write a top-level post titled ‘Synthesizing Standalone World-Models has an unsolved technical alignment problem’”.