What are high-level ways to formalize the dataset-assembly subproblem?
What are some heuristics for solving this subproblem?
How should we think about/model the problem of solving all three subproblems jointly?
I have read the first summary post and this one. I have only skimmed your abstraction posts etc., so maybe I am missing something here. But if you are ultimately aiming for some particular pivotal act routed through human understanding, I think you should spend at least 1-6 months trying to just solve that pivotal act (or trying multiple if that particular one turns out to seem less tractable). Look at what types of knowledge and training data are useful for your brain, and go for understanding directly rather than trying to route it indirectly through an autoencoder that you don't know how to train yet. I am reasonably confident your plan is harder than just going for, say, adult human intelligence enhancement directly. But even if that is false, I am very confident you will ultimately save a bunch of time by investing enough time in this step. You will save time when training that autoencoder, when debugging and validating any algorithms you run on top of it, and when trying to learn concepts from it. For example, if you go for adult genetic intelligence enhancement, you are going to run into peculiarities of genetics, and I think it's just easier to learn about them directly from a textbook optimized for pedagogy rather than merely for short description length. This should be your first step, not your last. Listen to Andrej Karpathy and become one with the data!