Bogdan Ionut Cirstea comments on Neuroscience and Alignment

Bogdan Ionut Cirstea 19 Mar 2024 9:58 UTC
1 point
First is that I don’t really expect us to come up with a fully general answer to this problem in time. I wouldn’t be surprised if we had to trade off some generality for indexing on the system in front of us—this gets us some degree of non-robustness, but hopefully enough to buy us a lot more time before stuff like the problem behind deep deception breaks a lack of True Names. Hopefully then we can get the AI systems to solve the harder problem for us in the time we’ve bought, with systems more powerful than us. The relevance here is that if this is the case, then trying to generalize our findings to an entirely non-ML setting, while definitely something we want, might not be something we get, and maybe it makes sense to index lightly on a particular paradigm if the general problem seems really hard.
Yes, e.g. https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=GRjfMwLDFgw6qLnDv