Still under development to a large extent, but my own research is intended to be alignment/foundations research, and makes some direct predictions about deep-learning systems. Specifically, my formulation of abstraction is intended (among other things) to answer questions like “why does a system with relatively little resemblance to a human brain seem to recognize similar high-level abstractions as humans (e.g. dogs, trees, etc)?”. I also expect that even more abstract notions like “human values” will follow a similar pattern.