My name is pronounced “YOO-ar SKULL-se” (the “e” is not silent). I’m a PhD student at Oxford University, and I was a member of the Future of Humanity Institute before it shut down. I have worked in several different areas of AI safety research. For a few highlights, see:
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
STARC: A General Framework For Quantifying Differences Between Reward Functions
Risks from Learned Optimization in Advanced Machine Learning Systems
Some of my recent research on the theoretical foundations of reward learning is also described in this sequence.
For a full list of all my research, see my Google Scholar.
I suppose this depends on what you mean by “most”. DNNs and CNNs have noticeable and meaningful differences in their (macroscopic) generalisation behaviour, and these differences come down to differences in their parameter-function maps. The same is true of LSTMs vs transformers, and so on. I think it’s fairly likely that these kinds of differences could have a large impact on, for example, the probability that a given type of model will learn to exhibit goal-directed behaviour in a given training setup.
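To illustrate the sense in which two architectures can have different parameter-function maps, here is a minimal sketch; the architectures, widths, 3-bit input space, and Gaussian parameter sampling are all illustrative choices on my part, not taken from any particular experiment. It samples random parameters for a tiny fully connected net and a tiny convolutional net and tallies the Boolean functions each one induces on the 8 possible inputs.

```python
# Minimal sketch: compare the distributions over Boolean functions induced by random
# parameters under two different parameter-function maps (a tiny MLP vs a tiny 1-D conv net).
# All architectural details and the Gaussian sampling are illustrative assumptions.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
X = np.array([[int(b) for b in f"{i:03b}"] for i in range(8)], dtype=float)  # all 3-bit inputs

def mlp_function(width=4, scale=1.0):
    """Parameter-function map of a small fully connected net: sampled parameters -> Boolean function."""
    W1 = rng.normal(0.0, scale, (3, width))
    b1 = rng.normal(0.0, scale, width)
    W2 = rng.normal(0.0, scale, width)
    h = np.maximum(X @ W1 + b1, 0.0)            # ReLU hidden layer
    return tuple(int(v) for v in (h @ W2 > 0))  # thresholded output on all 8 inputs

def conv_function(scale=1.0):
    """Parameter-function map of a tiny 1-D conv net: width-2 filter, ReLU, sum pooling."""
    w = rng.normal(0.0, scale, 2)
    b = rng.normal(0.0, scale)
    conv = np.stack([X[:, :2] @ w, X[:, 1:] @ w], axis=1) + b  # valid convolution, 2 positions
    pooled = np.maximum(conv, 0.0).sum(axis=1)
    return tuple(int(v) for v in (pooled > 0))

mlp_counts = Counter(mlp_function() for _ in range(20000))
conv_counts = Counter(conv_function() for _ in range(20000))
print("distinct functions (MLP): ", len(mlp_counts))
print("distinct functions (conv):", len(conv_counts))
print("most common (MLP): ", mlp_counts.most_common(3))
print("most common (conv):", conv_counts.most_common(3))
```

Under this kind of sampling the two architectures place very different amounts of probability mass on different functions, which is the sense in which their parameter-function maps (and hence the inductive biases they give rise to) differ.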
Do you mean the loss landscape in the limit of infinite data, or the loss landscape for a “small” amount of data? In the former case, the loss landscape determines the parameter-function map over the data distribution. In the latter case, my guess is that the statement is probably false (though I’m not sure).
EDIT: What I wrote here is wrong; the loss landscape does not determine the parameter-function map even in the limit of infinite data (except in the special case of a noiseless binary classification problem, where we consider the loss of each parameter assignment on each individual input in the support of the data distribution).
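To spell out one way this fails, take squared loss as an illustrative choice, writing $y(x)$ for the target. In the limit of infinite data the landscape is

$$L(\theta) = \mathbb{E}_{x \sim \mathcal{D}}\!\left[\big(f_\theta(x) - y(x)\big)^2\right].$$

If a (hypothetical) second parameter-function map implements $g_\theta(x) = 2y(x) - f_\theta(x)$, then $g_\theta(x) - y(x) = y(x) - f_\theta(x)$ pointwise, so it yields exactly the same landscape $L(\theta)$, even though $g_\theta \neq f_\theta$ wherever $f_\theta(x) \neq y(x)$. In the noiseless binary classification case, by contrast, the loss of a parameter assignment on an individual supported input tells you whether $f_\theta(x)$ matches the label, which (together with the labels) recovers $f_\theta$ on the support.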