Previously worked as a startup founding engineer (does that mean anything?). Now, doing my best to think rigorously and contribute to technical safety.
My personal website.
I am constantly in analysis paralysis. I waste far too much time trying to prematurely optimize how I am going to learn something before actually learning any piece of the material. To remedy this, I tell myself that I will naturally optimize the learning process as I progress. I think Andrej Karpathy explained this nicely in a podcast.
Recently, I have been thinking about this a lot because I am on a full-time sabbatical learning technical alignment research. There is so much to do, and a general feeling of ignorance in every direction, which makes the process ripe for constant optimization. I would love to have an answer, but at this point I try to operate mostly on the faith that I will optimize naturally, and that the more I force premature optimization, the more time I waste. At some point, I have to just Do The Work.
Tensor networks might actually be a viable alternative to typical NNs! However, if their scaling is much worse (say, 50% worse), then I highly doubt they’ll be deployed as frontier models.
But suppose we can solve ambitious mech interp with tensor networks (debatable, but I lean yes); then there are two regimes (a toy sketch of the tensor-network substitution follows the list):
1. Low Reliability
2. High Reliability
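To make the “alternative to typical NNs” idea concrete, here is a minimal sketch of the most common tensor-network substitution: factoring a dense weight matrix into a two-core matrix-product operator (tensor train). This is my own illustration, not anything from the original post; the shapes and the bond rank r are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense layer being replaced: a 256 x 256 weight matrix (65,536 parameters).
d_in, d_out, r = (16, 16), (16, 16), 8  # factor shapes and bond rank (assumed)

# Two-core matrix-product operator (tensor train). The full 256 x 256 matrix
# is never materialized; we store only the two small cores.
core1 = rng.normal(size=(d_in[0], d_out[0], r)) / np.sqrt(d_in[0] * r)
core2 = rng.normal(size=(r, d_in[1], d_out[1])) / np.sqrt(d_in[1])

def mpo_apply(x: np.ndarray) -> np.ndarray:
    """Apply the factorized layer to a batch x of shape (batch, 256)."""
    x = x.reshape(-1, d_in[0], d_in[1])        # (b, 16, 16)
    y = np.einsum('bij,iar->bajr', x, core1)   # contract the first input mode
    y = np.einsum('bajr,rjc->bac', y, core2)   # contract second mode and bond
    return y.reshape(-1, d_out[0] * d_out[1])  # (b, 256)

x = rng.normal(size=(4, 256))
print(mpo_apply(x).shape)                  # (4, 256)
print(256 * 256, core1.size + core2.size)  # 65536 dense vs. 4096 factorized
```

The bond rank r is the knob: small r gives few parameters and cores small enough to inspect directly, while large r recovers dense expressivity. Whether performance at fixed compute survives small r is exactly the scaling question above.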
Excited by the ambitious effort. Re: the impact argument above, I understand your logic to be:

To completely replace NNs in frontier applications, the scaling of TNs needs to be on par. Therefore, if we needed to replace NNs for TNs to be useful, we should test the scaling laws first. However, in a world where scaling is worse, TNs can still be useful by enabling “ambitious mech interp,” which would result in a High Reliability model. These two regimes aren’t mutually exclusive.
Am I following the argument correctly?
Thank you for the nudge. I hadn’t realized that I have been thinking solely about value to the collective rather than to an individual. This is most likely because I have heard this topic come up in collective-benefit discourse, e.g., “These are the benefits of AI to humanity:”.
I don’t think I have done the requisite work to understand the arguments, but by default I do not understand the appeal of uploaded minds. Are there utility arguments around being able to access a highly productive individual’s mind forever? Or is it just a more emotional appeal to living forever?
As an exercise for myself, my simplified understanding of your argument is as follows:
That a model cannot distinguish between evaluation and deployment (i.e., it is not evaluation-aware) does not preclude it from having scheming abilities. Therefore, as model capabilities improve, we should be skeptical of evaluations as a green light for safety cases.

After writing this, it’s hard for me to imagine a world in which it’s possible to build a complete safety case for a model before deployment. I think I am reiterating simple conclusions here, but as I’m learning, I want to double-check. Am I missing anything?