I’m finishing up my PhD on tensor network algorithms at the University of Queensland, Australia, under Ian McCulloch. I’ve also proposed a new definition of wavefunction branches using quantum circuit complexity.
Predictably, I’m moving into AI safety work. See my post on graphical tensor notation for interpretability. I also attended the Machine Learning for Alignment Bootcamp in Berkeley in 2022, did a machine learning / neuroscience internship in 2020–2021, and wrote a post exploring the potential counterfactual impact of AI safety work.
My website: https://sites.google.com/view/jordantensor/
Contact me: jordantensor [at] gmail [dot] com Also see my CV, LinkedIn, or Twitter.
Potential dangers of future evaluations / gain-of-function research, which I’m sure you and Beth are already extremely well aware of:
Falsely evaluating a model as safe (obviously)
Choosing evaluation metrics which don’t give us enough time to react (after an evaluation metric switches from “safe” to “not safe”, we would like to have enough time to recognize this and do something about it before we’re all dead)
Crying wolf too many times, making it more likely that no one will believe you when a danger threshold has really been crossed
Letting your methods for making future AIs scarier be too strong, given the probability that they will be leaked or otherwise made widely accessible (this applies if the methods / tools are difficult to replicate without significant resources)
Letting your methods for making AIs scarier be too weak, making it too easy for bad actors to go much further than you did
Failing to precommit to stopping this research once models are getting scary enough that it’s on balance best to stop making them scarier, even if no one else believes you yet