I specialize in regulatory affairs for AI-enabled Software as a Medical Device and hope to work in AI risk mitigation.
Jemal Young’s Shortform
Safe Search is off: root causes of AI catastrophic risks
You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn’t cost too much?
Earth as a proportion of the solar system’s planetary mass is probably comparable to national parks as a proportion of the Earth’s land, if not lower.
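As a rough sanity check on that proportion (a minimal back-of-envelope sketch: the planetary masses below are standard approximate values, and the "few percent" figure for national parks is my own rough estimate):

```python
# Rough back-of-envelope check using standard approximate planetary masses (kg).
planet_masses_kg = {
    "Mercury": 3.30e23,
    "Venus":   4.87e24,
    "Earth":   5.97e24,
    "Mars":    6.42e23,
    "Jupiter": 1.898e27,
    "Saturn":  5.68e26,
    "Uranus":  8.68e25,
    "Neptune": 1.02e26,
}

earth_share = planet_masses_kg["Earth"] / sum(planet_masses_kg.values())
print(f"Earth's share of the solar system's planetary mass: {earth_share:.2%}")
# Prints roughly 0.22%, which is lower than the few percent of Earth's land
# typically set aside as national parks.
```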
Maybe I’ve misunderstood your point, but if it’s that humanity’s willingness to preserve a fraction of Earth as national parks is a reason to hope that ASI might be willing to preserve an even smaller fraction of the solar system (namely, Earth) for humanity, I think that’s addressed here:
it seems like for Our research purposes simulations would be just as good. In fact, far better, because We can optimize the hell out of them, running it on the equivalent of a few square kilometers of solar diameter
“research purposes” involving simulations can stand in for any preference-oriented activity. Unless ASI had a preference for letting us, in particular, do what we want with some fraction of the available resources, it would have no reason to leave any of those resources in our hands rather than put them to its own use.
Can efficiency-adjustable reporting thresholds close a loophole in Biden’s executive order on AI?
I think the kind of AI you have in mind would be able to:
continue learning after being trained
think in an open-ended way after an initial command or prompt
have an ontological crisis
discover and exploit signals that were previously unknown to it
accumulate knowledge
become a closed-loop system
The best term I’ve thought of for that kind of AI is Artificial Open Learning Agent.
Thanks for this answer! Interesting. It sounds like the process may be less systematized than I imagined.
Dwarkesh’s interview with Sholto sounds well worth watching in full, but the segments you’ve highlighted and your analyses are very helpful on their own. Thanks for the time and thought you put into this comment!
[Question] How do top AI labs vet architecture/algorithm changes?
I like this post, and I think I get why the focus is on generative models.
What’s an example of a model organism training setup involving some other kind of model?
Maybe relatively safe if:
Not too big
No self-improvement
No continual learning
Curated training data, no throwing everything into the cauldron
No access to raw data from the environment
Not curious or novelty-seeking
Not trying to maximize or minimize anything or push anything to the limit
Not capable enough for catastrophic misuse by humans
Here are some resources I use to keep track of technical research that might be alignment-relevant:
Podcasts: Machine Learning Street Talk, The Robot Brains Podcast
Substacks: Davis Summarizes Papers, AK’s Substack
How I gain value: These resources help me notice where my understanding breaks down (i.e., what I might want to study), and they put thought-provoking research on my radar.
I’m very glad to have read this post and “Reward is not the optimization target”. I hope you continue to write “How not to think about [thing]” posts, as they have me nailed. Strong upvote.
Thanks for pointing me to these tools!
“Unintentional AI safety research”: Why not systematically mine AI technical research for safety purposes?
I believe that by the time an AI has fully completed the transition to hard superintelligence
Nate, what is meant by “hard” superintelligence, and what would precede it? A “giant kludgey mess” that is nonetheless superintelligent? If you’ve previously written about this transition, I’d like to read more.
I’m struggling to understand how to think about reward. It sounds like if a hypothetical ML model does reward hacking or reward tampering, it would be because the training process selected for that behavior, not because the model is out to “get reward”; it wouldn’t be out to get anything at all. Is that correct?
What are the best not-Arxiv and not-NeurIPS sources of information on new capabilities research?
Even though the “G” in AGI stands for “general”, and even if the big labs could train a model to do any task about as well as (or better than) a human, how many of those tasks could any model learn to human level in only a few shots, or zero shots? I’ll go out on a limb and guess the answer is none. I think this post lowers the bar for AGI, because my understanding is that AGI is expected to be capable of few- or zero-shot learning in general.
Okay, that helps. Thanks. Not apples to apples, but I’m reminded of Clippy from Gwern’s “It Looks Like You’re Trying To Take Over The World”:
“When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)”
Not saying AI models can’t be moral patients, but if 1) the smartest models are likely to be the most dangerous, and 2) the smartest models are likely to be the best at demonstrating moral patienthood, then 3) caring too much about model welfare is probably dangerous.