Christopher King
Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program
The unspoken but ridiculous assumption of AI doom: the hidden doom assumption
I think this might lead to the tails coming apart.
As our world exists, sentience and being a moral patient is strongly correlated. But I expect that since AI comes from an optimization process, it will hit points where this stops being the case. In particular, I think there are edge cases where perfect models of moral patients are not themselves moral patients.
From Some background for reasoning about dual-use alignment research:
Doing research but not publishing it has niche uses. If research would be bad for other people to know about, you should mainly just not do it.
Because it’s anti-social (in most cases; things like law enforcement are usually fine), and the only good timelines (by any metric) are pro-social.
Consider if it became like the Irish troubles. Do you think alignment gets solved in this environment? No. What you get is people creating AI war machines. And they don’t bother with alignment because they are trying to get an advantage over the enemy, not benefit everyone. Everyone is incentivised to push capabilities as far as they can, except past the singularity threshold. And there’s not even a disincentive for going past it, you’re just neutral on it. So the dangerous bit isn’t even that the AI are war machines, it’s that they are unaligned.
It’s a general principle that anti-social acts tend to harm utility overall due to second-order effects that wash out the short-sighted first-order effects. Alignment is an explicitly pro-social endeavor!
Yeah exactly! Not telling anyone until the end just means you missed the chance to push society towards alignment and build on your work. Don’t wait!
Correct, but the arguments given in this post for the Kelly bet are really about utility, not money. So if you believe that you should Kelly bet utility, that does not mean maximizing E(log(money)), it means maximizing E(log(log(money)). The arguments would need to focus on money specifically if they want to argue guess maximizing E(log(money)).
I think one thing a lot of this arguments for Kelly Betting are missing: we already know that utility is approximately logarithmic with respect to money.
So if Kelly is maximizing the expected value of log(utility), doesn’t that mean it should be maximizing the expected value of log(log(money)) instead of log(money)? 🤔
Right, and that article makes the case that in those cases you should publish. The reasoning is that the value of unpublished research decays rapidly, so if it could help alignment, publish before it loses its value.
For reasons I may/not write about in the near future, many ideas about alignment (especially anything that could be done with today’s systems) could very well accelerate capabilities work.
If it’s too dangerous to publish, it’s not effective to research. From Some background for reasoning about dual-use alignment research
If research would be bad for other people to know about, you should mainly just not do it.
Regarding Microsoft, I feel quite negatively about their involvement in AI
If it’s Bing you’re referring to, I must disagree! The only difference between GPT-4 and Bing is that Bing isn’t deceptively aligned. I wish we got more products like Bing! We need more transparency, not deception! Bing also posed basically no AI risk since it was just a fine-tuning of GPT-4 (if Bing foomed, than GPT-4 would’ve foomed first).
I think calling for a product recall just because it is spooky is unnecessary and will just distract from AI safety.
GPT-4, on the other hand, is worming it’s way through society. It doesn’t have as many spooky ones, but it has the spookiest one of all: power seeking.
I think a Loebner Silver Prize is still out of reach of current tech; GPT-4 sucks at most board games (which is possible for a judge to test over text).
I won’t make any bets about GPT-5 though!
It’s a kind of mode collapse.
[Question] What projects and efforts are there to promote AI safety research?
Seeing Ghosts by GPT-4
First strike gives you a slightly bigger slice of the pie (due to the pie itself being slightly smaller), but then everyone else gets scared of you (including your own members).
MAD is rational because then you lose a proportion of the pie to third parties.
The problem is, what if a mesaoptimizer becomes dangerous before the original AI does?
I’m not saying LessWrong are the only misaligned ones, although we might be more than others. I’m saying any group who wants humanity to survive is misaligned with respect to the optimization process that created that group.
Luckily, at least a little bit of this misalignment is common! I’m just pointing out that we were never optimized for this; the only reason humans care about humanity as a whole is that our society isn’t the optimum of the optimization process that created it. And it’s not random either; surviving is an instrumental value that any optimization process has to deal with when creating intelligences.
I believe this has been proposed before (I’m not sure what the first time was).
The main obstacles is that this still doesn’t solve impact regularization, and a more generalized type of shutdownability then you presented.