Independent AI safety researcher
NicholasKees (Nicholas Kees Dupuis)
I really enjoyed this post. No assumptions are made about the moral value of insects; rather, the author simply points out how little we have ever thought about it in the first place. Given that, as a species, we already tend to ignore a lot of the atrocities that form a part of our daily lives, even if it WERE true, beyond a reasonable doubt, that washing our sheets killed thousands of sentient creatures, I still can’t imagine we’d put in a significant effort to find an alternative. (And it certainly wouldn’t be socially acceptable to have stinky sheets!) I think it would be healthy to cultivate genuine curiosity and caring about these things, rather than ridicule people who depart from social norms. If insects do deserve moral weight, I’d like to be the sort of person, and part of a community, that would notice and take that seriously.
If we are able to flag a treacherous turn as cognitively anomalous, then we can take that opportunity to shut down a system and retrain on the offending datapoint.
What do you mean by “retrain on the offending datapoint”? I would be worried about Goodharting on this by selecting for systems which don’t set off the anomaly detector, and thereby making it a less reliable safeguard.
On many useful cognitive tasks (chess, theoretical research, invention, mathematics, etc.), beginner/dumb/unskilled humans are closer to a chimpanzee/rock than to peak humans (for some fields, only a small minority of humans are able to perform the task at all, or to perform it in a useful manner).
This seems due to the fact that most tasks are “all or nothing”, or at least have a really steep learning curve. I don’t think that humans differ that much in intelligence, but rather that small differences result in hugely different abilities. This is part of why I expect foom. Small improvements to an AI’s cognition seem likely to deliver massive payoffs in terms of their ability to affect the world.
I am also completely against building powerful autonomous agents (albeit for different reasons), but to avoid doing this seems to require extremely high levels of coordination. All it takes is one lab to build a singleton capable of disempowering humanity. It would be great to stay in the “tool AI” regime for as long as possible, but how?
It seems to occur mostly without RL. People start wanting to have sex before they have actually had sex.
This doesn’t mean that it isn’t a byproduct of RL. Something needs to be hardcoded, but a simple reward circuit might lead to a highly complex set of desires and cognitive machinery. I think the things you are pointing to in this post sound extremely related to what Shard Theory is trying to tackle.
https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values
Assuming that what evolution ‘wants’ is child-bearing heterosexual sex, then human sexuality has a large number of deviations from this in practice including homosexuality, asexuality, and various paraphilias.
I don’t think this is a safe assumption. Sex also serves a social bonding function beyond procreation, and there are many theories about the potential advantages of non-heterosexual sex from an evolutionary perspective.
A couple things you might find interesting:
- Men are 33% more likely to be gay for every older brother they have: https://pubmed.ncbi.nlm.nih.gov/11534970/
- Women are more likely to be bisexual than men, which may have been advantageous for raising children: https://pubmed.ncbi.nlm.nih.gov/23563096/
- Homosexuality is extremely common in the animal kingdom (in fact the majority of giraffe sex is homosexual): https://en.wikipedia.org/wiki/List_of_mammals_displaying_homosexual_behavior
Wow, this post is fantastic! In particular I love the point you make about goal-directedness:
If a model is goal-directed with respect to some goal, it is because such goal-directed cognition was selected for.
Looking at our algorithms as selection processes that incentivize different types of cognition seems really important and underappreciated.
I agree that this is an important difference, but I think that “surely cannot be adaptive” ignores the power of group selection effects.
To me this statement seems mostly tautological. Something is instrumental if it is helpful in bringing about some kind of outcome. The term “instrumental” is always (as far as I can tell) used in reference to some sort of consequence-based optimization.
What is evolution’s true goal? If it’s genetic fitness, then I don’t see how this demonstrates alignment. Human sexuality is still just an imperfect proxy, and doesn’t point at the base objective at all.
I agree that it’s very interesting how robust this is to the environment we grow up in, and I would expect there to be valuable lessons here for how value formation happens (and how we can control this process in machines).
and increasing the number of actors can make collusive cooperation more difficult
An empirical counterargument to this comes from the incentives human leaders face when overseeing people who might coordinate against them. When authoritarian leaders come into power, they will actively purge members from their inner circles in order to keep them small. The larger the inner circle, the harder it becomes to prevent a rebellious individual from gathering the critical mass needed for a full-blown coup.
Source: The Dictator’s Handbook by Bruce Bueno de Mesquita and Alastair Smith
I interpret the goal as being more about figuring out how to use simulators as powerful tools to assist humans in solving alignment, and not at all shying away from the hard problems of alignment. Despite our lack of understanding of simulators, people (such as yourself) have already found them to be really useful, and I don’t think it is unreasonable to expect that, as we become less confused about simulators, we will learn to use them in really powerful and game-changing ways.
You gave “Google” as an example. I feel like having access to Google (or another search engine) improves my productivity by more than 100x. This seems like evidence that game-changing tools exist.
Sometimes something can be infohazardous even if it’s not completely true. Even though the Northwest Passage didn’t really exist, it inspired many European expeditions to find it. There’s a lot of hype about AI right now, and I think a cool new capabilities idea (even if it turns out not to work well) can also do harm by inspiring people to try similar things.
GI is very efficient, if you consider that you can reuse a lot of the machinery you learn, rather than needing to relearn it over and over again. https://towardsdatascience.com/what-is-better-one-general-model-or-many-specialized-models-9500d9f8751d
I really loved the post! I wish more people took S-risks completely seriously before dismissing them, and you make some really great points.
In most of your examples, however, it seems the majority of the harm stems from an inability to reason about the consequences of our actions, and if humans became smarter and better informed, it seems like a lot of this would be ironed out.
I will say the hospice/euthanasia example really strikes a chord with me, but even there, isn’t it more a product of cowardice than a failure of our values?
An example I think about a lot is the naturalistic fallacy. There is a lot of horrible suffering that happens in the natural world, and a lot of people seem to be way too comfortable with that. We don’t have any really high-leverage options right now to do anything about it, but it strikes me as plausible that even if we could do something about it, we wouldn’t want to. (Perhaps we would even make it worse by populating other planets with life: https://www.youtube.com/watch?v=HpcTJW4ur54)
Thank you for this gorgeously written comment. You really capture the heart of all this so perfectly, and I completely agree with your sentiments.
I agree that this is important. Are you more concerned about cyborgs than other human-in-the-loop systems? To me the whole point is figuring out how to make systems where the human remains fully in control (unlike, e.g. delegating to agents), and so answering this “how to say whether a person retains control” question seems critical to doing that successfully.
That’s a good point. There are clearly examples of systems where more is better (e.g. blockchain). There are just also other examples where the opposite seems true.
It seems to me that a monarch is far from outside every system, and is highly dependent on their key supporters (generals, bureaucrats, etc.) to stay in power. Their rule is not guaranteed to be permanent, and they are forever competing to avoid being replaced by their court. The Dictator’s Handbook makes this argument better, but it seems the main difference between an authoritarian ruler and a democratically elected one is not whether or not they have to satisfy the people they rule, but rather how many of those people they need to satisfy to stay in power.
Actually, autocrats are often terrible not because of bad luck, but because their particular flavor of incentives creates its own race to the bottom. I highly recommend The Dictator’s Handbook; the authors give a much fuller explanation there. Even selling our souls to a potential Stalin won’t rid us of the great almighty Moloch!