When translating from French to English, both “vous” and “tu” always become “you”. You probably mean translating from English to French, where you need to make a judgement about whether to use “vous” or “tu”.
Subjective experience is most likely physical
I think of preferences as a description of agent behavior, which means the preferences changed.
When you say “got better at achieving its preference”, I suppose you’re thinking of a preference as some goal the agent is pursuing. I find this view (assuming goal-directedness) less general in its ability to describe agent behavior. It may be more useful, but if so, I think we need to justify it better. I don’t exclude the possibility that there is a piece of information I don’t know about.
Goal-directedness leads toward instrumental convergence and away from corrigibility. If we are looking to solve corrigibility, I think it’s worth questioning goal-directedness.
vim
closing stuff: the window/application list menu that opens with alt+tab / command+tab; dropdown menus; popup messages on sites
When was the last time you (intentionally) used your caps lock key?
yesterday
I may be the only one :)
I’d rather remap my right shift, which keyboard makers for some reason tend to make huge.
You touch on the point that people can mean several different things when talking about preferences. I think this causes most of the confusion and subsequent debates.
I wrote about this from a completely different perspective recently.
To add to the discussion, my impression is that many people in the US believe they have some moral superiority or know what is good for other people. The whole “we need a Manhattan Project for AI” discourse is reminiscent of calling for global domination. Also, doing things for the public good is controversial in the US, as it can infringe on individual freedom.
This makes me really uncertain as to which AGI would be better (assuming somebody controls it).
Western AI is much more likely to be democratic
This sounds like “western AI is better because it is much more likely to have western values”
I don’t understand what you mean by “humanity’s values”. Also, one could maybe argue that “democratic” societies are those where actions are taken based on whether the majority of people can be manipulated to support them.
I find “indifference” poorly defined in this context, which makes me doubt totality and transitivity. I’m trying to clarify my own confusion on this.
Understanding Agent Preferences
I wrote something on a related idea a while back:
Disincentivizing deception in mesa optimizers with Model Tampering
I’ve read the sequences. I’m not sure whether I’m missing something or the issues I raised just run deeper. I’ll probably set this topic aside until I have more time to dedicate to it.
the XOR of two boolean elements is straightforward to write down as a single-layer MLP
Isn’t this exactly what Minsky showed to be impossible? You need an additional hidden layer.
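To make the point concrete, here is a minimal sketch (my own illustration, with hand-picked numpy weights rather than anything trained) of an MLP with one hidden layer that computes XOR; a single layer can’t do it because the four XOR input points are not linearly separable, which is what Minsky and Papert showed:

```python
import numpy as np

def step(x):
    """Heaviside threshold activation."""
    return (x > 0).astype(int)

# Hidden layer: one unit computing OR, one computing AND of the two inputs
W1 = np.array([[1.0, 1.0],   # OR unit
               [1.0, 1.0]])  # AND unit
b1 = np.array([-0.5, -1.5])

# Output unit: OR and not AND, i.e. XOR
W2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_mlp(x):
    h = step(W1 @ x + b1)     # hidden layer
    return step(W2 @ h + b2)  # output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_mlp(np.array(x)))  # -> 0, 1, 1, 0
```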
I don’t find any of this convincing at all. If anything, I’m confused.
What would a mapping look like? If it’s not physically present then we recursively get the same issue—where is the mapping for the mapping?
Where is the mapping between the concepts we experience as qualia and the physical world? Does a brain do anything at all?
A function in this context is a computational abstraction. I would say this is in the map.
they come up with different predictions of the experience you’re having
The way we figure out which one is “correct” is by comparing their predictions to what the subject says. In other words, one of those predictions is consistent with the subject’s brain’s output, and this causes everybody to consider it the “true” prediction.
There could be countless other conscious experiences in the head, but they are not grounded by the appropriate input and output (they don’t interact with the world in a reasonable way).
I think consciousness only seems to be a natural kind, and this is because there is one computation that interacts with the world in the appropriate way and manifests itself in it. The other computations are, in a sense, disconnected.
I don’t see why consciousness has to be objective, other than this being our intuition (which is notorious for being wrong outside of hunter-gatherer contexts). Searle’s wall is a strong argument that consciousness is as subjective as computation.
I would have appreciated an intuitive explanation of the paradox, something I only got from the comments.
“at the very beginning of the reinforcement learning stage… it’s very unlikely to be deceptively aligned”
I think this is quite a strong claim (hence my link to the article indicating that, for sufficiently capable models, RL may not be required to get situational awareness).
Nothing in the optimization process forces the AI to map the string “shutdown” contained in questions to the ontological concept of a switch turning off the AI. The simplest generalization from RL on questions containing the string “shutdown” is (arguably) for the agent to learn certain behavior for question answering, e.g. the AI learns that saying certain things out loud is undesirable (instead of learning that caring about the turn-off switch is undesirable). People would likely disagree on what counts as manipulating shutdown, which shows that the concept of manipulating shutdown is quite complicated, so I wouldn’t expect generalizing to it to be the default.
preference for X over Y
...
”A disposition to represent X as more rewarding than Y (in the reinforcement learning sense of ‘reward’)”

The talk about “giving reward to the agent” also made me think you may be assuming that reward is the optimization target. That being said, as far as I can tell no part of the proposal depends on that assumption.
In any case, I’ve been thinking about corrigibility for a while and I find this post helpful.
They were teaching us how to make our handwriting beautiful and we had to practice. The teacher would look at the notebooks and say things like “You see this letter? It’s tilted in the wrong direction. Write it again!”.
This was a compulsory part of the curriculum.
There is the possibility of misgendering somebody and them taking it seriously. Sometimes it feels like you’re walking through a minefield. It’s not conducive to good social interaction.
I’m wondering why languages like Finnish can do just fine with “hän” while English needs he/she.