~[agent foundations]
Mateusz Bagiński
The Litany of Tarrrrski is beyond wholesome!
Thank you for doing this!
I ask you, do you really think that an AI aligned to human values would refrain from doing something like this to anyone? One of the most fundamental aspects of human values is the hated outgroup. Almost everyone has somebody they’d love to see suffer. How many times has one human told another “burn in hell” and been entirely serious, believing that this was a real thing, and 100% deserved?
This shows how vague a concept “human values” is, and how differently different people can interpret it.
I always interpreted “aligning an AI to human values” as something like “making it obedient to us, ensuring it won’t do anything that we (whatever that ‘we’ is—another point of vagueness) wouldn’t endorse, lowering suffering in the world, increasing eudaimonia in the world, reducing X-risks, bringing the world closer to something we (or smarter/wiser versions of us) would consider a protopia/utopia”
Certainly I never thought it to be a good idea to imbue the AI with my implicit biases, outgroup hatred, or whatever. I’m ~sure that people who work on alignment for a living have also seen these skulls.
I know little about CEV, but if I were to coherently extrapolate my volition, then one aspect of that would be increasing the coherence and systematicity of my moral worldview and behavior, including how (much) different shards conform to it. I would certainly trash whatever outgroup bias I have (not counting general greater fondness for the people/other things close to me).
So, yeah, solving “human values” is also a part of the problem but I don’t think that it makes the case against aligning AI.
Does anybody know what happened to Julia Galef?
(Meta) Why do you not use capital letters, unless in acronyms? I find it harder to parse.
A meta-comment on the speculations about who might have sabotaged Nord Stream.
It seems like people here mostly implicitly treat possible state actors as coherent, unified agents. But maybe it wasn’t any particular state acting as a whole but rather some small group within that state that decided to do it on their own. Even if they considered it likely to be identified after the fact, the subgroup may have judged the sabotage to be in the interest of the whole nation or maybe that particular subgroup.
(I don’t know how much fragmentation of that sort there is in any given country, but I think it’s at least plausible.)
I think wanting to seem like sober experts makes them kinda believe the things they expect other people to expect to hear from sober experts.
Also, there was Gato, trained on a shitload of different tasks and achieving good performance on a vast majority of them, which led some to call it “subhuman AGI”.
I agree with the post in general, although I’m not quite sure how you would stitch several narrow models/systems together to get an AGI. A more viable path is probably something like training it end-to-end, like Gato (needless to say, please don’t).
Why did FHI get closed down? In the end, because it did not fit in with the surrounding administrative culture. I often described Oxford like a coral reef of calcified institutions built on top of each other, a hard structure that had emerged organically and haphazardly and hence had many little nooks and crannies where colorful fish could hide and thrive. FHI was one such fish but grew too big for its hole. At that point it became either vulnerable to predators, or had to enlarge the hole, upsetting the neighbors. When an organization grows in size or influence, it needs to scale in the right way to function well internally – but it also needs to scale its relationships to the environment to match what it is.
I would love to see something like Vanessa’s LTA reading list but for devinterp.
Reading this, it reminds me of the red flags that some people (e.g. Soares) saw when interacting with SBF and, once shit hit the fan, ruminated over not having taken some appropriate action.
I think the parentheses are off here. IIUC, you want to express equality of the divergences, not of the divergences multiplied by probabilities (which wouldn’t make sense, I think).
Typo: →
Typo: it’s “Goodhart”, not “Goodheart”
First, this presupposes that for any amount of suffering there is some amount of pleasure/bliss/happiness/eudaimonia that could outweigh it. Not all LWers accept this, so it’s worth pointing that out.
But I don’t think the eternal paradise/mediocrity/hell scenario accurately represents what is likely to happen in that case. I’d be more worried about somebody using AGI to conquer the world and establish a stable totalitarian regime built on some illiberal normative framework, like shariah (according to Caplan, it’s totally plausible for global totalitarianism to persist indefinitely). If you get to post-scarcity, you may grant all your subjects UBI, all basic needs met, etc. (or you may not, if you decide that this policy contradicts the Quran or the hadith), but if your convictions are strong enough, women will still be forced to wear burkas, remain basically slaves of their male kin, etc. One could argue that abundance robustly promotes a more liberal worldview, a loosening of social norms, etc., but AFAIK there is no robust evidence for that.
This is meant just to illustrate that you don’t need an outgroup to impose a lot of suffering. Having a screwed up normative framework is just enough.
This family of scenarios is probably still better than AGI doom though.
I don’t quite get what actions are available in the heat engine example.
Is it just choosing a random bit from H or C (in which case we can’t see whether it’s 0 or 1), OR a specific bit from W (in which case we do know whether it’s 0 or 1), and moving it to another pool?
Can’t you restate the second one as a relationship between two utility functions such that increasing one (holding background conditions constant) is guaranteed not to decrease the other? I.e., their respective derivatives are always non-negative under every background condition.
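For concreteness, one way to write down that restatement (the symbols are my own: $U$ and $V$ for the two utility functions, $b$ for the background conditions):

```latex
% Monotone relationship between two utility functions U and V:
% holding the background conditions b fixed, any change that
% increases U must not decrease V.
\forall b \;\; \forall x, x' :\quad
  U(x; b) > U(x'; b) \;\Longrightarrow\; V(x; b) \ge V(x'; b)

% Differentiable version of the same condition: V is a
% non-decreasing function of U under every background condition b.
\left.\frac{\partial V}{\partial U}\right|_{b} \ge 0
  \quad \text{for all } b
```

This is just a sketch of what “increasing one never decreases the other” could mean formally, not necessarily the notion the chapter has in mind.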
(I skipped straight to ch7, according to your advice, so I may be missing relevant parts from the previous chapters if there are any.)
I probably agree with you on the object level regarding phenomenal consciousness.
That being said, I think it’s “more” than a meme. I witnessed at least two people not exposed to the scientific/philosophical literature on phenomenal consciousness reinvent/rediscover the concept on their own.
It seems to me that the first-person perspective we necessarily adopt inclines us to ascribe to sensations/experiences some ineffable, seemingly irreducible quality. My guess is that we (re)perceive our perception as a meta-modality different from ordinary modalities like vision, hearing, etc., and that causes the illusion. It’s plausible that being raised in a WEIRD culture contributes to that inclination.
A butterfly conjecture: While phenomenal consciousness is an illusion, there is something to be said about the first-person perspective being an interesting feature of some minds (sufficiently sophisticated? capable of self-reflection?). It can be viewed as a computational heuristic that makes you “vulnerable” to certain illusions or biases, such as phenomenal consciousness, but also:
the difficulty of accepting one-boxing in Newcomb’s problem
mind-body dualism
the naive version of the free-will illusion, and the difficulty of accepting physicalism/determinism
(maybe) the illusion of being in control over your mind (various sources say that meditation-naive people are often surprised to discover how little control they have over their own mind when they first try meditation)
A catchy term for this line of investigation could be “computational phenomenology”.
How about a dialogue on this, with no (asymmetric) posting rate limits?
I think sigma-algebras are probably not the right algebra to base beliefs on. Something resembling linear logic might be better for reasons we’ve discussed privately; that’s very speculative of course. Ideally the right algebra should be derived from considerations arising in construction of the representation theorem, rather than attempting to force any outcome top-down.
Have you elaborated on this somewhere, or can you link some resource on why linear logic would be a better algebra for beliefs than sigma-algebras?
We know of at least one eukaryote species that lost its mitochondria. I don’t know of any cell with mitochondria but no nucleus. Erythrocytes have neither, so maybe (speculating here) there are some specialized animal cells that keep their mitochondria but lack nuclei.
I’d love to see/hear you on his podcast.