Mateusz Bagiński
I think it might have been kinda the other way around. We wanted to systematize (put on a firm, principled grounding) a bunch of related stuff like care-based ethics, individuality, identity (and the void left by the abandonment of the concept of “soul”), etc., and for that purpose we coined the concept of (phenomenal) consciousness.
My best and only guess is https://www.philiptrammell.com/
I think the parentheses are off here. IIUC, you want to express the equality of divergences, not divergences multiplied by probabilities (which wouldn’t make sense, I think).
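To illustrate what I mean (this is my guess at the intended statement, since I’m reconstructing the formula): I read you as asserting an equality of the form

$$D_{\mathrm{KL}}\big(P \,\|\, Q\big) = D_{\mathrm{KL}}\big(P' \,\|\, Q'\big),$$

whereas the current parenthesization parses as something like $P \cdot D_{\mathrm{KL}}(\ldots)$, i.e., a divergence scaled by a probability, which doesn’t type-check as an equality of divergences.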
Typo: →
In the Alice and Bob example, suppose there is a part (call it X) of the image that was initially painted green but where both Alice and Bob painted purple. Would that mean that the part of the natural latent over images A and B corresponding to X should be purple?
Bug report: I got notified about Joe Carlsmith’s most recent post twice, the second time after ~4 hours
Can you link to what “h/acc” is about/stands for?
Can new traders be “spawned”?
I’d actually love to read a dialogue on this topic between the two of you.
The principle fails even in these simple cases if we carve up the space of outcomes in a more fine-grained way. As a coin or a die falls through the air, it rotates along all three of its axes, landing in a random 3D orientation. The indifference principle suggests that the resting states of coins and dice should be uniformly distributed between zero and 360 degrees for each of the three axes of rotation. But this prediction is clearly false: dice almost never land standing up on one of their corners, for example.
The only way I can parse this is that you are conflating (1) the position of a die/coin when it makes contact with the ground and (2) its position when it stabilizes/[comes to rest]. A die/coin can be in any position when it touches the ground, but the vast majority of those positions are unstable, so it doesn’t remain in them for long.
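To make the distinction concrete, here’s a toy simulation (my own sketch, using a crude “snap to the nearest face” settling rule rather than real physics): touchdown orientations are uniform over the sphere, but the resting states collapse onto the six faces, with probability ~0 of resting on an edge or corner.

```python
# Toy model (my sketch, not from the post): orientations at touchdown are
# uniform, but the die then settles so that the face whose outward normal
# is closest to straight up ends up as the top face.
import numpy as np

rng = np.random.default_rng(0)

# Uniform random "up" directions at the moment of touchdown.
touchdown_up = rng.normal(size=(100_000, 3))
touchdown_up /= np.linalg.norm(touchdown_up, axis=1, keepdims=True)

# Outward normals of the six faces of a cube-shaped die.
face_normals = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=float)

# Crude settling rule: the top face is the one most aligned with "up".
resting_face = np.argmax(touchdown_up @ face_normals.T, axis=1)

# Touchdown angles are uniform; resting states collapse to ~1/6 per face.
print(np.bincount(resting_face, minlength=6) / len(resting_face))
```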
More generally, John Miller and colleagues have found that training performance is an excellent predictor of test performance, even when the test set looks fairly different from the training set, across a wide variety of tasks and architectures.
Counterdatapoint to [training performance being an excellent predictor of test performance]: in this paper, GPT-3 was fine-tuned to multiply “small” (e.g., 3-digit by 3-digit) numbers, which didn’t generalize to multiplying bigger numbers.
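A sketch of the kind of evaluation I have in mind (here `model_predict` is a hypothetical stand-in for querying the fine-tuned model; the paper’s actual prompt format and setup may differ):

```python
# Measure the in-distribution vs. out-of-distribution gap on multiplication.
# model_predict is a hypothetical placeholder for the fine-tuned model.
import random

def eval_multiplication(model_predict, n_digits: int, n_trials: int = 200) -> float:
    """Fraction of random n-digit-by-n-digit products the model gets right."""
    correct = 0
    for _ in range(n_trials):
        a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        if model_predict(f"{a} * {b} =") == str(a * b):
            correct += 1
    return correct / n_trials

# In-distribution (matches the fine-tuning data) vs. out-of-distribution:
# acc_3 = eval_multiplication(model_predict, n_digits=3)  # high after fine-tuning
# acc_5 = eval_multiplication(model_predict, n_digits=5)  # drops sharply
```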
Yeah, that’s interesting… unlike fetishes and math, this is something other animals should (?) in principle be capable of, but apparently it’s a uniquely human thing.
Nah, IMO it’s a straightforward extrapolation of some subset of normal human values; not that different from what I would do
How do you define/measure “weird” (and strength of “want”, for that matter)?
I don’t have anything more concrete than “seemingly not in the category of things humans tend to intrinsically want”. ¯\_(ツ)_/¯
To try to give a concrete answer, I’d say suicide by an otherwise-healthy human is the weirdest desire I know of.
Yeah, that’s a good example, and it brought to my mind the obvious-in-retrospect [body integrity dysphoria]/xenomelia, where (otherwise seemingly psychologically normal?) people want to get rid of some part of their body. (I haven’t looked into it that much, but AFAIR it’s probably something going somewhat precisely wrong with the body schema?)
I tentatively agree with this view.
This still leaves open the question: “What are some uncommon/peculiar attractor states corresponding to [people seemingly terminally valuing ‘weird’ things]?”
Yeah, I linked a new version of this plot in the OP.
Something like “terminal”/”intrinsic”, i.e. not in service of any other desire.
ETA: the terminal/instrumental distinction probably doesn’t apply cleanly to humans, but think of the difference between Alice, who reads book XYZ because she really likes XYZ (in the typical ways humans like books), and Bob, who reads the same book only to impress Charlie.
[Question] What are the weirdest things a human may want for their own sake?
because in those worlds all computations in the brain are necessary to do a “human mortality”
I think you meant “human morality”
Can you give pointers to where Quine and Nozick talk about this?
I fully agree that something like persistence/[continued existence in ~roughly the same shape] is the most natural/appropriate/joint-carving way to think about whatever-natural-selection-is-selecting-for in its full generality. (At least that’s the best concept I know at the moment.)
(Although there is still some sloppiness in what it means for a thing at time t0 to be “the same” as some other thing at time t1.)
This view is not entirely novel; see, e.g., Bouchard’s PhD thesis (from 2004) or the SEP entry on “Fitness” (Ctrl+F “persistence”).
I also agree that [humans are]/[humanity is] obviously massively successful on that criterion.
I’m very uncertain as to what implications this has for AI alignment.