I think one of my main contrarian instincts is to see a flat direction and worry we’ve been creeping up it, to the point that I’m actually pretty receptive to arguments for going the other way.
I take it as a modest sign that I have this well calibrated that your more-sleep and less-sleep paragraphs sounded about equally reasonable to me.
I remember very early in the pandemic reading an interview with someone who justified their decision to continue going to bars by pointing out that they had a high-contact job that they still had to do. I noticed that this in fact made their decision worse (in terms of total societal Covid risk).
(And as the number of cases was still quite low at the time, the 100% bound on risk was much less plausibly a factor)
If you’re deciding whether or not to add the (n+1)th person, what matters is the marginal risk of that decision.
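To make the marginal-risk point concrete, here’s a toy model (my simplification, not anything from the original discussion): if each contact independently carries infection probability p, the marginal risk of the (n+1)th contact is p·(1−p)^n, which stays essentially flat until total risk gets close to the 100% bound.

```python
# Toy model: independent per-contact infection probability p (assumed).
# Total risk over n contacts is 1 - (1-p)^n; the marginal risk of
# contact n+1 is p * (1-p)^n, which only shrinks appreciably once
# total risk approaches the 100% bound.
p = 0.01
for n in (0, 10, 100, 300):
    total = 1 - (1 - p) ** n
    marginal = p * (1 - p) ** n
    print(f"n={n:3d}  total risk={total:.3f}  marginal risk={marginal:.5f}")
```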
Another explanation for logarithmic thinking is Laplace’s rule of succession.
If you have N exposures and have not yet had a bad outcome, the Laplacian estimate of a bad outcome from the next exposure goes as 1/N (the marginal cost under a logarithmic rule).
Applying this to “number of contacts” rather than “number of exposures” is admittedly more strained but I could still see it playing a part.
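A quick sketch of that accounting (my own illustration, with made-up numbers):

```python
# Laplace's rule of succession: after N exposures and zero bad outcomes,
# the estimated chance the next exposure goes badly is 1/(N+2), which
# falls off roughly as 1/N. Summing those marginal risks gives a total
# that grows like log(N) -- the logarithmic rule's accounting.
import math

def laplace_next_bad(n_exposures, n_bad=0):
    return (n_bad + 1) / (n_exposures + 2)

total = sum(laplace_next_bad(n) for n in range(1000))
print(total, math.log(1000))  # total tracks log(N) up to a constant
```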
I think the idea is that Huemer’s quote seems to itself be an effort to repair society without fully understanding it.
I don’t think this is a facile objection, either*—I think it’s very possible that “Voters, activists, and political leaders” are actually an essential part of the complex mechanism of society and if they all stopped trying to remedy problems things would get even worse.
On the other hand, you can recurse this reasoning and say that maybe bold counterintuitive philosophical prescriptions like Huemer’s are also part of the complex mechanism.
*To the quote as a standalone argument, anyway—haven’t read the essay.
Searching “real estate money laundering”, it does sound like this is a real thing. But the few pages I just read generally don’t emphasize the “overpaying in exchange for out-of-band services” mechanism—they seem to be thinking in terms of buying (with dirty money) and selling (for clean money) at market prices, and emphasize that real estate’s status as “a good investment” is an important part of why criminals use it.
(They also bring up international tax avoidance strategies. Obviously using property to “park your wealth” also relies on prices not going down and hopefully going up at a reasonable rate).
So it sounds like OP’s strategy of building more and more until the speculators stop paying would work almost equally well against these types of buyers.
I find this distinction useful as well. I suspect it’s one that many people understand implicitly and many others totally lack. Evidence of the latter: I’ve seen intelligent people be far too upset by https://en.wikipedia.org/wiki/K_Foundation_Burn_a_Million_Quid.
One (admittedly idealistic) solution would be to spread awareness of this dynamic and its toxicity. You can’t totally expunge it that way, but you could make it less prevalent (i.e. upper-middle managers probably can’t be saved, but it might get hard to find enough somewhat-competent lower-middle managers who will play along).
What would it look like to achieve an actually-meaningful level of awareness? I would say “there is a widely-known and negative-affect-laden term for the behavior of making strictly-worse choices to prove loyalty”.
Writing this, I realized that the central example of “negative-sum behavior to prove loyalty” is hazing. (I think some forms of hazing involve useful menial labor, but classic frat-style hazing is unpleasant for the pledges with no tangible benefit to anyone else). It seems conceivable to get the term self-hazing into circulation to describe cases like the one in OP, to the point that someone might notice when they’re being expected to self-haze and question whether they really want to go down that road.
Had she been the sort to do that, Omega wouldn’t have made her the offer in the first place.
I could use more clarity on what is and isn’t level three.
Supposedly at level three, saying “There’s a lion across the river” means “I’m with the popular kids who are too cool to go across the river.” But there’s more than one kind of motivation the speaker might have.
A) A felt sense that “There’s a lion across the river” would be a good thing to say (based on subconscious desire to affiliate with the cool kids, and having heard the cool kids say this)
B) A conscious calculation that saying this will ingratiate you with the cool kids, based on explicit reasoning about other things the cool kids have said, but motivated by a felt sense that those kids are cool and you want to join them
C) A conscious calculation that saying this will ingratiate you with the cool kids, motivated by a conscious calculation that gaining status among the cool kids will yield tangible benefits.
Are all three of these contained by level three? Or does an element of conscious calculation take us into level four?
(I think C) has a tendency to turn into B) and B) likewise into A), but I don’t think it’s inevitable)
The answer looks something like “if she had been planning to do that, the opaque envelope would have been empty”.
I think I know what you mean (about even-numbered pages; I’m not familiar with Manuscript), but there isn’t actually missing necessary information (unless you haven’t read HPMoR, in which case you’re definitely missing necessary information). I suppose what’s missing is unnecessary information—each scene is stripped to its bare essentials.
I like to read blog posts by people who do real statistics, but with a problem in front of me I’m very much making stuff up. It’s fun, though!
The approach I settled on was to estimate the success chance of a possible stat line by taking a weighted success rate over the data, weighted by how similar each hero’s stats are to the stats being evaluated. My rationale: based on intuitions about the domain, I wouldn’t assume stats’ effects are linear or independent, but I would assume that heroes with similar stats have similar success chances.
estimatedchance(stats) = sum(weightfactor(hero.stats, stats) * hero.succeeded) / sum(weightfactor(hero.stats, stats))
weightfactor(hero.stats, stats) = k ^ distance(hero.stats, stats)
(Assuming 0 < k < 1, and hero.succeeded is 1 if the hero succeeded and 0 otherwise)
I tried using both Euclidean and Manhattan distances, and various values for k as well. I also tried a hacky variant of Manhattan distance that added abs(sum(statsA) - sum(statsB)) to the result, but it didn’t seem to change much.
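For concreteness, here’s a minimal runnable version of the estimator (the data format, the k value, and the choice to show only the Manhattan variant are my assumptions):

```python
# Sketch of the weighted-success-rate estimator above.
# Assumed data format: `heroes` is a list of (stats, succeeded) pairs,
# with stats a 6-tuple and succeeded 0 or 1.
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def estimated_chance(stats, heroes, k=0.7, distance=manhattan):
    weights = [k ** distance(hero_stats, stats) for hero_stats, _ in heroes]
    total = sum(w * succeeded for w, (_, succeeded) in zip(weights, heroes))
    return total / sum(weights)

# e.g. estimated_chance((8, 14, 13, 13, 8, 16), heroes)
```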
Lastly, I tried replacing (hero.succeeded) with (hero.succeeded - linearprediction(sum(hero.stats))) to try to isolate builds that do well relative to their stat total. linearprediction is a simple model I threw together by eyeballing the data: 40% chance to succeed with total stats of 60, 100% chance with total stats >= 95, linear in between. Could probably be improved with not too much effort, but I have to stop somewhere.
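In code, that baseline looks something like the following (its behavior below a stat total of 60 isn’t specified above, so the clamped extrapolation is my guess):

```python
def linearprediction(stat_total):
    # 40% at a stat total of 60, 100% at 95+, linear in between;
    # extrapolated and clamped below 60 (my assumption).
    if stat_total >= 95:
        return 1.0
    return max(0.0, 0.4 + 0.6 * (stat_total - 60) / 35)
```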
I generally found two clusters of optima, one around (8, 14, 13, 13, 8, 16)—that is, +4 CHA, +2 STR, +4 WIS—and the other around (4, 16, 13, 14, 9, 16)—that is, +2 CON, +1 INT, +3 STR, +4 WIS. The latter was generally favored by low k values, as the heroes with stats closest to that value generally did quite well but those a little farther away got less impressive. So it could be a successful strategy that doesn’t allow too much deviation, or just a fluke. Using the linear prediction didn’t seem to change things much.
If I had to pick one final answer, it’s probably (8, 14, 13, 13, 8, 16) (though there seems to be a fairly wide region of variants that tend to do pretty well—the rule seems to be ‘some CHA, some WIS, and maybe a little STR’), but I find myself drawn towards the maybe-illusory (4, 16, 13, 14, 9, 16) niche solution.
ETA: Looks like I was iterating over an incomplete list of possible builds… but it turned out not to matter much.
ETA again (couldn’t leave this alone): I tried computing log-likelihood scores for my predictors (restricting the ‘training’ set to the first half of the data and using only the second half for validation). I do find that with the right parameters some of my predictors do better than simple linear regression on sum of stats, and also better than the apparently-stronger predictor of simple linear regression on sum of non-DEX stats. But they don’t beat it by much. And it seems the better parameter values are the higher k values, meaning the (8, 14, 13, 13, 8, 16) cluster is probably the one to bet on.
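For reference, the validation setup looks roughly like this (a sketch reusing estimated_chance and the heroes list from the earlier snippet; the epsilon clamp is my addition, to keep log(0) out of the scores):

```python
import math

def log_likelihood(predict, data, eps=1e-3):
    total = 0.0
    for stats, succeeded in data:
        p = min(max(predict(stats), eps), 1 - eps)
        total += math.log(p if succeeded else 1 - p)
    return total

train, test = heroes[: len(heroes) // 2], heroes[len(heroes) // 2:]
score = log_likelihood(lambda s: estimated_chance(s, train), test)
```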
I see “property assessment” on the list, but it’s worth calling out self-assessment specifically (where the owner has to sell their property if offered their self-assessed price).
Then there are those grades organizations give politicians. And media endorsements of politicians. And, for that matter, elections.
Keynesian beauty contests.
And it seems worth linking to this prior post (not mine): https://www.lesswrong.com/posts/BthNiWJDagLuf2LN2/evaluating-predictions-in-hindsight
Glad to hear this is helpful for you too :)
I didn’t really follow the time-derivative idea before, and since you said it was equivalent I didn’t worry about it :p. But either it’s not really equivalent or I misunderstood the previous formulation, because I think everything works for me now.
So if we (1) decide “I will imagine yummy food”, then (2) imagine yummy food, then (3) stop imagining yummy food, we get a positive reward from the second step and a negative reward from the third step, but both of those rewards were already predicted by the first step, so there’s no RPE in either the second or third step, and therefore they don’t feel positive or negative. Unless we’re hungrier than we thought, I guess...
Well, what exactly happens if we’re hungrier than we thought?
(1) “I will imagine food”: No reward yet, expecting moderate positive reward followed by moderate negative reward.
(2) [Imagining food]: Large positive reward, but now expecting large negative reward when we stop imagining, so no RPE on previous step.
(3) [Stops imagining food]: Large negative reward as expected, no RPE for previous step.
The size of the reward can then be informative, but not actually rewarding (since it predictably nets to zero over time). The neocortex obtains hypothetical reward information from the subcortex, without actually extracting a reward—which is the thing I’ve been insisting had to be possible. Turns out we don’t need to use a separate channel! And the subcortex doesn’t have to know or care whether it’s receiving a genuine prediction or an exploratory imagining from the neocortex—the incentives are right either way.
(We do still need some explanation of why the neocortex can imagine (predict?) food momentarily but can’t just keep imagining food forever, avoiding step (3) and pocketing a positive RPE after step (2). Common sense suggests one: keeping such a thing up is effortful, so you’d be paying ongoing costs for a one-time gain, and unless you can keep it up forever the reward still nets to zero in the end)
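One standard way to cash out this walkthrough is temporal-difference RPE, where RPE = r + V(next) − V(current). The sketch below is my formalization (it may not match the exact formulation upthread), showing the imagine/stop cycle netting to zero:

```python
# States and values chosen so that step (1) fully predicts the cycle;
# R is an arbitrary reward size.
R = 10.0
V = {"decided": 0.0, "imagining": -R, "stopped": 0.0}
steps = [("decided", "imagining", +R), ("imagining", "stopped", -R)]
for s, s_next, r in steps:
    rpe = r + V[s_next] - V[s]
    print(f"{s} -> {s_next}: reward={r:+.0f}, RPE={rpe:+.0f}")
# Both RPEs come out to zero, and the rewards sum to zero: informative
# but not net-rewarding, as described above.
```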
Thanks for the reply; I’ve thought it over a bunch, and I think my understanding is getting clearer.
I think one source of confusion for me is that to get any mileage out of this model I have to treat the neocortex as a black box trying to maximize something, but it seems like we also need to rely on the fact that it executes a particular algorithm with certain constraints.
For instance, if we think of the ‘reward predictions’ sent to the subcortex as outputs the neocortex chooses, the neocortex has no reason to keep them in sync with the rewards it actually expects to receive—instead, it should just increase the reward predictions to the maximum for some free one-time RPE and then leave it there, while engaging in an unrelated effort to maximize actual reward.
(The equation V(s_prev) += (learning rate) ⋅ (RPE) explains why the neocortex can’t do that, but adding a mathematical constraint to my intuitive model is not really a supported operation. If I say “the neocortex is a black box that does whatever will maximize RPE, subject to the constraint that it has to update its reward predictions according to that equation,” then I have no idea what the neocortex can and can’t do)
Adding in the basal ganglia as an ‘independent’ reward predictor seems to work. My first thought was that this would lead to an adversarial situation where the neocortex is constantly incentivized to fool the basal ganglia into predicting higher rewards, but I guess that isn’t a problem if the basal ganglia is good at its job.
Still, I feel like I’m missing a piece to be able to understand imagination as a form of prediction. Imagining eating beans to decide how rewarding they would be doesn’t seem to get any harder if I already know I don’t have any beans. And it doesn’t feel like “thoughts of eating beans” are reinforced, it feels like I gain abstract knowledge that eating beans would be rewarded.
Meanwhile, it’s quite possible to trigger physiological responses by imagining things. Certainly the response tends to be stronger if there’s an actual possibility of the imagined thing coming to pass, but it seems like there’s a floor on the effect size, where arbitrarily low probability eventually stops weakening the effect. This doesn’t seem like it stops working if you keep doing it—AIUI, not all hungry people are happier when they imagine glorious food, but they all salivate. So that’s a feedback channel separate from reward. I don’t see why there couldn’t also be similar loops entirely within the brain, but that’s harder to prove.
So when our rat thinks about salt, the amygdala detects that and alerts… idk, the hypothalamus? The part that knows it needs salt… and the rat starts salivating and feels something in its stomach that it previously learned means “my body wants the food” and concludes eating salt would be a good idea.
This might just be me not grokking predictive processing, but...
I feel like I do a version of the rat’s task all the time to decide what to have for dinner—I imagine different food options, feel which one seems most appetizing, and then push the button (on Seamless) that will make that food appear.
Introspectively, this feels to me like there’s such a thing as ‘hypothetical reward’. When I imagine a particular food, I feel like I get a signal from… somewhere… that tells me whether I would feel reward if I ate that food, but does not itself constitute reward. I don’t generally feel any desire to spend time fantasizing about the food I’m waiting for.
To turn this into a brain model, this seems like the neocortex calling an API the subcortex exposes. Roughly, the neocortex can give the subcortex hypothetical sensory data and get a hypothetical reward in exchange. I suppose this is basically hypothesis two with a modification to avoid the pitfall you identify, although that’s not how I arrived at the idea.
This does require a second dimension of subcortex-to-neocortex signal alongside the reward. Is there a reason to think there isn’t one?
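As a toy rendering of that metaphor (every name below is hypothetical, and the reward model is a stand-in):

```python
class Subcortex:
    def reward_model(self, data):
        # Placeholder scoring function standing in for whatever the
        # subcortex actually computes.
        return 1.0 if "salt" in data else 0.0

    def evaluate(self, sensory_data):
        # Existing channel: real sensory data in, actual reward out.
        return self.reward_model(sensory_data)

    def evaluate_hypothetical(self, imagined_data):
        # Proposed second channel: the same reward model is consulted,
        # but the result comes back as information, not as reward.
        return self.reward_model(imagined_data)
```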
I’m not sure Level 3 is actually less agentic than Level 1. The Oracle does not choose which truths to speak in order to pursue goals; if they did, they’d be the Sage.