I feel confused about the failure story from example 3 (the first three bullet points in that section).
It sounded like: we ask for a human-comprehensible way to predict X; the AI responds with a very low-level simulation plus a small bridge that predicts only and exactly X; so humans can't use the model to predict any high-level facts besides X.
But I don’t see how that leads to egregious misalignment. Shouldn’t the humans notice that they can’t predict the high-level things they care about and send the AI back to its model-search phase, rather than proceeding to evaluate policies based on this model and getting tricked into a policy that fails “off-screen” somewhere?
“Mark Xu” is an unusually short name, so the ending of the message might actually contain most of the entropy.
The phrases “my name is Mark Xu” and “my name is Mortimer Q. Snodgrass” contain roughly the same amount of evidence, even though the second has 12 additional letters. (“Mark Xu” might be a more likely name on priors, but it’s nowhere near 2^(4.7 * 12) times more likely, where 4.7 ≈ log₂ 26 bits per letter.)
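For concreteness, here is that arithmetic spelled out — a minimal sketch, assuming the naive per-letter figure of log₂(26) ≈ 4.7 bits:

```python
import math

# Naive entropy per letter, assuming all 26 letters are equally likely.
bits_per_letter = math.log2(26)            # ~4.70 bits

# "Mortimer Q. Snodgrass" has 12 more letters than "Mark Xu".
extra_letters = 12
extra_bits = bits_per_letter * extra_letters
likelihood_ratio = 2 ** extra_bits

print(f"extra bits: {extra_bits:.1f}")                      # ~56.4
print(f"implied likelihood ratio: {likelihood_ratio:.2e}")  # ~9.5e+16
```

If per-letter entropy translated directly into evidence, the shorter name would have to be roughly 10^17 times more likely a priori than the longer one, which is clearly not how a reasonable prior over names behaves.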