Clarification request. In the writeup, you discuss the AI Bayes net and the human Bayes net as if there’s some kind of symmetry between them, but it seems to me that there’s at least one big difference.
In the AI’s case, the Bayes net is explicit, in the sense that once training is done we could print it out on a sheet of paper and try to study it; the main reason we don’t is that it’s likely to be far too big to make much sense of.
In the human’s case, we have no idea what the Bayes net looks like, because humans don’t have that kind of introspective access. In fact, there’s not much difference between saying “the human uses a Bayes net” and saying “the human uses some arbitrary function F, and we worry the AI will figure out F and then use it to lie to us”.
Or am I actually wrong and it’s okay for a “builder” solution to assume we have access to the human Bayes net?