Ian Televan

Karma: 43

Ian Televan 2 Aug 2021 21:38 UTC
1 point
0
in reply to: Abe Dillon’s comment on: There’s No Fire Alarm for Artificial General Intelligence
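from random import randint

runs = 100000
S = runs  # every run includes the final roll that comes up 1
for _ in range(runs):
    # count the rolls of a 20-sided die that come before the first 1
    while randint(1, 20) != 1:
        S += 1
print(S / runs)  # average number of rolls until the first 1
>>> 20.05751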
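The simulated average can be compared with the closed form. This is a minimal derivation, assuming each roll is an independent fair 20-sided die; the symbols $N$ (number of rolls up to and including the first 1) and $p$ (chance of rolling a 1) are introduced here and are not in the original comment.

```latex
\mathbb{E}[N] \;=\; \sum_{n=1}^{\infty} n \, p \, (1-p)^{n-1} \;=\; \frac{1}{p},
\qquad p = \frac{1}{20} \;\Rightarrow\; \mathbb{E}[N] = 20 .
```

The simulated 20.05751 is consistent with this value up to sampling noise.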

Ian Televan 31 Jul 2021 19:56 UTC
9 points
0
on: How to teach things well
In my experience, teachers tend to give only examples of typical members of a category. I wish they’d also give examples along the category border, both positive and negative. Something like: “this seems to have nothing to do with quadratic equations, but it actually does, and this is why” and “this problem looks like it can be solved using quadratic equations, but this is misleading because XYZ”. This is obvious in subjects like geography (when you want to describe where China is, don’t give a bunch of points around Beijing as examples; instead draw the border and maybe mention ongoing territorial conflicts), but for some reason it is less obvious in concept-heavy subjects like mathematics.
Another point on my wishlist: create sufficient room for ambition. Give bonus points for optional but hard exercises. Tell students about some problems that even the world’s top experts don’t know how to solve.

Ian Televan 30 Jul 2021 19:24 UTC
1 point
0
on: How to learn from conversations
Thank you very much for posting this! I’ve been thinking about this topic for a while now and feel that it is criminally overlooked. There are so many resources on how to teach other people effectively, but virtually none on how to learn things effectively from other people (not just from textbooks). Yet we are often surrounded by people who know something that we currently don’t and who might not know much about teaching or how to explain things well. Knowing what questions to ask and how to ask them turns these people into great teachers while you reap the benefits. This feels like a superpower.

Ian Televan 15 Jul 2021 20:22 UTC
4 points
0
AF
in reply to: abramdemski’s comment on: Decision Theory
While I agree that the algorithm might output 5, I don’t share the intuition that it’s something that wasn’t ‘supposed’ to happen, so I’m not sure what problem it was meant to demonstrate. I thought of a few ways to interpret it, but I’m not sure which one, if any, was the intended interpretation:
a) The algorithm is defined to compute argmax, but it doesn’t output argmax because of false antecedents.
- but I would say that it’s not actually defined to compute argmax, therefore the fact that it doesn’t output argmax is not a problem.
b) Regardless of the output, the algorithm uses reasoning from false antecedents, which seems nonsensical from the perspective of someone who uses intuitive conditionals, which impedes its reasoning.
- it may indeed seem nonsensical, but if ‘seeming nonsensical’ doesn’t actually impede its ability to select actions with the highest utility (when it’s actually defined to compute argmax), then I would say that it’s also not a problem. Furthermore, wouldn’t MUDT be perfectly satisfied with the tuple $p_{1} : (x = 0, y = 10, A () = 10, U () = 10)$ ? It also uses ‘nonsensical’ reasoning ‘A()=5 ⇒ U()=0’ but still outputs the action with the highest utility.
c) Even when the use of false antecedents doesn’t impede its reasoning, the way it arrives at its conclusions is counterintuitive to humans, which means that we’re more likely to make a catastrophic mistake when reasoning about how the agent reasons.
- Maybe? I don’t have access to other people’s intuitions, but when I read the example, I didn’t have any intuitive feeling of what the algorithm would do, so instead I just calculated all assignments $(x, y) \in \{0, 5, 10\}^{2}$ , eliminated all inconsistent ones and proceeded from there. And this issue wouldn’t be unique to false antecedents; there are other perfectly valid pieces of logic that might nonetheless seem counterintuitive to humans, for example the puzzle with islanders and blue eyes.
Yet, we reason informally from false antecedents all the time, EG thinking about what would happen if
When I try to examine my own reasoning, I find that when I do so, I’m just selectively blind to certain details and so don’t notice any problems. For example: suppose the environment calculates “U=10 if action = A; U=0 if action = B” and I, being a utility maximizer, am deciding between actions A and B. Then I might imagine something like “I chose A and got 10 utils”, and “I chose B and got 0 utils”—ergo, I should choose A.
But actually, if I had thought deeper about the second case, I would also think “hm, because I’m determined to choose the action with highest reward I would not choose B. And yet I chose B. This is logically impossible! OH NO THIS TIMELINE IS INCONSISTENT!”—so I couldn’t actually coherently reason about what could happen if I chose B. And yet, I would still be left with the only consistent timeline where I choose A, which I would promptly follow, and get my maximum of 10 utils.
The problem is also “solved” if the agent thinks only about the environment, ignoring its knowledge about its own source code.
The idea with reversing the outputs and taking the assignment that is valid for both versions of the algorithm seemed to me to be closer to the notion “but what would actually happen if you actually acted differently”, i.e. avoiding seemingly nonsensical reasoning while preserving self-reflection. But I’m not sure when, if ever, this principle can be generalized.

Ian Televan 8 Jul 2021 23:34 UTC
LW: 3 AF: 2
0
AF
on: Decision Theory
I don’t quite follow why the 5/10 example presents a problem.
Conditionals with false antecedents seem nonsensical from the perspective of natural language, but why is this a problem for the formal agent? Since the algorithm as presented doesn’t actually try to maximize utility, everything seems to be alright. In particular, there are 4 valid assignments: $p_{1} : (x = 0, y = 10, A () = 10, U () = 10)$ , $p_{2} : (x = 5, y = 0, A () = 5, U () = 5)$ , $p_{3} : (x = 5, y = 10, A () = 10, U () = 10)$ , $p_{4} : (x = 10, y = 10, A () = 10, U () = 10)$
The algorithm doesn’t try to select an assignment with largest $U ()$ , but rather just outputs $5$ if there’s a valid assignment with $x > y$ , and $10$ otherwise. Only $p_{2}$ fulfills the condition, so it outputs $5$ . $p_{1}$ and $p_{4}$ also seem nonsensical because of false antecedents but with attached utility $U () = 10$ - would that be a problem too?
For this particular problem, you could get rid of assignments with nonsensical values by also considering an algorithm with reversed outputs and then taking the intersection of valid assignments, since only $(x = 5, y = 10)$ satisfies both algorithms.
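The enumeration above can be reproduced mechanically. The following is a minimal sketch of my reading of this comment, not of the algorithm from the original post: it treats each pair $(x, y)$ as a candidate, lets the action be 5 when $x > y$ and 10 otherwise, sets $U() = A()$, and keeps the pairs for which both material conditionals "A()=5 ⇒ U()=x" and "A()=10 ⇒ U()=y" hold. The helper names (`implies`, `valid_assignments`, `reversed_`) are made up for illustration.

```python
from itertools import product

VALUES = (0, 5, 10)


def implies(p, q):
    """Material conditional: true whenever the antecedent is false."""
    return (not p) or q


def valid_assignments(act):
    """Return all consistent tuples (x, y, A, U) for a given action rule.

    `act(x, y)` is the action taken when the pair (x, y) is the one found.
    The environment is assumed to pay U() = A().
    """
    result = []
    for x, y in product(VALUES, repeat=2):
        a = act(x, y)
        u = a
        if implies(a == 5, u == x) and implies(a == 10, u == y):
            result.append((x, y, a, u))
    return result


# Rule as described in the comment: output 5 if x > y, else 10.
original = valid_assignments(lambda x, y: 5 if x > y else 10)
# Same rule with the two outputs swapped.
reversed_ = valid_assignments(lambda x, y: 10 if x > y else 5)

# Pairs (x, y) that are consistent under both versions of the rule.
common = {(x, y) for x, y, _, _ in original} & {(x, y) for x, y, _, _ in reversed_}

print(original)
print(reversed_)
print(common)
```

Under these assumptions the first list contains exactly $p_{1}$ through $p_{4}$, and the intersection contains only $(x = 5, y = 10)$.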

Ian Televan 7 May 2021 22:44 UTC
3 points
0
on: A Semitechnical Introductory Dialogue on Solomonoff Induction
Could someone explain why this doesn’t degenerate into an entirely circular concept when we postulate a stronger compiler; or why it doesn’t become entirely dependent on the choice of the compiler?
1. There are many programs that output identical sequences. That’s a waste. Make it so that no two different programs have the same output.
2. There are many sequences that when fed into the compiler don’t result in valid programs. That’s a waste. Make it so that every binary sequence represents a valid program.
Now we have a set of sequences that we’d like to encode: S = { $ε$ , 0, 1, 00, 01, … }, a set of sequences that are interpreted by the compiler as programs: P = { $ε$ , 0, 1, 00, 01, … } and the compiler, which is a bijection from P to S. It had better not turn out to be the identity function. And that’s with the best possible compiler. If we postulate a reasonable but much weaker compiler, then the programs that encode the sequences become on average longer than the sequences themselves!
The only way out of this that I see is to weight elements of S by their frequencies in our universe and/or by how much we care about them, and then let the compiler be a function that minimizes this frequency-importance score. In fact, this compiler starts looking more and more like an encoder (?!). The difficult part then seems to me to be the choice of the optimal encoder, and not the Solomonoff induction itself.
Edit: Of course, when there’s a 1 to 1 mapping, then selecting the shortest program is trivial. So in a way, if we make the Solomonoff induction trivial then the only thing that’s left is the choice of the compiler. But why isn’t this still a problem with weaker, traditional compilers?
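The counting constraint behind this worry can be checked directly. A minimal sketch, assuming programs and outputs are both plain binary strings and the compiler is injective (the function name is made up for illustration): there are $2^{n}$ strings of length $n$ but only $2^{n} - 1$ strictly shorter strings, so no compiler can give every length-$n$ sequence a shorter program.

```python
def count_strings_shorter_than(n):
    """Number of binary strings of length 0 .. n-1 (the empty string included)."""
    return sum(2**k for k in range(n))


for n in range(1, 11):
    shorter = count_strings_shorter_than(n)
    exactly_n = 2**n
    # One string of length n is always left without a shorter program.
    print(n, shorter, exactly_n, exactly_n - shorter)
```

So whichever sequences get short programs, others must pay for it, which is why the choice of which sequences to favor carries so much weight.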

Ian Televan 22 Apr 2021 22:32 UTC
1 point
0
on: Rationality: Appreciating Cognitive Algorithms
I thought of a slightly different exception for the use of “rational”: when we talk about conclusions that someone else would draw from their experiences, which are different from ours. “It’s rational for Truman Burbank to believe that he has a normal life.”
Or if I had an extraordinary experience which I couldn’t communicate with enough fidelity to you, then it might be rational for you not to believe me. Conversely, if you had the experience and tried to tell me, I might answer with “Based only on the information that I received from you, which is possibly different from what you meant to communicate, it’s rational for me not to believe the conclusion.” There I might want to highlight the issue with fidelity of communication as a possible explanation for the discrepancy (the alternative being, for example, that the conclusion is unwarranted even if the account of the event is true and complete).

Ian Televan 22 Apr 2021 19:02 UTC
6 points
0
on: Double Illusion of Transparency
Richard Feynman once said that if you really understand something in physics you should be able to explain it to your grandmother. I believed him.
Curiously enough, there is a recording of an interview with him where he argues almost exactly the opposite, namely that he can’t explain something in sufficient detail to laypeople because of the long inferential distance.

Ian Televan 9 Apr 2021 23:00 UTC
1 point
0
on: The Allais Paradox
It seems that the mistake people commit is imagining that the second scenario is a choice between 0.34*24000 = 8160 and 0.33*27000 = 8910. Yes, if that were the case, then you could imagine a utility function that is approximately linear in the region 8160 to 8910, but sufficiently concave in the region 24000 to 27000 s.t. the difference between 8160 and 8910 feels greater than between 24000 and 27000… But that’s not the actual scenario with which we are presented. We don’t actually get to see 8160 or 8910. The slopes of the utility function in the first and second scenarios are identical.
“Oh, these silly economists are back at it again, asserting that my utility function ought to be linear, lest I’m irrational. Ugh, how annoying! I have to explain again, for the n-th time, that my function actually changes the slope in such a way that my intuitions make sense. So there!” ← No, that’s not what they’re saying! If you actually think this through carefully enough, you’ll realize that there is no monotonically increasing utility function, no matter the shape, that justifies 1A > 1B and 2A < 2B simultaneously.
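The claim can be spot-checked numerically. This is a sketch under stated assumptions: the lotteries are taken from the original post as I remember them (1A: 24000 for certain; 1B: 33/34 chance of 27000; 2A: 34% chance of 24000; 2B: 33% chance of 27000; otherwise nothing), and the four utility functions are arbitrary examples chosen for illustration. The expected-utility gap in scenario 2 is 0.34 times the gap in scenario 1, so the two gaps always have the same sign.

```python
import math

# Lotteries as lists of (probability, payout); assumed from the original post.
L1A = [(1.0, 24000)]
L1B = [(33 / 34, 27000), (1 / 34, 0)]
L2A = [(0.34, 24000), (0.66, 0)]
L2B = [(0.33, 27000), (0.67, 0)]


def expected_utility(lottery, u):
    return sum(p * u(x) for p, x in lottery)


def gaps(u):
    """Return (EU(1A) - EU(1B), EU(2A) - EU(2B)) for utility function u."""
    gap1 = expected_utility(L1A, u) - expected_utility(L1B, u)
    gap2 = expected_utility(L2A, u) - expected_utility(L2B, u)
    return gap1, gap2


# Arbitrary increasing utility functions, from linear to strongly concave.
utilities = {
    "linear": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log1p,
    "saturating": lambda x: 1 - math.exp(-x / 5000),
}

results = {name: gaps(u) for name, u in utilities.items()}
for name, (gap1, gap2) in results.items():
    print(name, gap1, gap2, gap2 / gap1)
```

Whatever the shape of the function, preferring 1A to 1B forces preferring 2A to 2B, and vice versa.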

Ian Televan 8 Apr 2021 21:01 UTC
1 point
0
on: Where Recursive Justification Hits Bottom
But is Occam’s Razor really circular? The hypothesis “there is no pattern” is strictly simpler than “there is this particular pattern”, for any value of ‘this particular’. Occam’s Razor may expect simplicity in the world, but it is not the simplest strategy itself.
Edit: I’m talking about the hypothesis itself, as a logical sequence of some kind, not that which the hypothesis asserts. It asserts maxentropy, the most complex world.

Ian Televan 7 Apr 2021 14:29 UTC
2 points
0
in reply to: TAG’s comment on: Excluding the Supernatural
Originally I thought of an exception where the thing that we don’t know was a constructive question, e.g. given more or less complete knowledge of materials science, how do we construct a decent bridge? But it’s an obvious limitation; no self-proclaimed reductionist would actually try to apply reductionism in such a situation.
It seems to me that you’re describing a reverse scenario: suppose we have an already constructed object and want to figure out how it works. Can reductionism still be used? I’d still say yes.
Take an airplane, for example. Knowing relevant laws of physics and looking at just the airplane, you can’t actually predict whether it’s going to fly to New York or Chicago. You need to incorporate the pilot into the model. And the pilot is influenced by human psychology, economics, etc. So on one hand you have the airplane as a concrete physical object, and on the other hand you have the role that airplanes of that type play in human society. BUT! By looking at just the physical properties, you can still infer a great deal about how it’s used.
This too applies to money. Physical manifestations are not actually completely arbitrary—they are either valuable in themselves—hides, grain, salt etc. or they have properties which make them suitable as value tokens—relatively durable and difficult to counterfeit either through scarcity of raw materials or difficulty in manufacturing. There is not as much to say about the physical properties of money compared to airplanes, but the difference is quantitative, not qualitative.
So we’re left with questions about human society. How do humans actually use these objects? Well, it’s often impractical to apply reductionism, but it’s still possible in principle. We just don’t know enough yet, or it would be computationally intractable, or it would be unethical etc. And of course, a lot has already been learned through application of reductionism to human psychology.

Ian Televan 7 Apr 2021 0:32 UTC
3 points
0
on: A Technical Explanation of Technical Explanation
Something felt off about this example and I think I can put my finger on it now.
My model of the world gives the event with the blue tentacle probability ~0. So when you ask me to imagine it, and I do so, it feels to me like I’m coming up with a new model to explain it, one which gives a higher probability to that outcome than my current model does. This seems to be the root of the apparent contradiction: it appears that I’m violating the invariant. But I don’t think that that’s what’s actually happening. Consider this fictional exchange:
EY: Imagine that you have this particular gaussian model. Now suppose that you find yourself in a situation that is 50 SD’s away from the median. How do you explain it?
Me: Well, my hypothesis is that...
EY: Wrong! That scenario is too unlikely; if the model has something to say about it, then it must be wrong and irrational.
Me: No! You asked me to suppose this incredibly unlikely scenario, which is exactly what I did. I didn’t conclude “EY is asking me to consider something that’s too unlikely, ah, he’s trying to trick me, therefore I am not going to imagine the scenario on the grounds that it’s impossible!” because this is an impossible conclusion from inside the model.
I have limited resources, so I just don’t bother pre-computing all details of my model that are too unlikely to matter. But if this scenario actually came up in real life, I would be able to fill in the missing details retroactively. That doesn’t mean that my model assumes more than 100% total probability, because I’m already reserving a bit of probability mass for unknown unknowns. And I needn’t worry about such scenarios now, because they’re too unlikely and there are too many similarly unlikely scenarios. I just can’t be meaningfully concerned about them all.

Ian Televan 5 Apr 2021 12:21 UTC
1 point
0
in reply to: TAG’s comment on: Excluding the Supernatural
Care to elaborate? Also, that’s not really an exception, but a boundary—it’s exactly what you would expect if there are finitely many layers of composition i.e. the world is not like an infinite fractal.

Ian Televan 5 Apr 2021 0:20 UTC
1 point
0
in reply to: TAG’s comment on: Excluding the Supernatural
Of course it doesn’t work for problems where the objects in question are already fundamental and cannot be reduced any further. But that’s what I meant in the original post: reductionist frameworks would fail to produce any new insights if we were already at the fundamental level.

Ian Televan 1 Apr 2021 13:15 UTC
1 point
0
on: Excluding the Supernatural
If reductionism were wrong, then I would expect reductionist approaches to be ineffective. Every attempt at gaining knowledge using a reductionist framework would fail to discover anything new, except by accident on very rare occasions. Or experiments would fail to replicate because the conservation of energy was routinely violated in unpredictable ways.

Ian Televan 1 Apr 2021 1:31 UTC
2 points
0
on: Belief in the Implied Invisible
Conservation laws or not, you ought to believe in the existence of the photon because you continue having the evidence of its existence—it’s your memory of having fired the photon! Your memory is entangled with the state of the universe, not perfectly, but still, it’s Bayesian evidence. And if your memory got erased, then indeed, you’d better stop believing that the photon exists.

Ian Televan 28 Mar 2021 9:16 UTC
2 points
0
in reply to: Yoav Ravid’s comment on: Dissolving the Question
That seems unlikely. There is already a certain difficulty in showing that the illusion of free will is an illusion. “It seems like you have free will, but actually, it doesn’t seem.” The seeming is self-evident, so what does it mean to say that something actually doesn’t seem if it feels like it seems? As far as I understand it, it’s not like it doesn’t really seem so, but you’re mistaken about it and think that it actually seems so, and then mindfulness meditation clears up that mistake for you and you stop thinking that it seems that you have free will. Instead, you observe that the seeming itself just disappears. It stops seeming that you have free will.
So now we come to your suggestion: “It seems(level 2.) like the seeming(lvl 1.) disappears, but actually, it doesn’t seem(lvl 2.) like the seeming(lvl 1.) disappears.”—but once again, the seeming(lvl 2.) is self-evident. So you’d need to come up with some extraordinary circumstances which are associated with more mental clarity to show that that seeming(lvl 2.) also disappears. But this is unlikely, because the concept of free will is already incoherent, so more mental clarity shouldn’t point you towards it.

Ian Televan 22 Mar 2021 18:40 UTC
1 point
0
on: Dissolving the Question
As Sam Harris points out, the illusion of free will is itself an illusion. It doesn’t actually feel like you have free will if you look closely enough. So then why are we mistaken about things when we don’t examine them closely enough? Seems like a too-open-ended question.

Ian Televan 21 Mar 2021 21:45 UTC
2 points
0
in reply to: Ian Televan’s comment on: Beautiful Probability
Update: a) is just wrong and b) is right, but unsatisfying because it doesn’t address the underlying intuition which says that the stopping criterion ought to matter. I’m very glad that I decided to investigate this issue in full detail and run my own simulations instead of just accepting some general principle from either side.
MacKay presents it as a conflict between frequentism vs bayesianism and argues why frequentism is wrong. But I started out with a bayesian model and still felt that motivated stopping would have some influence. I’m going to try to articulate the best argument why the stopping criterion must matter and then explain why it fails.
First of all, the scenario doesn’t describe exactly what the stopping criterion was. So I made one up: The (second) researcher treats patients and gets the results one at a time. He has some particular threshold for the probability that the treatment is >60% effective and he is going to stop and report the results the moment the probability reaches the threshold. He derives this probability by calculating a beta distribution for the data and integrating it from 0.6 to 1. (for those who are unfamiliar with the beta distribution, I recommend this excellent video by 3Blue1Brown) In this case the likelihood of seeing the data given underlying probability $x$ is given by the beta-shaped curve $f (x) = \binom{100}{70} x^{70} (1 - x)^{30}$ , and the probability that the treatment is >60% effective is $α := \int_{0.6}^{1} f (x) d x$ (with $f$ normalized to unit area).
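For concreteness, the threshold probability can be computed numerically. This is a minimal sketch, assuming a uniform prior over $x$, so that the normalized curve for 70 successes and 30 failures is the Beta(71, 31) density; the function names are made up for illustration, and the integral uses a plain midpoint rule rather than a library routine.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b), computed in log space to avoid overflow."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def integrate(f, lo, hi, steps=10_000):
    """Midpoint rule; it never evaluates f at the endpoints 0 or 1."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def prob_above(threshold, successes, failures):
    """P(x > threshold | data) under a uniform prior (an assumption)."""
    a, b = successes + 1, failures + 1
    return integrate(lambda x: beta_pdf(x, a, b), threshold, 1.0)


alpha = prob_above(0.6, 70, 30)
total = prob_above(0.0, 70, 30)  # sanity check: should be close to 1
print(alpha, total)
```

The same function applied after every patient gives the running probability that the researcher compares against his threshold.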
Now the argument: motivated stopping ensures that we don’t just get 70 successes and 30 failures. We have an additional constraint that after each of the first 99 outcomes for treatment the probability is strictly $< α$ and only after the 100th patient does it reach $α$ . Surely then, we must modify $f (x)$ to reflect this constraint. And if the true probability was really >60%, then surely there are many Everett branches where the probability reaches $α$ before we ever get to the 100th patient. If it really took so long, then it must be because it’s actually less likely that the true probability is >60%.
And indeed, the likelihood of seeing 70 successes and 30 failures with such a stopping criterion is less than is initially given by $f (x)$ . BUT! The constraint is independent of the probability $x$ ! It is purely about the order in which the outcomes appear. In other words, it changes the constant $\binom{100}{70}$ , which originally indicated the total number of all different ways to order 70 positive and 30 negative instances. And this constant reduces the likelihood for every probability equally! It doesn’t reduce it more in universes where $x > 0.6$ compared to where $x \leq 0.6$ . This means that the shape of the original distribution stays the same, only the amplitude changes. But because we condition on seeing 70 successes and 30 failures anyway, this means that the area under the curve must be equal to 1. So we have to re-normalize $f (x)$ , and after normalization it comes out as the same curve as before, proportional to $x^{70} (1 - x)^{30}$ !
Another way to think about it is that the stopping criterion is not entangled with the actual underlying probability in a given universe. There is zero mutual information between the stopping criterion and $x$ . And yes, if this were not the case, if, for example, the researcher had decided that he would also treat one more patient after reaching the threshold $α$ and only publish the results if this patient recovered (but not mention this patient in the report), then it would absolutely affect the results, because a positive outcome for the patient is more likely in universes where $x > 0.6$ . But then it also wouldn’t be purely about his state of mind; we would have an additional data point.
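The argument can be checked by brute force on a toy version. This is a sketch under stated assumptions, not the simulation I originally ran: it uses 10 patients with 7 successes instead of 100 and 70, a uniform prior on a grid of values of $x$, and a made-up stopping rule (stop the first time successes lead failures by 4) in place of the threshold on $α$. It counts the orderings compatible with each design and compares the normalized posteriors.

```python
from itertools import product
from math import isclose

N, K, LEAD = 10, 7, 4  # toy sizes and a hypothetical stopping rule


def stopping_time(seq, lead=LEAD):
    """First step at which successes lead failures by `lead`, else None."""
    diff = 0
    for i, outcome in enumerate(seq, start=1):
        diff += 1 if outcome else -1
        if diff >= lead:
            return i
    return None


# All orderings with K successes in N trials (fixed-sample-size design).
sequences = [s for s in product((0, 1), repeat=N) if sum(s) == K]
count_fixed = len(sequences)
# Orderings on which the motivated stopper halts exactly at trial N.
count_stopped = sum(1 for s in sequences if stopping_time(s) == N)


def posterior(count, grid):
    """Normalized posterior on the grid; `count` is the ordering constant."""
    weights = [count * x**K * (1 - x) ** (N - K) for x in grid]
    total = sum(weights)
    return [w / total for w in weights]


grid = [i / 100 for i in range(1, 100)]
post_fixed = posterior(count_fixed, grid)
post_stopped = posterior(count_stopped, grid)
same = all(isclose(a, b, rel_tol=1e-9) for a, b in zip(post_fixed, post_stopped))

print(count_fixed, count_stopped, same)
```

The stopping rule shrinks the ordering constant, but the constant cancels on normalization, so the two posteriors coincide.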

Ian Televan 18 Mar 2021 12:31 UTC
1 point
0
on: Beautiful Probability
Fixing my predictions now, before going to investigate this issue further (I have MacKay’s book within arm’s reach and would also like to run some Monte-Carlo simulations to check the results; going to post the resolution later):
a) It seems that we ought to treat the results differently, because the second researcher in effect admits to p-hacking his results. b) But on the other hand, what if we modify the scenario slightly: suppose we get the results from both researchers 1 patient at a time. Surely we ought to update the priors by the same amount each time? And so by the time we get the 100th individual result from each researcher, the priors should be the same, even if we then find out that they had different stopping criteria.
My prediction is that argument a) turns out to be right and argument b) contains some subtle mistake.