But in practice, SGD is extremely good at optimizing NNs, and the local optima issue isn’t a huge problem.
That’s not even true. In practice it’s the best we’ve got, but it’s still terrible in most interesting settings (or else you could solve NP-hard problems in practice by encoding them as loss functions, which you can’t).
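To make the local-optima point concrete, here is a toy sketch (my own example, not from the discussion): plain gradient descent on a non-convex 1-D loss converges to whichever basin its starting point sits in, with no guarantee of global optimality.

```python
def f(x):
    # Double-well loss: global minimum near x ~ -1.04, local minimum near x ~ 0.96.
    return (x * x - 1) ** 2 + 0.3 * x

def grad_f(x):
    return 4 * x * (x * x - 1) + 0.3

def gradient_descent(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_bad = gradient_descent(0.8)    # starts in the shallow basin
x_good = gradient_descent(-0.8)  # starts in the deep basin

print(x_bad, f(x_bad))    # stuck near x ~ 0.96, loss ~ 0.29
print(x_good, f(x_good))  # near x ~ -1.04, loss ~ -0.31
assert f(x_bad) > f(x_good)  # same algorithm, strictly worse solution
```

The same failure mode scales up: nothing in the update rule knows a better basin exists elsewhere.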
As to why we can have decent machine learning and not AGI, I don’t know.
It’s because the neural net algorithms are not even close to finding the optimal neural net in complex situations.
Approximating SI (Solomonoff induction) isn’t sufficient, for one; you also need to act on the models you find.
That’s trivial to do. It’s not the problem here.
Everything approximates Bayesian inference, it’s just a matter of how ideal the approximation is.
This might be true in some sense, but not in a meaningful one. PAC learning, for instance, is fundamentally non-Bayesian: it gives distribution-free, worst-case guarantees and involves no prior at all. Saying that PAC learning approximates Bayesian inference is the same as saying that Bayesian inference approximates PAC learning. It’s not a very meaningful statement.
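For concreteness, here is the standard sample-complexity bound for a finite hypothesis class in the realizable PAC setting (a textbook result, not specific to this thread): m ≥ (1/ε)(ln|H| + ln(1/δ)) samples suffice, and no prior over hypotheses appears anywhere in the statement.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Samples sufficient so that, with probability >= 1 - delta, every
    hypothesis consistent with the data has true error <= epsilon.
    Standard bound for a finite class in the realizable PAC setting;
    note that no prior over hypotheses is involved."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# e.g. 2**20 hypotheses, 5% error tolerance, 99% confidence:
print(pac_sample_bound(2 ** 20, 0.05, 0.01))  # 370
```

The guarantee holds for whatever distribution generated the data, which is exactly the sense in which the framework is non-Bayesian.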
People on LW tend to be hard-core Bayesians who have never even heard of PAC learning, which is an entire branch of learning theory. I find it rather strange.
People here seem to really like Solomonoff induction, but I don’t think it’s all that relevant to learning in practice due to computational complexity.
Solomonoff induction is not computable. Trying to “approximate” it, in the sense of producing hypotheses close to the ones it would find, is probably not computable either.
If you replace Solomonoff induction with induction over programs that halt quickly, or with induction over boolean circuits, it becomes computable, but it is still NP-hard. Again, approximating this is probably also NP-hard, depending on your definition of approximation.
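A hedged sketch of what computable, time-bounded induction over programs looks like (the instruction set is my own toy choice, purely illustrative): enumerate all op-sequences in order of length and return the shortest one consistent with the data. Every program halts quickly, so this is computable, but the search examines 3^k candidates at length k, which is why the exact problem stays intractable.

```python
from itertools import product

# Toy program space: a "program" is a sequence of unary ops applied
# left to right. All programs halt quickly, so induction over this
# space is computable -- but brute force still costs 3**k at length k.
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: 2 * x,
    "sq": lambda x: x * x,
}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

def induce(examples, max_len=6):
    """Return the shortest program consistent with all (input, output) pairs."""
    for k in range(1, max_len + 1):
        for program in product(OPS, repeat=k):  # 3**k candidates
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# Data generated by f(x) = 2*(x + 1):
print(induce([(1, 4), (2, 6), (3, 8)]))  # ('inc', 'dbl')
```

Shortest-first enumeration is the Occam-style part; the exponential candidate count is the part that NP-hardness says we shouldn't expect to eliminate in the worst case.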
Next, if you replace boolean circuits with neural nets, it is still hard to find the best neural net to fit the data. MCMC and gradient descent only find local optima. I mean, the fact that neural nets didn’t give us strong AI back in the 70s demonstrates that they are not doing anything close to Solomonoff induction.
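One deliberately rigged illustration of gradient descent's local behavior (my construction, not anyone's real training setup): a 2-hidden-unit tanh network on XOR with both hidden units initialized identically. Full-batch gradient descent then applies the same update to both units at every step, so they remain clones forever; the net is effectively a single monotone ridge unit, which provably cannot represent XOR, and the loss never reaches zero.

```python
import math

# Rigged demo: a 2-2-1 tanh net on XOR whose two hidden units start
# identical. Symmetric initial weights get symmetric gradients, so
# full-batch gradient descent never breaks the tie.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

w = [[0.5, 0.5], [0.5, 0.5]]  # hidden weights: unit 0 == unit 1
b = [0.5, 0.5]                # hidden biases
v = [0.5, 0.5]                # output weights
c = 0.5                       # output bias

lr = 0.1
for _ in range(2000):
    gw, gb, gv, gc = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0
    loss = 0.0
    for (x0, x1), y in zip(X, Y):
        h = [math.tanh(w[j][0] * x0 + w[j][1] * x1 + b[j]) for j in range(2)]
        out = v[0] * h[0] + v[1] * h[1] + c
        err = out - y
        loss += err * err / len(X)
        d = 2 * err / len(X)
        gc += d
        for j in range(2):
            gv[j] += d * h[j]
            da = d * v[j] * (1 - h[j] * h[j])  # backprop through tanh
            gw[j][0] += da * x0
            gw[j][1] += da * x1
            gb[j] += da
    for j in range(2):
        w[j][0] -= lr * gw[j][0]
        w[j][1] -= lr * gw[j][1]
        b[j] -= lr * gb[j]
        v[j] -= lr * gv[j]
    c -= lr * gc

print(w[0] == w[1], b[0] == b[1])  # True True: the symmetry never breaks
print(round(loss, 3))              # stuck well above zero
```

A global-optimum finder would just pick asymmetric weights and drive the loss to zero; gradient descent from this start never can.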
It’s not even clear that a learning program must approximate Bayesian inference. There are things like PAC learning that don’t do that at all.