I guess I was reacting to your suggestion later in the article that we should try to remove the proxy and instead just look directly at the underlying “page quality” characteristic (or perhaps I misinterpreted that?). My objection is this: yes, one can (implicitly or explicitly) define an underlying quality characteristic of pages, and no, Google is not perfect at estimating that value; but however you go about estimating that parameter, the process will always involve making some observations about the page in question (e.g. its local hyperlink structure) and then doing inference. Although the underlying characteristic might be well-defined, there is no way, even in principle (even given unlimited data and computing power far beyond that of a human), to just “look directly at the underlying characteristic”; I don’t even think that phrase is meaningful.
If humans were given the same task, they would also follow this evidence-and-inference procedure, albeit with more sophisticated forms of inference than Google’s programmers can work out how to program into their computers.
So I think that “get rid of the proxy” really means “gather more and different types of evidence, and improve the page-quality inference procedure”.
I think at this point we’re hitting on the question of whether truly correct definitions exist. I believe Eliezer wrote some articles in the past about thingspace and ‘carving reality at its joints’; that assumption underlies my article, but I do see your point. On your view, the algorithm has approached the ‘true characteristic’ once humans cannot discern it any better than the algorithm can.
I guess one example is the guy who wrote a few articles for Hacker News: they were reasonably well upvoted, and then he came out and said he had produced them based on some underlying assumptions about what would do well on HN. Some people were negative and felt manipulated, but not everyone agreed that this was spam. The counterargument was “if you upvoted it, then you enjoyed it, and the process that produced the article shouldn’t matter”.
If you do consider that spam, then it’s certainly spam beyond human capability to detect. But what you would really need there is a process to infer intention, which may well be impossible.
Thanks for the reply :)