Which would be a good thing: nominally they already claim to let everyone opt out of scraping via robots.txt and other methods, so the canary shouldn't do anything there that people couldn't already do.
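For concreteness, here's a minimal sketch of how the existing robots.txt opt-out works, using Python's standard-library parser. The rules and site are made up; "GPTBot" is OpenAI's documented crawler token, but the same pattern applies to any crawler:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt denying one named AI crawler.
robots_txt = """\
User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.can_fetch("GPTBot", "https://example.com/page"))        # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))  # True
```

The catch, as noted below, is that only whoever controls the server can publish this file.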
One notable difference is that on sites that allow user-submitted content (e.g. this one), individual users could exempt their own contributions, whereas robots.txt et al. require the server admin to intervene. (But I agree that this would be a feature and not a bug.)
If they were to exclude all documents with the canary, everyone would include the canary to avoid being scraped.
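A minimal sketch of what that exclusion might look like on the scraper side. The canary value here is a hypothetical placeholder, not any real published canary string, and no actual crawler is claimed to work this way:

```python
# Scraper-side canary filtering: drop any document containing the
# canary, even if robots.txt would have allowed the crawl.
CANARY = "EXAMPLE-DO-NOT-TRAIN-CANARY"  # hypothetical placeholder

def keep_for_training(text: str) -> bool:
    return CANARY not in text

docs = [
    "An ordinary page with no opt-out marker.",
    f"A user comment embedding the canary: {CANARY}",
]
corpus = [d for d in docs if keep_for_training(d)]
print(len(corpus))  # 1 -- the canary-bearing document is excluded
```

Since the check is a plain substring match on content rather than a server-level rule, anyone who can post text can trigger it, which is exactly the incentive problem described above.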