gwern comments on A Better Web is Coming

gwern 22 Aug 2021 2:22 UTC
26 points

“I find the context tags on LessWrong useful at times.”

I’ve found them useless in every iteration. They are extremely inconsistently applied, and those authors who do bother to make an effort often leave them at uselessly large levels of granularity like math or statistics or AI. (Gee, thanks.)

A decent tag or category system needs to be reasonably comprehensive (if not, why even bother, just go straight to Google search) and regularly refined to shrink member count. If there are 1000 members of a category, then it is long past time to break that down into a few sub-categories. When I look at websites whose tags or categories are useful, like Wikipedia or Danbooru or classic folksonomies like del.icio.us (RIP), the tagging itself is a major focus of community efforts and it doesn’t require the cooperation of the author to update things.

Any WP editor can refine a category into subcategories or add a category to any article, and there are tools to assist by brute force to clean it all up. It’s a huge time-sink of human effort, like everything on WP, but it works, dammit! You can meaningfully browse WP categories and have a reasonable expectation of comprehensiveness, and they do a good job of gradually encoding the structure of all the crosscutting domains. I use them fairly often.

I use tags on gwern.net for pages, and I try to systematically add new tags to all relevant pages and refactor them down into reasonably sharp tags. I think they wind up being reasonably useful, but there are also not enough pages on gwern.net for tags to shine. (When you can simply list all the good pages on the index in a few screens by topic, you’ve covered the Pareto value of tags.)

What I have been considering is extending tags to external links/documents. I have something like 20k external links + hosted documents, and the sheer volume means that tags are potentially highly useful for them. (A link like “Open-Ended Learning Leads to Generally Capable Agents” would benefit a lot from a set of tags like ‘blessings-of-scale multi-agent DeepMind deep-reinforcement-learning’ which offer an entrance point to the scores of prior art links to contextualize it.) The problem is how to be systematic. My thinking is that this is a case where I can employ the OpenAI GPT-3 API’s “classification” endpoint to do the work for me: I don’t scale well, but it does. I can initialize the link tags from my existing directory hierarchy, finetune a GPT-3 model to infer “tag” from “annotation” (GPT-3 is smart enough that it’ll understand this very well), use that to rank possible tags for all links, accept/reject by hand, and bootstrap. Then adding new tags can be done by re-classifying all links. A lot of details to get right, but if it works, it’ll be almost as good as if I’d been building up a tag folksonomy on my links from the get-go.
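
The loop described above (seed tags from directories, rank candidate tags per link, accept/reject by hand, repeat) could be sketched roughly as follows. This is only an illustration under stated assumptions: the token-overlap scorer is a crude stand-in for the finetuned GPT-3 classifier, not a call to any real API, and every function name, path, and annotation here is hypothetical.

```python
import re
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def tags_from_path(path):
    """Seed tags from the directory hierarchy: 'rl/multi-agent/a.pdf' -> {'rl', 'multi-agent'}."""
    return set(PurePosixPath(path).parent.parts)


def tokenize(text):
    """Lowercase and split an annotation into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_tag_profiles(annotations, tags):
    """Count the words seen in annotations of links already carrying each tag."""
    profiles = defaultdict(Counter)
    for link, text in annotations.items():
        for tag in tags.get(link, ()):
            profiles[tag].update(tokenize(text))
    return profiles


def rank_tags(text, profiles, already=()):
    """Rank tags not yet applied, best first. Stand-in for the learned classifier."""
    words = set(tokenize(text))
    scores = {}
    for tag, profile in profiles.items():
        if tag in already:
            continue
        total = sum(profile.values())
        score = sum(profile[w] for w in words) / total if total else 0.0
        if score > 0:
            scores[tag] = score
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def bootstrap_round(annotations, tags, review, top_k=2):
    """One pass: propose the top_k tags per link and keep those the reviewer accepts."""
    profiles = build_tag_profiles(annotations, tags)
    for link, text in annotations.items():
        current = tags.setdefault(link, set())
        for tag in rank_tags(text, profiles, current)[:top_k]:
            if review(link, tag):  # the by-hand accept/reject step
                current.add(tag)
    return tags


if __name__ == "__main__":
    annotations = {
        "rl/multi-agent/a.pdf": "agents learn cooperative games through self-play",
        "scaling/b.pdf": "larger models improve with more compute and data",
        "c.html": "open-ended learning yields capable agents across many games",
    }
    tags = {link: tags_from_path(link) for link in annotations}
    bootstrap_round(annotations, tags, review=lambda link, tag: True)
    print(sorted(tags["c.html"]))
```

Adding a new tag would then amount to hand-tagging a few links with it and running another round over everything; swapping `rank_tags` for a real classifier leaves the rest of the loop unchanged.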
- habryka 23 Aug 2021 9:02 UTC
  13 points
  The current use-case for which tags work is for content discovery, not really for comprehensive tagging. There are some nice things that comprehensive tagging gets you, but it’s just a really big pile of work, even if you build lots of custom tools for it.
  The flow that I think currently works pretty well is:
  1. User is interested in a certain topic, and hasn’t read 90% of what already exists on LessWrong
  2. User searches in the search bar or goes to the concepts page
  3. User clicks on a tag
  4. The top-relevance rated posts on that tag are indeed pretty good, and the user finds some content that helps them get oriented about the topic. The important thing here is mostly that the best and most relevant content for any category gets tagged, not that all content in that category gets tagged.
  We apply a number of core tags comprehensively to all posts (like the AI one you mentioned), because it allows people to do selective filtering for their frontpage feeds, but those are necessarily high-level, because for the granular ones there isn’t really enough content to justify a filter adjustment.
  You also still get decent folksonomy benefits of being able to show a user the rough ontology of the site, even without having comprehensive tagging.
  Overall, I guess… I don’t really get why for the use-case of LessWrong, it’s necessary for tagging to be comprehensive, in order for it to be useful. From my perspective most value add is pretty incremental, and the key thing is that the best stuff gets tagged, and that each tag has some posts that can give people a good intro.
- DirectedEvolution 22 Aug 2021 3:31 UTC
  2 points
  I’m sure you know that LW tags are broken down into sub-categories. We seem to lack the energy to apply those sub-categories. This post is tagged “world-optimization,” but might be best if it was sub-tagged with “mechanism design” and “coordination/cooperation” at the least. It takes some time to look those tags up, consider which is a good fit, and apply it, and there’s no reward for doing so. There’s an equilibrium issue as well. If few people are applying specific tags, then the tags remain underused for navigating the site, as well as unknown, thus discouraging their further adoption.
  That said, signs of any kind, including these tags, can give somebody the idea to embark on a reading expedition that they might not otherwise have conceived. You’re one of our shining lights, so perhaps you are normally driven to engage in thoughtfully directed reading projects. I suspect that many just sorta consume whatever happens to be at the top of the posts list, or whatever strikes their fancy in a sub-link. The idea of reviewing the collected LW writings on blackmail may never occur to them, unless they navigate to it even with our sub-par system of tags. They function as a “suggested reading” feature, and that has utility even if it’s not nearly comprehensive or specific enough to be of use to an expert reader.
  Hope you do execute some of these optimizations on your site, and let us know about your experience putting them into place.
  - gwern 22 Aug 2021 17:06 UTC
    4 points
    Yes, the community equilibrium is entirely different. On WP, editors have little compunction about editing categories; here, I know vaguely that tags can be added (although I didn’t know that you could refactor them or remove them), but I wouldn’t do so because there’s no particular norm to do so. Who am I to go about editing matto’s post’s tags to break down world-optimization into something more specific?
    
    Tags could be useful, but they aren’t now, and so they stay being not useful, and it’s unrealistic to expect anyone to single-handedly fix that when there’s like 10 posts a day and approaching 12 years of backlog.
    
    A GPT-3 proof-of-concept will certainly be interesting. If it works, it could bootstrap useful tags on larger corpuses like LW. (It might be expensive, but it’s only money, and a lot cheaper than the expert LWer time it’d take; and of course, if GPT-3 works well, then perhaps a rival model like GPT-J or T5 or Jurassic would be worth finetuning to cut costs.)