LessWrong Team
Ruby
We’ve had the choice of tabs up for a month now and the results so far are encouraging, or at least not discouraging. Many users are very pleased with the Recommendations, liking among other things that it surfaces posts that would otherwise get lost if you only see what’s new. Clickthrough rates are higher for people using the Enriched/Recommendations tab, although this is almost certainly a selection effect on the kind of user who changes tabs at all. Switching some people over automatically is motivated by wanting a better signal here before doing something like changing the global default.
The current recommendations still need more work, though. People are much less likely to click on recommendations of posts they’ve already clicked on, but it’s proving tricky to eliminate such recommendations entirely. Also, the algorithm overwhelmingly recommends posts from the last year, when we’d like to see it surfacing stuff from further back too. Still, Latest is overwhelmingly stuff from the last week, so it’s an improvement over the counterfactual.
--
Since we started the project, we’ve settled on the “hybrid” list being likely optimal as the default list people look at. Many people want to “keep up with the latest” even if they’re also interested in good posts from all time, so any recommended list of posts that’s the default has to have a heavy latest component. We first tried making two calls to the Recommendations API, one with a heavy recency bias, but it was hard to get it consistent, so we switched to just splitting the list between the usual Latest algorithm and the new recommendations algorithm.
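The split described above could look something like the following sketch. All names are hypothetical, and the deterministic 50/50 interleave is an illustrative assumption, not LessWrong’s actual implementation:

```python
from itertools import chain, zip_longest

def hybrid_feed(latest_posts, recommended_posts, limit=20):
    """Interleave two ranked lists of post IDs into one hybrid feed.

    Alternates between the Latest ranking and the recommendations
    ranking, de-duplicating posts that appear in both lists.
    """
    interleaved = chain.from_iterable(zip_longest(latest_posts, recommended_posts))
    seen, feed = set(), []
    for post in interleaved:
        # zip_longest pads the shorter list with None; skip the padding.
        if post is not None and post not in seen:
            seen.add(post)
            feed.append(post)
        if len(feed) == limit:
            break
    return feed
```

For example, `hybrid_feed(["a", "b", "c"], ["b", "d"], limit=4)` returns `["a", "b", "d", "c"]`: the duplicate `"b"` is shown only once, in its earliest slot.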
This has the advantage that it preserves some of the “common knowledge” aspect of the current algorithm, where you know which posts other people are seeing too, and an author knows that if they get upvoted, their post will be visible automatically and transparently to many people. As discussed elsethread on this post, we want to have a pure-recommendations tab as well and have been waiting on a bit of coding to make that happen.
--
People often fear goodharting on the wrong metric (like clicks) with recommendation algorithms. I think we do need to keep an eye on that, and I want to build more analytics tools for detecting drift here, and to talk to more people. As we fix up more basic issues, like excluding read content and getting it to even recommend posts older than a year[1], we’ll put more attention on whether the trend is good.

1. ^ One guess I have is that the algorithm is stuck for dumb “structural” reasons: it’s been given recent data, which is overwhelmingly of people reading recent content, so when it queries “what’s good?”, recent content comes out on top even without that being explicitly trained into the system.
2. ^ See comment.
That’s the plan; the only reason we didn’t deploy it on Thursday is that we have to do a small bit of extra work to extend caching (to achieve acceptable performance) to the pure-recommender view. We’ll probably have it up soon.
The title is strong with this one. I like it.
Over the years the idea of a closed forum for more sensitive discussion has been raised, but never seemed to quite make sense. Significant issues included:
- It seems really hard or impossible to make it secure from nation state attacks
- It seems that members would likely leak stuff (even if only via their own devices not being adequately secure, or the like)
I’m thinking you can get some degree of inconvenience (and therefore delay), but it’s hard to have large shared infrastructure that’s that secure from attack.
I’d be interested in a comparison with the Latest tab.
Typo? Do you mean “click on Recommended”? I think the answer is no: in order to generate recommendations for individuals (and everyone), they need browsing data.
1) LessWrong itself doesn’t aim for a super high degree of infosec. I don’t believe our data is sensitive enough to warrant large security overhead.
2) I trust Recombee with our data about as much as I trust ourselves not to have a security breach. If anything, I could imagine LessWrong being of more interest to someone or some group and getting attacked.
It might help to understand what your specific privacy concerns are.
Hard to answer without knowing your background. I might try online courses or ask ChatGPT for advice here.
Curated. It’s a funny thing how fiction can sharpen our predictions, at least fiction that aims to be plausible within some world model. Perhaps it’s the exercise of playing our models forward in detail rather than making isolated, abstracted predictions. This is a good example. Even if it seems implausible, noting why is interesting. Curating, and I hope to see more of these built on differing assumptions and reaching different places. Cheers.
Curated. Beyond the object level arguments for how to do plots here that are pretty interesting, I like this post for the periodic reminder/extra evidence that relatively “minor” details in how information is presented can nudge/bias interpretation and understanding.
I think the claims around bordering lines would become strongly true if there were an established convention, and are more weakly true the way things currently are. Obviously one ought to be conscious, in reading and creating graphs, of whether 0 is included.
I’d be pretty interested in the non-cartoonish version, also from people who are more competent and savvy.
For balanced feedback, I enjoyed the choice of diction, and particularly those two words.
Trivia: in racetracks, a “chicane” is a random “unnecessary” kink or twist inserted to make it more complicated (and more challenging/fun).
My understanding is that commitment is saying you won’t swerve first in a game of chicken. Pre-commitment is throwing your steering wheel out the window so that there’s no way you could swerve even if you changed your mind.
Sparsity seems like maybe a relevant keyword.
I feel like marring the reputation of a person in response to wrongdoing has a very important basic purpose for warning other people about interacting with the wrongdoer, i.e. Sarah Smith is dishonest, so don’t trust things she says to be true. This is valuable in worlds where everyone is already a fixed truth-teller/liar and everybody has fixed values.
I like the content/concept here but feel “curse of doom” doesn’t communicate the idea very well. This does seem like effectively a curse of dimensionality, though? (Perhaps that’s what inspired this name.) Not sure if “Pareto Best of the Curse of Dimensionality” is the right name, but I think it gets at the idea better than a generic “doom”.
Curated. This post feels to me like a kind of survey of the mental skills and properties people do/don’t have for effectiveness, of which I don’t recall any other examples right now, and so is quite interesting. It’s interesting both for allowing someone to ask themselves whether they’re weak on any of these, and also helpful in modeling others and answering questions of the sort “why don’t people just X?”. For all that we spend a tonne of time interacting with people, people’s internal mental lives are private, and so, much like shower habits (I’m told), vary a lot more than externally observable behaviors.
I would like to see the “scope sensitivity” piece fleshed out more. I can see how it applies to eliminating annoyances that take 10 minutes every day and add up, but I don’t think that’s at the heart of rationality. I’d be curious how much mileage someone gets from just reflection on their own mind, and how much that can be done without invoking numeracy.
It does, quite a bit! Definitely speeds me up somewhere between 20% and 100% depending on task. And I think it’s a bigger deal for those now working on code and who are newer to it.
This is basically what we do, capped by our team capacity. For most of the last ~2 years, we had ~4 people working full-time on LessWrong, plus shared work from the EA Forum team. In the last few months we reallocated people from elsewhere in the org and are at ~6 people, though several are newer to working on code. So, a pretty small startup. Dialogues has been the big focus of late (plus behind-the-scenes performance optimizations and code infrastructure).
All that to say, we could do more with more money and people. If you know skilled developers willing to live in the Berkeley area, please let us know!
As noted in an update on LW Frontpage Experiments! (aka “Take the wheel, Shoggoth!”), yesterday we started an AB test on some users automatically being switched over to the Enriched [with recommendations] Latest Posts feed.
The first ~18 hours’ worth of data does show what seems like a real uptick in clickthrough rate, though some of that could be novelty.
(Examining members of the test group (n=921) and control group (n≈3000) over the last month, the test group seemed to have a slightly (~7%) lower baseline clickthrough rate; I haven’t investigated this.)
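To judge whether an uptick like this is larger than sampling noise, one standard sketch is a two-proportion z-test on clicks over impressions. The click and impression counts below are made up for illustration; this isn’t actual LessWrong data:

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test for a difference in clickthrough rates.

    Returns (z, two_sided_p) under the usual pooled-variance normal
    approximation. A rough screen for "is this CTR difference more
    than noise?", not a substitute for careful experiment analysis.
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical example: 60 clicks from 921 test users vs
# 150 clicks from 3000 control users.
z, p = two_proportion_z(60, 921, 150, 3000)
```

A caveat the parenthetical above hints at: if the groups had different baseline rates before the experiment, a raw comparison like this overstates (or understates) the effect, so a difference-in-differences view would be the natural follow-up.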
However, the specific posts that people are clicking on don’t feel, on the whole, like the ones I was most hoping the recommendations algorithm would suggest (and get clicked on). It feels like there’s some selection towards clickbaity or must-read news (not completely, just more than I’d like).
If I look over items recommended by Shoggoth that are older (50% are from last month, 50% older than that), they feel better but seem to get fewer clicks.
A to-do item is to look at voting behavior relative to clicking behavior: having clicked on these items, do people upvote them as much as others?
I also want to experiment with applying a recency penalty if it seems that the older content suggested by the algorithm is more “wholesome”, though I’d like to get some data from the current config before changing it.
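A recency penalty of this kind could be as simple as the following sketch, which multiplicatively down-weights young posts. The function name, the penalty strength, and the 30-day window are all invented for illustration, not the deployed configuration:

```python
def penalize_recent(score, age_days, penalty=0.5, window_days=30.0):
    """Down-weight the scores of recent posts to let older ones surface.

    A post published today keeps only `penalty` of its raw score; the
    penalty fades linearly back to 1.0 at `window_days`. Posts older
    than the window are untouched.
    """
    if age_days >= window_days:
        return score
    factor = penalty + (1.0 - penalty) * (age_days / window_days)
    return score * factor
```

So with the defaults, a brand-new post with raw score 10.0 is re-ranked at 5.0, a 15-day-old post at 7.5, and anything over 30 days old keeps its full score.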