My attempt to resurrect the old LW Power Reader is facing an obstacle just before the finish line, due to current LW’s API limitations. So this is a public appeal to the site admins/devs to relax the limit.
Specifically, my old code relied on LW1 allowing it to fetch all comments posted after a given comment ID, but I can’t find anything similar in the current API. I tried reproducing this with the allRecentComments endpoint in GraphQL, but because the offset parameter is limited to less than 2000, I can’t fetch comments older than a few weeks. The Power Reader is partly designed to let someone catch up on or skim weeks or months’ worth of LW comments, hence the need for this functionality.
As a side effect of this project, my AI agents produced documentation of LW’s GraphQL API from LW’s source code. (I was unable to find another API reference for it.) I believe it’s fairly accurate, as the code written based on it seems to work well aside from the comment-loading limit.
If you go to /graphiql there’s a query editor with integrated documentation, and the API schema is in the GitHub repo here. The offset limit exists because database queries sometimes become extremely slow when given large offsets.
We added before and after date options to allRecentComments so you should now be able to get comments with something like:
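A minimal sketch of how date-windowed paging could replace large offsets. Everything about the request shape here is an assumption for illustration (the query text, the `terms` keys, the `_id` and `postedAt` field names), so check it against the schema in /graphiql. `fetch_page` is a hypothetical caller-supplied function that posts the payload to the GraphQL endpoint and returns the list of results, newest first.

```python
from typing import Callable, Dict, Iterator, List, Optional

# ASSUMPTION: illustrative query text only; the real argument and field
# names should be taken from the schema shown in /graphiql.
COMMENTS_QUERY = """
query RecentComments($terms: JSON) {
  comments(input: { terms: $terms }) {
    results { _id postedAt }
  }
}
"""


def build_payload(after: str, before: Optional[str] = None, limit: int = 50) -> Dict:
    """Build a request body asking for comments inside a date window."""
    terms = {"view": "allRecentComments", "after": after, "limit": limit}
    if before is not None:
        terms["before"] = before
    return {"query": COMMENTS_QUERY, "variables": {"terms": terms}}


def iter_comments(
    fetch_page: Callable[[Dict], List[Dict]], after: str, limit: int = 50
) -> Iterator[Dict]:
    """Yield every comment newer than `after`, newest first, without offsets.

    Each request moves `before` down to the oldest timestamp seen so far.
    Comments are deduplicated by ID in case `before` is inclusive.
    Assumes timestamps are uniformly formatted ISO-8601 UTC strings (so they
    sort lexicographically) and that `limit` is larger than the number of
    comments sharing a single timestamp.
    """
    seen = set()
    before: Optional[str] = None
    while True:
        page = fetch_page(build_payload(after, before, limit))
        fresh = [c for c in page if c["_id"] not in seen]
        if not fresh:
            return  # nothing new in this window: we have reached `after`
        for comment in fresh:
            seen.add(comment["_id"])
            yield comment
        before = min(c["postedAt"] for c in fresh)
```

Because each request is bounded by dates rather than by an offset, the server never has to skip over thousands of rows, which is the slow case the offset limit guards against.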
The resurrected LessWrong Power Reader is now on GitHub. Thought I’d post it here in a low-key way in case you (or anyone else) want to test it or review its API usage, to make sure it’s not doing something undesirable, before I make an announcement post.
the current intro and feature list
A fast, context-first reader for LessWrong and the EA Forum, designed to make high-volume reading and thread navigation feel effortless.
Chronological Reader Core: Shows comments in strict time order with date-based pagination, so you can read without gaps.
Deep Thread Context: Loads missing parents and replies so deep comments still make sense.
Post + Comment Power Actions: Inline controls to expand/load post bodies, load all comments, jump between posts, and navigate thread roots.
Sticky Header Navigation: Keeps post controls and key metadata visible while scrolling long threads.
Keyboard-First Workflow: Hotkeys for post/comment actions, including fallback from hovered comments to parent post actions.
AI Studio Integration: Send posts or comment threads to Google AI Studio (g / Shift+G), with customizable prompt prefix.
Rich Voting & Reactions: Karma + agreement voting, reaction picker, and inline quote reactions with highlighted quoted text.
Smart Prioritization: Sorts by “most important unread content first” and can hide threads that are already fully read.
Personalization: Author preference controls, read tracking, and saved layout/settings in browser storage.
Site-wide Entry Point: Accessible from any forum page via an injected “POWER Reader” header link.
The code doesn’t look like it would cause catastrophic problems. The main risk to end users at the current level of testing is a bug causing important information to be missed. My ability to comment on the risk to a developer is limited, however, because I haven’t read the source code of all the development dependencies.
I have visually checked (as a human) the dist/power-reader.user.js file. End users should be relatively safe copying this into their browser plugins, as long as all plugins have no relevant security problems or malicious code. As mentioned before, I’m not too sure of the safety of compiling this file. The file does appear to execute buggy code[1], but I haven’t seen anything security or database-mutation related.
Note that one of your recent commits is large, which makes it hard to audit and therefore hard to establish the safety of doing development work on your script. That commit looks like a combination of LLM-generated code and more traditional programmatically generated code. It may be helpful to make sure that code that looks programmatically generated has a consistent ordering, to reduce the size of diffs, and that it is never directly touched by an LLM. LLMs are approaching the capability to write underhanded code, if they are not there already, which suggests that diffs should be small and carefully reviewed. Since LLMs don’t look very strategically competent at the current date, you may be able to have LLMs from one company review the code from another, as long as you can be sure that files don’t contain e.g. non-rendered Unicode information.
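One cheap check for the non-rendered Unicode concern is to scan source files for format-control characters before review. The sketch below is a narrow heuristic, not a complete defense: it flags only Unicode general category Cf (zero-width characters, bidirectional overrides, tag characters, byte-order marks) and will not catch homoglyphs or variation selectors. The function name is made up for illustration.

```python
import unicodedata
from typing import List, Tuple


def find_invisible_chars(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, description) for each format-control character.

    Line and column numbers are 1-based. Only general category "Cf" is
    flagged, which covers zero-width and bidi-control code points.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((line_no, col, f"U+{ord(ch):04X} {name}"))
    return hits
```

Running this over every file in a diff and failing the review on any hit would keep such characters from reaching either the human or the LLM reviewer unnoticed.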
[1] E.g. color mixing is done on raw color values, and therefore implicitly in something close to either sRGB or RGB with a pure gamma 2.2 transfer function. This should technically be done in linear light instead, but it might be fine as is, given that your function inputs are not user-selectable. See https://www.youtube.com/watch?v=xDLxFGXuPEc at 1:03. https://www.ericbrasseur.org/gamma.html gives the formulas.
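For reference, a minimal sketch of mixing in linear light. It uses the standard piecewise sRGB transfer function rather than a pure gamma 2.2 curve, and the function names are made up for illustration; this is not the script's actual code.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def srgb_to_linear(value: int) -> float:
    """Decode one 8-bit sRGB channel to linear light in [0, 1]."""
    c = value / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(linear: float) -> int:
    """Encode a linear-light value in [0, 1] back to an 8-bit sRGB channel."""
    if linear <= 0.0031308:
        c = 12.92 * linear
    else:
        c = 1.055 * linear ** (1 / 2.4) - 0.055
    return round(min(max(c, 0.0), 1.0) * 255)


def mix_linear(a: RGB, b: RGB, t: float) -> RGB:
    """Blend a toward b by fraction t, interpolating in linear light."""
    return tuple(
        linear_to_srgb((1 - t) * srgb_to_linear(x) + t * srgb_to_linear(y))
        for x, y in zip(a, b)
    )
```

Mixing black and white equally this way gives `(188, 188, 188)`, whereas averaging the raw values gives roughly 128 per channel, which renders noticeably darker than a true 50% blend.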
Thanks for taking a look! I’ve removed the large programmatically generated file (src/generated/graphql.ts) from the repository to improve auditability and reduce diff noise.
I’ve also added these security mitigations for the development environment:
Dependency Audit: Integrated npm audit --audit-level=high into the build pipeline to automatically block builds with high or critical severity vulnerabilities.
Supply Chain: Pinned vite-plugin-monkey version to mitigate potential supply chain risks.
Re: color mixing—noted! It’s currently good enough for this use case, but I’ll keep the gamma correction resources in mind if we need higher fidelity.
And yes, I’ve been using different LLMs (Gemini, GPT, Claude) to review each other’s code.
Please let me know if you have any other suggestions.
Mmm yeah this should be doable to fix. I think the limitation might be due to handling bot swarms that sometimes try to download everything, but if so I’m guessing we can handle it somehow. We’re discussing internally atm.
Hmm. I wonder what it’d take to create a no-UI, API-only, read-only mirror of LW data. For most uses, a delay of a few minutes would cause no harm, and it could be scaled for this use independently of the rest of the site. If the cost turned out to be significant, it could be subscription-only: require auth and rate-limit based on a monthly fee (small, one hopes, to pay for the storage, bandwidth, and API compute).
I would need a first-sync (and resync/anti-entropy) mechanism, but could just poll allRecentComments to stay mostly up-to-date, turning this into a single caller to the LW systems rather than multiple.
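A rough sketch of the polling half of such a mirror, under stated assumptions: `fetch_since` is a hypothetical function that returns all comments posted at or after a given time (for example by wrapping allRecentComments), and comments carry `_id` and `postedAt` fields. Re-reading a small overlap window on each poll stands in for a real anti-entropy mechanism; it would not catch edits or deletions older than the overlap.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterable

Comment = Dict[str, str]


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class CommentMirror:
    """Keeps a local copy of comments by polling a single upstream source."""

    def __init__(
        self,
        fetch_since: Callable[[datetime], Iterable[Comment]],
        overlap: timedelta = timedelta(minutes=10),
    ) -> None:
        self.fetch_since = fetch_since  # hypothetical upstream fetcher
        self.overlap = overlap
        self.store: Dict[str, Comment] = {}
        # Start from the epoch so the first poll acts as the initial sync.
        self.high_water = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def sync_once(self) -> int:
        """Poll upstream once and return how many new comments were stored."""
        since = self.high_water - self.overlap
        new_count = 0
        for comment in self.fetch_since(since):
            if comment["_id"] not in self.store:
                new_count += 1
            self.store[comment["_id"]] = comment  # refresh overlapping entries
            self.high_water = max(self.high_water, parse_ts(comment["postedAt"]))
        return new_count
```

Calling `sync_once()` on a timer gives downstream readers a copy that lags by at most one polling interval, while the LW servers see only one caller.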
Did you try submitting a PR? I assume this is a one-line change. I would assume an open PR can reach the right people quicker than a shortform.
Idle suggestion, probably not useful: have you checked if you can do what you want by using GreaterWrong instead?
GreaterWrong is calling the same API against the LW server, then serving the resulting data to you as HTML. As a result it has the same limitations, so if you keep going to the next page on Recent Comments, eventually you’ll get to https://www.greaterwrong.com/recentcomments?offset=2020 and get an error “Exceeded maximum value for skip”.