This is something I’m curious about as well! A friend recently introduced me to LessWrong, and I’ve found myself really enjoying the posts here! I’d like to spend more focused time digging into them!
I’d like to create a dump of LessWrong so that I can use a tool like DocETL (https://www.docetl.org/) to sift through articles that might interest me. It’s been quite some time since jimrandomh replied to this post, so I thought I’d check in before attempting to crawl the site.
Also, it looks like https://www.lesswrong.com/robots.txt disallows hitting /allPosts?
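For anyone who wants to check this before crawling, here is a minimal sketch using Python's standard `urllib.robotparser`. The `SAMPLE_ROBOTS` text is a hypothetical excerpt that only mirrors the `/allPosts` rule mentioned above, not the site's actual file, and `is_allowed` is a helper name made up for illustration. Fetch the live robots.txt and pass its text in before relying on the answer.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical excerpt for illustration only; the real file may contain
# more rules, so download it and pass its text in instead.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /allPosts
"""

if __name__ == "__main__":
    for candidate in ("https://www.lesswrong.com/allPosts",
                      "https://www.lesswrong.com/posts/abc"):
        print(candidate, is_allowed(SAMPLE_ROBOTS, candidate))
```

Note that robots.txt only says which paths crawlers are asked to avoid. It does not say how fast to go, so rate limiting is still up to you.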
In this related post, someone mentions another website called GreaterWrong, but I’m not sure I understand the relationship between that website and this one. I’m a total newbie to this community, haha.
What’s the most thoughtful way to get a dump of LessWrong? Is that even something the folks who run this site would want?
GreaterWrong is a website with the same content as LessWrong but a different look. It pulls its content from the LessWrong website, so it’s basically just a different way to access the same posts.
LessWrong has a GraphQL API, which is probably the best way to read and dump posts, as long as you rate limit conservatively. That does mean some programming is involved, though.
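To give a rough idea of the programming involved, here is a sketch of paging through posts with a pause between requests. The endpoint URL, the `posts(input: {terms: ...})` query shape, the field names, and the `User-Agent` string are all assumptions I have not checked against the current schema, so confirm them before running anything. The 5-second delay is an arbitrary conservative guess, not a documented limit.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator

# ASSUMPTION: endpoint and query shape are unverified; check the live schema.
GRAPHQL_URL = "https://www.lesswrong.com/graphql"


def build_query(limit: int, offset: int) -> str:
    """Build a GraphQL query for one page of posts (assumed schema)."""
    return (
        '{ posts(input: {terms: {view: "new", limit: %d, offset: %d}}) '
        "{ results { _id title pageUrl postedAt } } }" % (limit, offset)
    )


def post_graphql(query: str) -> dict:
    """POST a query to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "User-Agent": "personal-archive-script",  # hypothetical identifier
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_posts(
    fetch: Callable[[str], dict] = post_graphql,
    page_size: int = 50,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield posts page by page, sleeping after every non-empty page."""
    offset = 0
    while True:
        payload = fetch(build_query(page_size, offset))
        results = payload["data"]["posts"]["results"]
        if not results:
            return  # an empty page means there is nothing left to read
        yield from results
        offset += len(results)
        sleep(delay_seconds)
```

Something like `for post in iter_posts(): print(post["title"])` would walk the pages one at a time. The `fetch` and `sleep` parameters exist only so the loop can be tried out without touching the network.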
If you just want a quick dump of everything, it’s probably best to use wget on GreaterWrong with rate limiting. Email admin@greaterwrong.com before doing so to make sure you do it in a way they approve of.
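If you would rather stay in a script than use wget, the same idea (fetch one page, save it, wait, repeat) can be sketched in Python. This is a stand-in for the wget approach, not wget itself. The helper names, the `User-Agent` string, the file naming scheme, and the 10-second delay are all made up for illustration, and you would supply the list of page URLs yourself. The email advice applies here just the same.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def fetch_page(url: str) -> bytes:
    """Download one page and return its raw bytes."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "personal-archive-script"}  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def local_name(url: str) -> str:
    """Turn a URL path into a flat file name, e.g. /a/b -> a_b.html."""
    path = urlparse(url).path.strip("/")
    return (path.replace("/", "_") or "index") + ".html"


def mirror(
    urls: Iterable[str],
    out_dir: str,
    fetch: Callable[[str], bytes] = fetch_page,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Path]:
    """Save each URL to out_dir, pausing after every download."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = out / local_name(url)
        if target.exists():
            continue  # already downloaded, so a rerun skips it without a request
        target.write_bytes(fetch(url))
        saved.append(target)
        sleep(delay_seconds)
    return saved
```

Skipping files that already exist means an interrupted run can be restarted without requesting the same pages twice.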
gotcha, thanks!