LessWrong Team
I have signed no contracts or agreements whose existence I cannot mention.
I was one of the devs. Granted, the money went to Lightcone and not me personally, but even if it had, I don’t see it motivating me in any particular direction. For one thing, not by taking longer – I’ve got too much to do to drag my feet to make a little more money. Not by pleasing METR either – I didn’t believe they wanted any particular result.
Did you mean to reply to that parent?
I was part of the study actually. For me, I think a lot of the productivity gains were lost from starting to look at some distraction while waiting for the LLM and then being “afk” for a lot longer than the prompt took to run. However! I just discovered that Cursor has exactly the feature I wanted them to have: a bell that rings when your prompt is done. Probably that alone is worth 30% of the gains.
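For anyone whose tool doesn’t have that, here’s a minimal TypeScript sketch of the same idea (a hypothetical helper, not Cursor’s actual implementation): wrap the long-running call and sound the terminal bell when it finishes.

```typescript
// Hypothetical helper (not Cursor's implementation): emit the terminal bell
// when a long-running async task (e.g. an LLM prompt) resolves, so you
// notice it finished instead of drifting off.
async function withBell<T>(task: Promise<T>): Promise<T> {
  try {
    return await task;
  } finally {
    process.stdout.write("\u0007"); // ASCII BEL: most terminals beep or flash
  }
}

// Usage sketch (runLLMPrompt is a stand-in for whatever call you're awaiting):
// const result = await withBell(runLLMPrompt(prompt));
```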
Other than that, the study started in February (?). The models have gotten a lot better in just the past few months, such that even if the study’s result was true on average over the period it ran, I don’t expect it to be true now or in another three months (unless the devs are really bad at using AI or something).
Subjectively, I spend less time now trying to wrangle a solution out of them, and a lot more often it just works pretty quickly.
I agree it’s a stark difference. The intention here was to match other sites with feeds out of a general sense that our mobile font is too small.
If you wanted to choose one font size across mobile, which would you go for?
Hmm, that’s no good. Sorry for the slow reply, if you’re willing I’d like to debug it with you (will DM).
Oh, very reasonable. I’ll have a think about how to solve that. So I can understand what you’re trying to do, why is it you want to refresh the page?
Oh, that’s the audio player widget. Seems it is broken here! Thank you for the report.
Cheers for the feedback, and I apologize for the confusion and annoyance.
What do you mean by “makes the URL bar useless”? What’s the use you’re hoping would still be there? (Typing in a different address should still work.)
The point of the modals is that they don’t lose your place in the feed, which is technically hard to do with proper navigation, though it’s possible we should just figure out how to do that.
And ah yeah, the “view all comments” isn’t a link on right-click, but I can make it be so (the titles are already that). That’s a good idea.
All comment threads are what I call a “linear-slice” (parent-child-child-child) with no branching. Conveying this relationship while breaking with the convention of the rest of the site (nesting) has proven tricky, but I’m reluctant to give up the horizontal space, and it looks cleaner. But two comments next to each other are just parent/child, and if there are omitted comments, there’s a bar saying “+N” that, when clicked, will display them.
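To make the “linear-slice” idea concrete, here’s a rough TypeScript sketch (hypothetical types and names, not the actual feed code), assuming each comment stores its parent’s id:

```typescript
// Hypothetical sketch, not the real feed code: a linear slice is a single
// parent->child->child chain with no branching, where comments that aren't
// rendered are collapsed into an `omitted` count shown as the "+N" bar.
interface FeedComment {
  id: string;
  parentId: string | null;
}

interface LinearSlice {
  shown: FeedComment[]; // rendered top to bottom, each a child of the one above
  omitted: number;      // count behind the clickable "+N" bar
}

function buildSlice(
  leaf: FeedComment,
  byId: Map<string, FeedComment>,
  shownIds: Set<string>,
): LinearSlice {
  // Walk from the leaf up to the root to recover the full unbranched chain.
  const chain: FeedComment[] = [];
  for (let c: FeedComment | undefined = leaf; c; c = c.parentId ? byId.get(c.parentId) : undefined) {
    chain.unshift(c);
  }
  const shown = chain.filter(c => shownIds.has(c.id));
  return { shown, omitted: chain.length - shown.length };
}
```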
Something I will do is make it so the post modal and comments modal are one, and when you click to view a particular comment, you’ll be shown it but the rest will also be there, which should hopefully help with orienting.
Thanks again for writing up those thoughts!
I’m curious for examples, feel free to DM if you don’t want to draw further attention to them
Questions, complaints, confusions, bug reports, feature requests, and long philosophical screeds – here is the place!
I think that intellectual intimacy should include having similar mental capacities.
Seems right, for both reasons of understanding and trust.
A part of me wants to argue that these are intertwined
I think the default is they’re intertwined, but the interesting thing is they can come apart: for example, you develop feelings of connection and intimacy through shared experience, falsely assume you can trust the person (or that you share values or whatever), but then it turns out the shared experiences never actually filtered for that.
This matches with the dual: mania. All plans, even terrible ones, seem like they’ll succeed, and this has flow-through effects: elevated mood, hyperactivity, etc.
Whether or not this happens in all minds, the fact that people can alternate fairly rapidly between depression and mania with minimal trigger suggests there can be some kind of fragile “chemical balance” or something that’s easily upset. It’s possible that’s just in mood disorders, and more stable minds are only vulnerable to the “too many negative updates at once” thing, without the greater instability.
To clarify here, I think what Habryka says about LW generally promoting lots of content being normal is overwhelmingly true (e.g. spotlights and curation), and this book is completely typical of what we’d promote to attention, i.e. high-quality writing and reasoning. I might say promotion is equivalent to an upvote, not to an agree-vote.
I still think there are details in the promotion here that make inferring LW agreement and endorsement reasonable:
- lack of disclaimers around disagreement (absence is evidence), together with a good prior that the LW team agrees a lot with the Eliezer/Nate view on AI risk
- promoting during pre-order (which I do find surprising)
- that we promoted this in a new way (I don’t think this is as strong evidence as it might look; mostly it’s that we’ve only recently started doing this for events and this is the first book to come along – we might have done it, and will do it, for others). But maybe we wouldn’t have done it, or not as high-effort, absent agreement.
But responding to the OP, rather than motivation coming from narrow endorsement of thesis, I think a bunch of the motivation flows more from a willingness/desire to promote Eliezer[1] content, as (i) such content is reliably very good, and (ii) Eliezer founded LW and his writings make up the core writings that define so much of site culture and norms. We’d likely do the same for another major contributor, e.g. Scott Alexander.
I updated from when I first commented by thinking about what we’d do if Eliezer wrote something we felt less agreement over, and I think we’d do much the same. My current assessment is that the book placement is something like ~80-95% neutral promotion of high-quality content the way we generally do it, not because of endorsement, but maybe there’s a 5-20% chance it got extra effort/prioritization because we in fact endorse the message – hard to say for sure.
[1] and Nate
LW2 had to narrow down in scope under the pressure of ever-shorter AI timelines
I wouldn’t say the scope was narrowed, in fact the admin team took a lot of actions to preserve the scope, but a lot of people have shown up for AI or are now heavily interested in AI, simply making that the dominant topic. But, I like to think that people don’t think of LW as merely an “AI website”.
It really does look dope
Curated. The idea of using Futarchy and prediction markets to make decision markets was among the earliest ideas I recall learning when I found the LessWrong/Rationality cluster in 2012 (and they continue to feature in dath ilani fiction). It’s valuable then to have an explainer for fundamental challenges with prediction markets. I suggest looking at the comments and references, as there’s some debate here, but overall I’m glad to have this key topic explored critically.
Fwiw, it feels to me like we’re endorsing the message of the book with this placement. Changing the theme is much stronger than just a spotlight or curation, not to mention that it’s pre-order promotion.
Curated. Simple, straightforward explanations of notable concepts are among my favorite genres of post. It’s just a really great service when a person, confused about something, goes on a quest to figure it out and then shares the result with others. Given how misleading the title of the theorem is, it’s valuable to have it clarified here. Something that is surprising, given what this theorem actually says and how limited it is, is that it’s the basis of much other work on the strength of what it purportedly states – though perhaps people are assuming that the spirit of it is valid and that it’s saved by modifications such as the ones John Wentworth provides. It’d be neat to see more analysis of that. It’d be sad if a lot of work cites this theorem because people believed the claim of the title without checking that the proof really supports it. All in all, kudos for making progress on all this.
This may be the most misleading title and summary I have ever seen on a math paper. If by “making a model” one means the sort of thing people usually do when model-making—i.e. reconstruct a system’s variables/parameters/structure from some information about them—then Conant & Ashby’s claim is simply false. – John Wentworth
I think it might well be the case that non-native English speakers gained a benefit from LLMs that native-speakers didn’t, but I don’t think the fact there’s uneven impact means it’s wrong to disallow LLM assistance.
- At worst, we’re back in the pre-LLM situation, I guess facing the general unfairness that some people grew up as native English speakers and others didn’t.
- Practically, LLMs, whether they’ve generated the idea or just the wording, produce writing that’s often enough a bad experience that I and others struggle to read it at all (we just bounce off), and you will likely get downvoted. By and large, “could write good prose with LLM help” is a very good filter for quality.
- Allowing LLM use for non-native English speakers but disallowing it for other usage would be wholly impractical as a policy. Where would the line be? How long would moderators have to spend on each essay trying to judge? (And in any case, the resulting text might be grammatically correct but still painful to read.)
- Already, the moderation burden of vetting the massive uptick in (overwhelmingly low-quality) AI-assisted essays is too high, and we’re going to have to automate more of it.
It’s sad to me that, with where LLMs are currently at, non-native speakers don’t get to use a tool that helps them communicate more easily, but I don’t think there’s an alternative here that’s at all viable as policy for LessWrong.
(Well, one alternative is that moderators don’t pre-filter, and then (1) the posts we’re currently filtering out would just get downvoted very hard, and (2) we’d lose a lot of readers.)
I was one of the developers in the @METR_Evals study. Some thoughts:
1. This is much less true of my participation in the study where I was more conscientious, but I feel like historically a lot of my AI speed-up gains were eaten by the fact that while a prompt was running, I’d look at something else (FB, X, etc) and continue to do so for much longer than it took the prompt to run.
I discovered two days ago that Cursor has (or now has) a feature you can enable to ring a bell when the prompt is done. I expect to reclaim a lot of the AI gains this way.
2. Historically I’ve lost some of my AI speed-ups to cleaning up the same issues LLM code would introduce, often relatively simple violations of code conventions like using || instead of ?? (see the sketch after this point).
A bunch of this is avoidable with stored system prompts, which I was lazy about writing. Cursor has now made this easier and even attempts to learn repeatable rules (“The user prefers X”) that will get reused, saving time here.
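To illustrate the convention point, here’s a minimal TypeScript sketch (a hypothetical function, not code from the actual codebase): || falls back on any falsy value, while ?? only falls back on null/undefined.

```typescript
// Hypothetical example of the convention, not from the LessWrong repo.
// `||` treats 0, "" and false as "missing", which can silently discard
// legitimate values; `??` only substitutes the default for null/undefined.
function displayKarma(karma: number | null): number {
  // Bug-prone alternative: `karma || 1` would turn a real karma of 0 into 1.
  return karma ?? 1; // only null (or undefined) falls back to the default
}

console.log(displayKarma(0));    // 0
console.log(displayKarma(null)); // 1
```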
3. Regarding me specifically, I work on the LessWrong codebase, which is technically open-source. I feel like calling myself an “open-source developer” has the wrong connotations and makes it sound more like I contribute to a highly-used Python library or something as an upper-tier developer, which I’m not.
4. As a developer in the study, it’s striking to me how much more capable the models have gotten since February (when I was participating in the study).
I’m trying to recall if I was even using agents at the start. Certainly the later models (Opus 4, Gemini 2.5 Pro, o3) could just do vastly more with less guidance than 3.6, o1, etc.
For me, not having gone over my own data in the study, I could buy that maybe I was being slowed down a few months ago, but it is much, much harder to believe now.
5. There was a selection effect in which tasks I submitted to the study. (a) I didn’t want to risk getting randomized to “no AI” on tasks that felt sufficiently important or daunting to do without AI assistance. (b) Neatly packaged and well-scoped tasks felt suitable for the study; large open-ended greenfield stuff felt harder to legibilize, so I didn’t submit those tasks to the study even though the AI speed-up might have been larger.
6. I think if the result is valid at this point in time, that’s one thing; but if people are citing it in another 3 months’ time, they’ll be making a mistake (and I hope METR will have published a follow-up by then).