I’m an independent researcher currently working on a sequence of posts about consciousness. You can send me anonymous feedback here: https://www.admonymous.co/rafaelharth.
Rafael Harth
It is, although it was unintentional karma farming; I had no expectations that this would go above, idk, 7.
The issue is (was?) that it’s difficult to make a higher-effort post without singling out users, which I don’t want. This is why I didn’t say anything like this until now. But this is now solved with habryka greenlighting a private channel. If I say anything else about this, I’ll do it there.
This begs the question of how to develop priors. I thought the benefit of Bayes is that it can converge on the best probabilities no matter your starting point, when you’ve been presented enough evidence, so long as you don’t assign anything a 0% or 100% prior.
This is true, and it still happens; the post didn’t say anything about convergence in the limit.
This begs the question of how to develop priors.
For instance, you could be playing a videogame and you don’t know whether an enemy boss was programmed to cycle between both of its possible attacks randomly, or if it was programmed to be Switchy or Sticky. Then I think the “fallacy” presented by the OP would apply, wouldn’t it?
So, two things about this:
1. There is a theoretical answer to the priors problem, though it’s computationally intractable. It’s a huge rabbit hole, if you want to go down it.
2. (Nice example!) The real answer here is that you don’t start accumulating evidence with the first boss hit, but well before that. Lots of things in the world give you information about how real people will most likely have programmed a boss in this case. Or, more practically relevant, you’d consider your prior knowledge to affect your choice of the prior in this case. Pretty sure Sticky is pretty unlikely here, and it’s either Switchy or random. (Toy sketch of this kind of updating below.)
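A minimal sketch of what that kind of informed-prior updating could look like in the boss example; the specific probabilities, switch rates, and attack sequence are all invented for illustration:

```python
# Bayesian updating over three hypotheses about how the boss picks its next attack,
# given its previous one. All numbers here are illustrative assumptions.

def transition_prob(hypothesis, prev_attack, next_attack):
    """P(next_attack | prev_attack, hypothesis) with two possible attacks, 'A' and 'B'."""
    if hypothesis == "random":
        return 0.5
    p_switch = 0.8 if hypothesis == "switchy" else 0.2  # "sticky" mostly repeats
    return p_switch if next_attack != prev_attack else 1.0 - p_switch

# Informed prior: background knowledge about how bosses tend to be programmed
# (here, "sticky" is judged unlikely from the start).
posterior = {"random": 0.55, "switchy": 0.40, "sticky": 0.05}

observed = ["A", "B", "A", "B", "B", "A", "B", "A"]  # hypothetical attack sequence

for prev, nxt in zip(observed, observed[1:]):
    posterior = {h: p * transition_prob(h, prev, nxt) for h, p in posterior.items()}
    total = sum(posterior.values())
    posterior = {h: p / total for h, p in posterior.items()}

print(posterior)  # mass shifts toward "switchy" as the boss keeps alternating
```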
Qualitative/intuitive impression; I didn’t apply any formalized metric.
Gotcha, I’ll at least consider writing a more comprehensive argument. I don’t mind including examples in a private context.
Is this a problem that mods can solve? I mean if I 100% convinced you, what would you do about it?
My judgement is obviously different, in that I want other people to freak out (well, I don’t actually want them to be anxious and fearful, but I can’t control that) in that I want people to realize what I think is happening and, if they agree, take short term actions that may buy us more time to do critical safety work.
Already said this, but want to repeat that I think the perspective is totally valid.
But in terms of the cold consequentialist calculus, I don’t know how you get to the “more alarming is a good idea” result. Maybe I’m biased because 2⁄2 cases I know well (myself and the friend I mentioned) low-key left the platform because the constant reminders are so crippling for mental health. I don’t have a survey on how bad other people feel. But my impression is that I see a post about AI acceleration more than half the time I look at the frontpage. Valentine wrote Here’s the Exit literally over three years ago! It was already so bad back then that people contemplated leaving the community over it. And it’s been going on ever since.
I genuinely believe that even if your utility function has zero terms in it other than maximizing useful AI interventions (whether policy or technical safety work, or anything else), you should want fewer posts like this. Everyone got the memo that it’s time to panic. I think the awareness-of-how-bad-it-is curve would have plateaued even if there were one fifth as many posts like this one, and the marginal effect of every other post is just to make people freak out more.
It’s definitely not a response to your post! I mean, yes, I do think the post is an example of the pattern, but your posts in general are not, and it’s also not the most egregious example; the front page is not exactly filled with takes on this particular topic.
In terms of being directly causally linked to me getting annoyed enough to write the shortform, the other post I just complained about plays a much bigger role, and is a much more central example. But I’d still dispute that the shortform is a response to that. This is a complaint I’ve had for months and could have written at any point.
Don’t really want to name examples because it would single out individual authors. That’s why I didn’t include any in the first place.
Strongly agree that it correlates. It’s just not quite the same as most people agreeing with you. It probably measures something more like a lot of people strongly agreeing with you, with it mattering less how many people disagree with you. Like, if on topic A, agreement/disagreement is 50⁄50, and on topic B it’s 90⁄10, but for A people feel super strongly and for B people are mostly indifferent, then A might get rapid upvotes whereas B would just disappear from visibility immediately.
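A toy calculation of that asymmetry (every number here is made up, purely to illustrate the mechanism):

```python
# Toy model: expected net karma when voting propensity depends on how strongly
# people feel, not just on whether they agree. All parameters are invented.

def expected_net_karma(readers, p_agree, p_upvote_if_agree, p_downvote_if_disagree):
    upvotes = readers * p_agree * p_upvote_if_agree
    downvotes = readers * (1 - p_agree) * p_downvote_if_disagree
    return upvotes - downvotes

# Topic A: 50/50 split, but people feel strongly (agreers upvote eagerly,
# disagreers downvote less readily than agreers upvote).
# Topic B: 90/10 split, but people are mostly indifferent (few vote at all).
print(expected_net_karma(1000, 0.50, 0.60, 0.20))  # A: 300 - 100 = +200
print(expected_net_karma(1000, 0.90, 0.05, 0.02))  # B: 45 - 2 = +43
```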
Anyway, much more importantly:
That said, I wrote this post mostly because I’m freaking out about the state of things and this is my best attempt to crystallize the source of that freak out. I’m not actually trying to make a rigorous argument about what counts as AGI, and I don’t even know if making a rigorous argument is worth it. You should probably read this post more as “Gordon is saying that he’s feeling the AGI hard, and you should do with the information about his judgement as a 25 year veteran of AGI discourse what you will.”
Big props for saying that! And this is actually one of the reasons why I perceive these posts as harmful: I think a lot of people are freaking out, this kind of post contributes to more people freaking out, and because I think AGI isn’t very close, to me that just seems like a big net negative. But of course that depends entirely on your beliefs; if AGI were close, maybe it would be correct to freak out. This is what always makes it difficult to know what to say about posts like this; to me the entire pre-apocalypse vibe is a huge negative for the overall utility of LessWrong, but I think people genuinely believe it, so it’s unclear to what extent I should/can push back.
Fwiw my ire is much more directed at people upvoting this than at you for writing it. This seems to me like a pretty clear case: even if the narrative were 100% true, I don’t see how continuously broadcasting the vibe is a good idea. If I were an alignment researcher, I doubt it’d be good for my productivity. In fact I just talked to someone a few weeks ago on Discord who told me they don’t check LW much anymore for mental health reasons even though they still completely believe in the narrative and are trying to work on it. (They even deleted their account, which I thought was a bad idea.)
It’s getting increasingly easy to farm karma on LessWrong by making almost substance-free posts that just articulate a position most people agree with. Imo this is a really huge problem; karma matters, it shapes what people see, and it sets incentives.
As your good deed for this week, find something to strong-downvote that you agree with. I’d guess most people do that far too infrequently, maybe never.
Selection bias. I completely disagree, [1] I just didn’t know what to do with that, so I didn’t comment. “No, I think you’re wrong”? This doesn’t seem like a very productive comment. “No, I think AGI can’t do xyz yet”? Maybe better, but that would just open me up for another debate about this, and I’ve zero interest in debating it. Overall I didn’t see what kind of comment was worth writing, so I didn’t bother. This reaction is probably quite common, and at any rate, the lack of comments like this doesn’t tell you otherwise.
I’m only saying something now to push back on the “reception indicates agreement” point.
I’m not sure even upvotes indicate net agreement, because people will have stronger inhibitions against downvoting. Many people who agree will just upvote because they agree. I doubt everyone who disagrees will downvote without formulating a critique. Though idk; iirc mods have encouraged liberal downvoting, so maybe I’m wrong and people do it freely.
[1] For my view, I’ll just refer to Steven Byrnes’ formulation:
By “AGI” I mean here “a bundle of chips, algorithms, electricity, and/or teleoperated robots that can autonomously do the kinds of stuff that ambitious human adults can do—founding and running new companies, R&D, learning new skills, using arbitrary teleoperated robots after very little practice, etc.”
Yes I know, this does not exist yet! (Despite hype to the contrary.) Try asking an LLM to autonomously write a business plan, found a company, then run and grow it for years as CEO. Lol! It will crash and burn!
And I guess I’ll add that I don’t think LLMs are close to AGI, will lead to AGI, or anything like that, and yea I do also think claims to the contrary are net harmful for several reasons.
Hmm, okay. So after reflecting on this a bunch, I think the things that still bug me about this post after reading your clarification aren’t about factual merits but about implications and tone. I’m not sure what the best practice in such a case is; maybe just not saying anything is best. I guess tell me if you’d prefer I not write responses like this, lol. But I decided to say it this time.
I think this post is a) mostly attacking a strawman, in that most people who you think disagree with you actually do so for different reasons (although not everyone; I concede some people will disagree with the algorithm thing as you define it), and b) even insofar as they do exist, the net effect of this post will be substantially negative because it’s antagonizing, mocking, and unpersuasive.
E.g.:
The hilarious irony of psychedelics is:[4]
Objectively, psychedelics should be the most clear-cut evidence you could imagine for the idea that the brain is a machine that runs an algorithm, and that the mind is something that this algorithm does. After all, these tiny molecules, which just so happen to lock onto a widespread class of neuron receptors, create seismic shifts in consciousness, beliefs, perceptions, and so on.
…And yet, the people who actually take psychedelics are much likelier to stop believing that. Ironic.
This would feel right at home in r/sneerclub, which is odd to me because your posts usually have a very humble vibe. And yea I guess I don’t understand what your theory of mind is for how something good will result from anyone reading this.
But yea, feel free not to reply, and I can refrain from expressing similar things in the future if you want.
What’s odd to me about this post is that it doesn’t define algorithm, so I don’t know what the claim is. [1]
The closest thing to a definition is the third-to-last section, which I’d critique because (a) the definition should be at the beginning (but okay, that’s just a nitpick about writing/structure), (b) more importantly, I read it and I’m still not sure I know what the definition is, and (c) insofar as I understand it, I don’t think it maps onto conventional usage very well (see last section).
My best guess of what you mean based on that section is “anything that’s entirely about data processing in the widest sense is an algorithm”? But I don’t think that maps onto the common usage of the term very well. Also I don’t know how it applies to obscure cases. If you have a knotted wire and make the surface repulsive, it will ‘compute’ a way to disentangle, which is an ‘output’ in the sense that it corresponds to a set of topological transformations, but is this an algorithm? (Probably not because it actually does the disentangling itself, rather than just computing how you would disentangle, so therefore it’s not entirely about data processing?)
Edit: even the mechanical adder seems like an unclear case because it physically instantiates the solution. I guess it’s an algorithm because we don’t care about the physical instantiation, so in this case we can view it as only an informational output rather than a device that “does” anything? But this is not a crisp distinction; what if the mechanical adder were part of a larger system where the physical arrangement of marbles was utilized more?
If I did grasp the distinction correctly, then I don’t think it’s all that practically relevant. If the brain used some kind of quantum algorithm that had a 100000x computational overhead if it were instantiated on a classical computer, [2] then your conclusion that you could replace the input/output mapping with a computer would still be true, so what does it prove? This seems to me like just a restatement of the Turing thesis, or I guess an application of the thesis to brains?
My practical concern RE (c) is that, imE, the concept of algorithm is generally seen as implying that the manner in which the brain does computation is similar to what a computer does, but that’s totally orthogonal to what you discuss here. So I can see people taking away the wrong thing.
I’m not particularly buying any of this. The central metaphor just doesn’t seem true (scratching an itch can be way more pleasurable than not having one, imE, and ditto with many other instances of receding unpleasantness) and I don’t think “[t]herefore what we usually take as pleasure is just scratching the sore of underlying suffering” follows from any of the stuff before (lots of types of pleasure for which this doesn’t seem true).
Have you read the most upvoted responses to your link?
Yes. I don’t think any of them suggest that LessWrong is supporting or enthusiastic about OpenAI. (In particular, whether you should work there doesn’t have much relation to whether the company as a whole is a net negative.) I would describe the stance of the top 2 comments on that post as mixed [1] and LW’s stance in general as mixed-to-negative.
Lesswrong should have had much more enmity toward OpenAI,
Fwiw this is not a crux; I might agree that we should be more negative toward OpenAI than we are. I don’t think that’s an argument for laxer standards of criticism. Standards for rigor should lead to higher-quality criticism, not less harsh criticism. If you had attacked Greenpeace twice as much but had substantiated all your claims, I wouldn’t have downvoted the post. I’d guess that the net effectiveness of a community’s criticism of a person or org goes up with stricter norms.
[1] E.g., Ben Pace also says, “An obvious reason to think OpenAI’s impact will be net negative is that they seem to be trying to reach AGI as fast as possible, and trying a route different from DeepMind and other competitors, so are in some world shortening the timeline until AI. (I’m aware that there are arguments about why a shorter timeline is better, but I’m not sold on them right now.)”
So I think the norm is something like “if you write something that will predictably make people feel worse about [real person or org], you should stick to journalistic standards of citing sources and such”. That means all your quotes depend on whether you’ve sufficiently established the substance of the quote.
If we take your post as it is now, well, you only have one source, which is the group letter to Congress. Imo, as you used it, this actually does not even establish that they’re anti-nuclear-power, because the letter is primarily about fossil fuels, and the quote about nuclear power is in the context of protecting indigenous rights. Also, you said it was signed together with 600 other companies, so it might have been a compromise (maybe they oppose some parts of the content but thought the entire thing was still worth signing). An endorsement of a compromise/package is just really not a good way to establish their position. It would be much better to just look at the Wikipedia page and see whether that says they’re anti-nuclear, which in fact it does in the introduction. Some would probably quibble with that, but for me that would actually be enough. So if you just did that, then I’d excuse all quotes that only reference them being anti-nuclear-power (which I guess is just the first in your list).
Saying that they’re my enemy is a little harder because it would require establishing that they’re a negative for climate protection on net. This is not obvious; you could have an org that’s anti-nuclear-power and still does more good than harm overall. It probably still wouldn’t be that difficult, but your post as is certainly falls short. (And BTW, it’s also not obvious that being anti-nuclear-power now is as bad as having been anti-nuclear-power historically. It could be the case that having been anti-nuclear-power historically was a huge mistake and we should have invested in the technology all this time, but that, since we didn’t, at this point it actually no longer makes sense and we should only invest in renewables. I don’t think that’s the case, I think we should probably still build nuclear reactors now, but I’m genuinely not sure. This kind of thing very much matters for the ‘net negative impact’ question.)
Specifically, it should be about Lesswrong having a bad culture. One that favours norms that make punishing enemies harder, up to the point of not being able to straightforwardly say “if you are pro-nuke, an org that has been anti-nuke for decades is your enemy”.
I think it’s very unlikely that having laxer standards for accusing others is a good thing. Broadly speaking, it seems to me that ~100% of groups-that-argue-about-political-or-culture-war-topics suffer from having too low standards for criticizing the outgroup, and ~0% suffer from having too high standards. And I don’t think these standards are even that high; you could write a post that says Greenpeace is my enemy, you’d just have to put in the effort to source your claims a little. Or, more practically, you could have just written the post about a fictional org; then you could make your point about enemies without having to deal with the practical side of attacking a real org.
Not related, but:
why the Lesswrong community has supported three orgs racing to AGI.
This was not my impression. My impression was that people associated with the community have founded orgs that then did capability research, but that many, probably most, people on LW think that’s a disaster. To varying degrees. People are probably less negative on Anthropic than OpenAI. We’re certainly not enthusiastic about OpenAI. In any case, I don’t think it summarizes to “the Lesswrong community has supported” these orgs.
Yea, I’m having similar feelings about this post. The conclusion is probably still correct, but not sufficiently established. And I think there should be, idk, a norm about being more thorough when saying negative things about an org, and violating that doesn’t seem worth the point made here.
Hm, they all show up for me I think? Maybe it was something temporary?
Okay, so even though I’ve already written a full-length post about timelines, I thought I should make a shortform putting my model into a less eloquent and far more speculative-sounding and capricious format. Also, I think the part I was hedging the most on in the post is probably the most important aspect of the model.
I propose that the abilities to make progress on...
1. well-defined problems with verifiable solutions; vs.
2. murky problems where the solution criterion is unclear and no one can ever prove anything
… are two substantially different dimensions of intelligence, and IQ is almost entirely about the first one. The second one isn’t in-principle impossible to measure, and it’s probably not even difficult, but it’s extremely difficult to make a socially respected test for it, because you could almost only include questions where the right answer is up for debate. I called this philosophical intelligence in my post because philosophical problems are usually great examples, but it’s not restricted to those. You could also include things like
Is neoliberalism or progressivism a better governing philosophy?
Should we ship weapons to Ukraine?
What’s the best way to teach {insert topic here}?
Of course you can’t put those onto a test any more than you can ask “does libertarian free will exist?” on a test, so the existence of non-philosophical questions here doesn’t make measuring this ability any easier.
People often point to someone famous saying something they think is stupid and then say things like “this again proves that being an expert in one domain doesn’t translate into being smart anywhere else!” This always rubbed me the wrong way, because intelligence in one area should transfer to other areas! It’s all general problem-solving capability! But in fact, those people do exist, and I’ve talked to some of them. People who have genuine intellectual horsepower on narrow problems, but as soon as I ask them anything about a more fuzzy topic, their take is just so surface-level and dumb that my immediate reaction is always this sense of disbelief, like, “it shouldn’t be possible for your thoughts here to be this shallow given how smart you are!”
… but conversely, there clearly is such a thing as expertise in a narrow area correlating with smart philosophical/political views. So sometimes intelligence does transfer and sometimes it doesn’t...
Well, I think it’s obvious what point I’m going to make here; I think sometimes people are experts in their field due to #1 and sometimes due to #2, and to the extent that it’s #2, this tends to transfer into making sense on other questions, whereas to the extent that it’s #1, it’s in fact almost meaningless. (And some people become famous without either #1 or #2, but less so if they’re experts in technical fields.)
I think #2 has outsized importance for progress on many things related to AI alignment and rationality. For example, I think Eliezer is quite high in both #1 and #2, but the reason he has produced a more useful body of work than the average genius has much more to do with #2. Almost nothing in the sequences seems to require genius-level IQ; I think he could be an SD lower in IQ and still have written most of them. It would make a difference, don’t get me wrong, but I don’t think it would be the bottleneck. (None of this depends on what Eliezer is up to nowadays btw; you can ignore the last 15 years for this paragraph.)
Now, what about dangerous capability advances and takeover scenarios from LLMs; can those happen without #2? Imo, absolutely not. Not even a little bit. You can have all sorts of negative effects of the kind that are already happening—job loss, increased social isolation, information silos, misinformation, maybe even some extent of capability enhancement, stuff like that—but the classical superintelligence-ian scenarios require the ability to make progress on problems with murky and unverifiable solutions.
I think the entire notion that LLMs can’t really come up with novel concepts—one of the less stupid criticisms of LLMs, imo—is a direct result of this (coming up with a novel concept is exactly the kind of thing you need #2 for, because there’s no way to verify whether any one idea for a new concept does or doesn’t make sense). Although this is not absolute, because sometimes they can spit out new ideas at random; the “inability to derive new concepts” framing doesn’t quite point at the right thing, since creativity isn’t the issue, it’s the ability to reliably figure out whether a new concept is actually useful. The disconnect between stuff like METR’s supposed exponential growth in LLMs’ capabilities on long-horizon tasks and actual job replacement on those tasks is another example. There is just a really fundamental problem here where metrics for AI progress are biased towards things you can measure—duh!—which systematically biases toward #1 over #2. (Although METR has actually acknowledged this at least a little bit; I feel like they’ve been very epistemically virtuous from what I could see, so I don’t wanna trash them.)
Or to just put it all very bluntly: if LLMs cannot answer questions as easy as “does libertarian free will exist?” or “what’s the right interpretation of quantum mechanics?”—and they can’t—then clearly they’re not very smart. And I think they’re not very smart in a way that is necessary for basically all of the doom-y scenarios.
I’m not expecting anyone to agree with any of this, but in a nutshell, much of my real skepticism about LLM scaling is about the above, especially lately. I don’t think we’re particularly close to AGI… and consequently, I also don’t think much of the classical superintelligence-ian views have actually been tested, one way or another.
Hmm, so I think this centrally depends on what you mean by editing? There’s:
1. Give text to the LLM (with whatever instructions), then take its edited output.
2. Tell the LLM to make suggestions about what to change in your text, then incorporate the suggestions to whatever extent you think makes sense—but do it entirely via manual editing, no copy-pasting.
I’ve never done #1. I’ve never even considered doing #1, I think because the idea of publishing anything actually written by LLMs is just so emotionally yuck to me. But I do #2 all the time, to the point that not doing it seems like a weird decision for anything important. And I think #2 fundamentally avoids the problems you mentioned?