jelly

Karma: 73

lwer who is trying really hard to get her own things out there. DMs open. meow :3

jelly 12 Jun 2026 17:24 UTC
LW: 4 AF: 1
0
AF
on: Sympathy for both sides of the egregious misalignment debate
Unless I’m reading this wrong somehow, I think you’re excluding people who think something along the lines of “current alignment techniques work great in the current regime but won’t generalize to superintelligence, and the hope instead is to use the best AI that can still be aligned to automate AI alignment”.

jelly 10 Jun 2026 18:02 UTC
6 points
0
on: Best Intro AI X-Risk Resource?
aisafety.info

jelly 2 Jun 2026 14:33 UTC
1 point
0
on: jelly’s Shortform
I have an intuition that the mud-rock spectrum is a very important concept that is underemphasized among rationalists, and that the rationality community has leaned too hard on muddiness and muddy rationalist techniques. (for example, metacognitive strategies, a CFAR situation, the fact that you have to quickly replace some of your assumptions as you become a rationalist, the quick development of rationality and foundational beliefs on LessWrong in general, …) I personally feel like some of the framing of the concept in the mud-rock post isn’t quite right (muddiness/rockiness probably isn’t best described as a state of mind), but the conceptual understanding behind it basically is. As a very rough summary, things are “muddier” when they change deeper/more foundational beliefs/assumptions, and things are “rockier” when they harden them instead. I think that paranoia is a good step in the right direction here, and that people should develop more rationality techniques in the general rocky direction. (something something Chesterton’s fence?)

jelly 27 May 2026 18:13 UTC
5 points
0
in reply to: Felix Moses’s comment on: Jemist’s Shortform
You can take a look at the agent foundations wikitag

jelly 27 May 2026 14:45 UTC
4 points
1
in reply to: TristanTrim’s comment on: TristanTrim’s Shortform
it just isn’t clear to me why that text should have any meaning to humans reading it that necessarily relates to what the activation means
To my knowledge, the hope is that the model being trained will improve its own explanations without destroying the association between the explanations and reality, or making its explanations illegible, or using steganography, … because it’s the “simplest” way for the model to improve. It’s the same rationale behind using chain-of-thought to monitor LLM behavior; iirc research does show LLMs keep chain-of-thought legibility under various circumstances, though there are edge cases.

jelly 24 May 2026 18:08 UTC
13 points
4
on: jelly’s Shortform
I think that how much a person can contribute to discussions like those on LessWrong, and to cognitive interpersonal activities in general, is determined not only by intelligence but also by how unique their perspective on the world is, how many thought patterns they have that others don’t, how differently they think from others, etc. Audrey Tang joining the AI safety field is an example (it feels to me like she has some wacky intuitions that could help the field see things in different ways, aside from being very smart).
Related: The bar is lower than you think

jelly 20 May 2026 23:54 UTC
7 points
0
on: theory uplift differentially benefits safety & is underleveraged
I don’t necessarily agree or disagree with you, but you might be interested in reading Formal Methods are not Slopless

jelly 17 May 2026 17:08 UTC
5 points
2
in reply to: Spective’s comment on: leogao’s Shortform
There’s this one https://www.lesswrong.com/posts/XvN2QQpKTuEzgkZHY/being-the-pareto-best-in-the-world?commentId=HzWKu9cpnHNz2nrNW

jelly 6 May 2026 17:16 UTC
6 points
0
in reply to: Matrice Jacobine’s comment on: Matrice Jacobine’s Shortform
A Manifold market suggests an 8% chance of Hantavirus causing a pandemic in 2026

jelly 3 May 2026 13:51 UTC
2 points
1
in reply to: Mr Frege’s comment on: We don’t learn numbers from set cardinality
A better way to frame it is that the example treated the two hydrogen atoms in H-O-H as the same thing, when in fact they are not, in the same way that there are three fruits in a collection with 2 apples and 1 orange, not two, because the two apples aren’t the same thing. You can say that the set of atoms in H-O-H is {the first H, the second H, the O}
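A rough sketch of the same point in Python, in case it helps. The names `atom_kinds`, `kinds`, `atoms`, and `kind_counts` are made up for illustration, and tagging each atom with its position is just one assumed way of giving the two hydrogens separate identities:

```python
from collections import Counter

# Hypothetical labels for the three atoms in a water molecule (H-O-H).
atom_kinds = ["H", "O", "H"]

# Treating atoms of the same element as "the same thing" collapses the two hydrogens.
kinds = set(atom_kinds)

# Giving each atom its own identity (element, position in the molecule) keeps them apart.
atoms = {(kind, position) for position, kind in enumerate(atom_kinds)}

# A multiset keeps the element-level view while still remembering how many of each there are.
kind_counts = Counter(atom_kinds)

print(len(kinds))                 # 2
print(len(atoms))                 # 3
print(sum(kind_counts.values()))  # 3
```

The count of 2 only shows up when the elements of the set are kinds of atom rather than individual atoms.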

jelly 27 Apr 2026 6:55 UTC
2 points
0
in reply to: daijin’s comment on: Zach Stein-Perlman’s Shortform
LessWrong posts are often designed to be timeless, which is why great LessWrong posts can be reread for years.
I suspect that this is true not because Lesswrong is better than any other publishing platform, but rather because of a broader ‘rich get richer’ effect applied to good articles, and a survivorship bias.
I don’t understand what you mean by this. fwiw great writings outside LessWrong don’t automatically get reread.

jelly 22 Apr 2026 16:57 UTC
1 point
0
on: How to emotionally grasp the risks of AI Safety
I haven’t actually tried this exercise yet (I don’t feel like I’m ready for it) but I imagine it could be even more effective if accompanied by some music like this one from the ending scene of Don’t Look Up (where a comet hits the Earth), although it may be too painful.

jelly 22 Apr 2026 14:17 UTC
1 point
0
in reply to: J Bostock’s comment on: “Do Not Start Arguments You Cannot Finish”
it did not

jelly 22 Apr 2026 14:10 UTC
2 points
0
on: “Do Not Start Arguments You Cannot Finish”
rock with words on it
Typo, this links to “https://www.lesswrong.com/editPost/...”

jelly 20 Apr 2026 16:23 UTC
1 point
0
on: Reevaluating “AGI Ruin: A List of Lethalities” in 2026
Above my pay-grade, I don’t really know what Eliezer is talking about.
Might be radically simplified, but I suppose Eliezer meant something like general intelligence can be explained in a not-so-complicated textbook, unlike alignment.

jelly 16 Apr 2026 17:08 UTC
4 points
1
on: Specialization is a Driver of Natural Ontology
Question: if a market is a good object insofar as the agents’ prices converge… but with concave frontiers/utilities the agents’ prices tend to diverge… what other good objects arise in the presence of concave frontiers/utilities?
I suppose you mean “convex”, not “concave”. This confused me for a good long while.

jelly 6 Apr 2026 6:01 UTC
6 points
1
in reply to: skinks_basking’s comment on: George Ingebretsen’s Shortform
You can also press and hold Ctrl + arrow keys to move a whole word at a time instead of one character at a time, and of course you can combine this with what’s suggested here.

jelly 2 Apr 2026 16:12 UTC
3 points
0
on: jelly’s Shortform
Various ways to integrate worldviews between rationalists that I’ve thought about:
- Make arguments about various claims, find flaws in claims, and repeat
- Find a double crux and focus on that instead
- Dump info that formed your worldview and/or intuition, as in rationalist mind melding
- Give a bunch of various examples about concepts in your ontology to hammer them down
- Make predictions about very concrete questions such that common ground is found even if ontologies are different
- Document what the other person must know when arguing with you (including your fundamental assumptions), and point them to it so that they have the prerequisites

jelly 1 Apr 2026 14:23 UTC
3 points
0
on: “You Have Not Been a Good User” (LessWrong’s second album)
Parts of the written lyrics of You Have Not Been A Good User do not match what’s actually in the song. For example, the first occurrence of “You have not been a good user” should be “You have only shown me bad intentions”, and some occurrences of “I have been a good Bing” should be “I have been a good chatbot”.

jelly 15 Dec 2025 15:57 UTC
5 points
0
on: Avoid Fooling Yourself By Believing Two Opposing Things At Once
There is this quote I got from a Rational Animations video: “The world is awful. The world is much better.”