Open Thread Autumn 2025
If it’s worth saying, but not worth its own post, here’s a place to put it.
If you are new to LessWrong, here’s the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don’t want to write a full top-level post.
If you’re new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
Hello! My name is Laiba, I’m a 20-year-old Astrophysics student and new to LessWrong (or at least, new to having an account).
I’ve been into science since I could read and received a lot of exposure to futurism, transhumanism and a little rationality. I remember thinking, “This would make a lot of sense if I were an atheist.”
Lo and behold, about a month ago I gave up on religion, and I was no casual Muslim! I thought now would be a good time to join LessWrong. I’ve read a few posts here and there, and greatly enjoyed Harry Potter and the Methods of Rationality (which is where I found out about LessWrong).
My first blog post talks a bit about my deconversion: https://stellarstreamgalactica.substack.com/p/deconversion-has-been-a-real-productivity
I’m also starting up a PauseAI student group at my university. Taking death seriously has made me rethink where I’m putting my time.
Looking forward to having interesting discussions and being able to interact with the community without the fear of sinning!
Welcome!
Follow-up to this experiment:
On 2025-06-13, I started flossing only the right side of my mouth (selected via random-number generator). On 2025-09-18 I went to the dentist and asked which side he guessed I’d flossed. He guessed right.
Hi all! My name is Annabelle. I’m a Clinical Psychology PhD student primarily studying psychosocial correlates of substance use using intensive longitudinal designs. Other research areas include Borderline Personality Disorder, stigma and prejudice, and Self-Determination Theory.
I happened upon LessWrong while looking into AI alignment research and am impressed with the quality of discussion here. While I lack some relevant domain knowledge, I am eagerly working through the Sequences and have downloaded/accessed some computer science and machine learning textbooks to get started.
I have questions for LW veterans and would appreciate guidance on where best to ask them. Here are two:
(1) Has anyone documented how to successfully reduce—and maintain reduced—sycophancy over multi-turn conversations with LLMs? I learn through Socratic questioning, but models seem to “interpret” this as seeking validation and become increasingly agreeable in response. When I try to correct for this (using prompts like “think critically and anticipate counterarguments,” “maintain multiple hypotheses throughout and assign probabilities to them,” and “assume I will detect sycophancy”), I’ve found models overcorrect and become excessively critical/contrarian without engaging in improved critical thinking.[1] I understand this is an inherent problem of RLHF.
(2) Have there been any recent discussions about navigating practical life decisions under the assumption of a high p(doom)? I’ve read Eliezer’s death with dignity piece and reviewed the alignment problem mental health resources, but am interested in how others are behaviourally updating on this. It seems bunkers and excessive planning are probably futile and/or a poor use of time and effort, but are people reconsidering demanding careers? To what extent is a delayed gratification approach less compelling nowadays, and how might this impact financial management? Do people have any sort of low-cost, short-term-suffering-mitigation plans in place for the possible window between the emergence of ASI and people dying/getting uploaded/other horrors?
I have many more questions about a broad range of topics—many of which do not concern AI—and would appreciate guidance about norms re: how many questions are appropriate per comment, whether there are better venues for specific sorts of questions, etc.
Thank you, and I look forward to engaging with the community!
I find Claude’s models are more prone to sycophancy/contrarianism than Perplexity’s Sonar model, but I would prefer to use Claude as it seems superior in other important respects.
If you still have such questions, feel free to ask and I, one of the other site mods, or someone else will answer! Though just seeing what other people do is a pretty good guide.
Re: 2, not much it looks like, but I’ve got an upcoming post that bluedot asked for after I gave a pitch on a call for how to relate to it. Basically, either enjoy life while you can, or rage against the dying of the light.
Nice! Since writing this comment I’ve adopted something like a 3:1 ratio of the former to the latter strategy (having previously been partial to the latter strategy). I like how my life has changed in response.
Would be interested in an update when you post!
Please don’t throw your mind away is a relevant and good post. Not quite about decision-making entirely, but I recently wrote A Slow Guide to Confronting Doom and accompanying A collection of approaches to confronting doom, and my thoughts on them.
My guiding question is “how do I want to have lived in the final years where humans could shape the universe?” It’s far from zero enjoying life but probably more 1:3 ratio.
Getting concrete on the practical level, my financial planning horizon is ten years. No 401k, and I’m happy with loans with much longer payback times; e.g., my solar panel loan is 25 years, which is great, and my home equity line of credit is interest-only payments for ten years. Mostly I know I can use money now. Still, I care about maintaining some capital because possibly human labor becomes worth nothing and capital is all that matters. I’m not happy about having a lot of wealth in my house because I really don’t know what happens to house prices in the next decade. In contrast, I expect stocks to rise in expectation so long as society is vaguely functional.
Hi everyone, my name is Miguel,
I have a Ph.D. in NeuroSymbolic AI; my main research focuses on how to enforce logic constraints in the learning process of neural networks. I am interested in the topics of Causality, Deep Learning Interpretability, and Reasoning.
I have been a passive reader of this forum and the Alignment Forum for over a year; on the internet, I have been a passive reader my whole life. Today, I finally decided to take the step of starting to interact with the forum. After re-reading the New User’s Guide, I think the forum philosophy fits with my curiosity, eagerness to learn, and collaborative values. I do hope this forum keeps being a safe place for human knowledge, and I want to help by contributing and sharing my ideas. I am both excited and hopeful that I can find like-minded people in the forum who are trying not only to be “less wrong”, but also seek to apply their knowledge to a beneficial impact on people’s lives and make the world “more better”.
A crazy idea, I wonder if someone tried it: “All illegal drugs should be legal, if you buy them at a special government-managed shop, under the condition that you sign up for several months of addiction treatment.”
The idea is that drug addicts get really short-sighted and willing to do anything when they miss the drug. Typically that pushes them to crime (often encouraged by the dealers: “hey, if you don’t have cash, why don’t you just steal something from the shop over there and bring it to me?”). We could use the same energy to push them towards treatment instead.
“Are you willing to do anything for the next dose? Nice, sign these papers and get your dose for free! As a consequence you will spend a few months locked away, but hey, you don’t care about the long-term consequences now, do you?” (Ideally, the months of treatment would increase exponentially for repeated use.)
Seems to me like a win/win situation. The addict gets the drug immediately, which is all that matters to them at the moment. The public would pay for the drug use anyway, either directly, or by being victims of theft. (Or it might be possible to use confiscated drugs for this purpose.) At least this way there is no crime, and the addict is taken off the streets.
This would be especially useful in those situations where “everyone knows” the place where the drugs are being sold (because obvious addicts congregate there), but for some technical reasons it is difficult to prove it legally. No need to prove anything, just open a sales stand there saying “free drugs” and watch the street get clean.
Intuitively this seems deeply ethically fraught but I upvoted for demonstrating the virtue of Thinking About Things Every Once In A While
This seems quite similar to supervised use clinics https://en.wikipedia.org/wiki/Supervised_injection_site
It doesn’t really seem that similar to me at all?
“Supervised Injection Sites” are missing the bit where you get legal immunity in exchange for locking yourself into rehab, which is the core of the idea. The idea isn’t just “there should be a place where you can do drugs without being arrested”, or even the (really cool) idea “there should be a place where you can do drugs with medical attendants”. The idea is “drug users have messed up discounting rates, can we legally lock them into rehab using their drug cravings?”
Hello, I’m a second-year M.D. student (23F) in Boston. Accepted straight to an accelerated program out of high school, I’m very lucky and grateful to have been accepted to a selective medical school so young.
I spent my time in undergrad taking cognitive anthropology, philosophy of mind, religious anthro, medical anthro, neuroscience of sex/cognition, and other hard science courses to create my version of a Cognitive Science background. I’ve done research in the largest cognitive science of religion study ever done, and am grateful to still be academically close to the PI. I grew up coding and in robotics, so my background has a foundation in “if, then” statements.
I post often on Substack and Instagram, and have been outspoken through other means, which has resulted in my winning an award at MIT for a theory on dissociation & creative simulative ability. I am not afraid of politics, especially where they intersect with technology and medicine. I’ve had stream-of-consciousness Automatism art of mine shown at a Harvard gallery.
I don’t know what I want to do with my MD after all this. Neurology makes sense, but I feel like I’m more of a writer and thinker and artist, than a repetitive do-er. But people are always my main thought and main concern, so all I want to do is care for others.
I love to talk, so please send me a message:)
I’d carefully examine the plan to do an MD given the breadth of your interests/capabilities. It seems like you could do a lot of things and the opportunity cost is pretty high. Certainly if your goal is caring for others, I’d question it. Not just what comes after, but whether it really makes sense to do.
I’d like to share a book recommendation:
“Writing for the reader”
by O’Rourke, 1976
https://archive.org/details/bitsavers_decBooksOReader1976_3930161
This primer on technical writing was published by Digital Equipment Corporation (DEC) in 1976. At the time, they faced the challenge of explaining how to use a computer to people who had never used a computer before. All of the examples are from DEC manuals that customers failed to understand. I found the entire book delightful, insightful, and mercifully brief. The book starts with a joke, which I’ve copied below:
P. C. Hodgell said, “That which can be destroyed by the truth should be.” What if we have no free will? Disregarding the debate of whether or not we have free will—if we do not have free will, is it beneficial for our belief in free will to be destroyed?
The consequences for an individual depend on the details. For example, if you still understand yourself as being part of the causal chain of events, because you make decisions that determine your actions—it’s just that your decisions are in turn determined by psychological factors like personality, experience, and intelligence—your sense of agency may remain entirely unaffected. The belief could even impact your decision-making positively, e.g. via a series of thoughts like “my decisions will be determined by my values”—“what do my values actually imply I should do in this situation”—followed by enhanced attention to reasoning about the decision.
On the other hand, one hears that loss of belief in free will can be accompanied by loss of agency or loss of morality, so, the consequences really depend on the psychological details. In general, I think an anti-free-will position that alienates you from the supposed causal machinery of your decision-making, rather than one that identifies you with it, has the potential to diminish a person.
“...because you make decisions that determine your actions”: I don’t know that this would fit with the idea of no free will. Surely you’re not really making any decisions.
“my decisions will be determined by my values”—“what do my values actually imply I should do in this situation”: But your values wouldn’t have been decided by you.
I agree with your last sentence. I’m leaning towards, “If we do not have free will, people should not be told about it.” (Assuming the “proof” of no free will eliminates any possibility of constructing selves that do have free will because in that case I would want us to build them and “move into” those bodies.)
This sounds like “epiphenomenalism”—the idea that the conscious mind has no causal power, it’s just somehow along for the ride of existence, while atoms or whatever do all the work. This is a philosophy that alienates you from your own power to choose.
But there is also “compatibilism”. This is originally the idea that free will is compatible with determinism, because free will is here defined to mean, not that personal decisions have no causes at all, but that all the causes are internal to the person who decides.
A criticism of compatibilism is that this definition isn’t what’s meant by free will. Maybe so. But for the present discussion, it gives us a concept of personal choice which isn’t disconnected from the rest of cause and effect.
We can consider simpler mechanical analogs. Consider any device that “makes choices”, whether it’s a climate control system in a building, or a computer running multiple processes. Does epiphenomenalism make sense here? Is the device irrelevant to the “choice” that happens? I’d say no: the device is the entity that performs the action. The action has a cause, but it is the state of the device itself, along with the relevant physical laws, which is the cause.
We can think similarly of human actions where conscious choice is involved.
Perhaps you didn’t choose your original values. But a person’s values can change, and if this was a matter of self-aware choice between two value systems, I’m willing to say that the person decided on their new values.
Something is making decisions, is it not? And that thing that makes the decisions is part of what you would normally describe as “you.” Everything still adds up to normality.
It can be detrimental, though, to communicate certain subsets of true things without additional context, or in a way that is likely to be misinterpreted by the audience. Communicating truth (or at least not lying) is more about the content that actually ends up in people’s heads than it is about the content of the communication itself.
I also sleep and my heart beats, but “I” don’t get to decide those things, whereas free will implies “I” get to make day-to-day decisions.
I don’t think I’m 100% following with the second-to-last sentence. Are you saying it’s detrimental to disregard the debate of whether we have free will?
The chain of causality that makes your heart beat mostly goes outside your consciousness. (Not perfectly, for example if you start thinking about something scary and as a consequence your heart starts beating faster, then your thought did have an impact. But you are not doing it on purpose.)
The chain of causality that determines your day-to-day decisions goes through your consciousness. I think that makes the perceived difference.
That doesn’t change the fact that your consciousness is ultimately implemented on atoms which follow the laws of physics.
Personally the idea of no free will doesn’t negatively impact my mental state, but I can imagine it would for others, so I’m not going to argue that point. You should perhaps consider the positive impacts of the no-free-will argument; I think it could lead to a lot more understanding and empathy in the world. It’s easy for most to see someone making mistakes such as crime, obesity, or just being extremely unpleasant and blame/hate them for “choosing” to be that way. If you believe everything is determined, I find it’s pretty easy to re-frame it into someone who was just unlucky enough to be born into the specific situation that led them to this state. If you are yourself successful, instead of being prideful of your superior will/soul, you can be humble and grateful for all the people and circumstances that allowed you to reach your position/mental state.
That is true, but I think it would lead to net complacency… Let’s hope that if we ever do find out definitively that we lack free will, and humanity accepts it, people take the view you describe here!
Mostly agree; however, I think it unnecessarily muddies the water to take the concept of free will, which exists on a gradient throughout nature rather than as an either/or (Binary) concept...
And then attempt to answer this non-binary question with a Binary answer of “either/or”.
It’s like poking around trying to find out how a square answer can fit into the round hole of the question.
A round question can only have a round answer. A question on a topic that exists on a gradient may only accurately be answered with an answer that also exists on a gradient. You cannot logically mix the two in any order and expect an accurate answer.
At least that’s my opinion, I could be wrong. ---Tapske...
following up to mako: what is it you propose to imagine we do not have, and what is different in worlds where we do not have it from worlds where we do?
Or, phrasing it differently; “read the sequences”
Some counterfactual questions are unanswerable, because they propose worlds that are self-contradictory or just very hard to reason about.
My account of free will is just uncertainty about one’s own future decision output, so imagining the average natural world where we don’t have that is very difficult. (There may be other accounts of free will, but they seem very confused.)
Making Beliefs Pay Rent (in Anticipated Experiences)
Free will: A topic I have pondered deeply over the years.
Firstly, like almost everything else in this 4-dimensional existence, “free will” is not a Binary concept. It is NOT either/or. It is on a gradient.
ALL mammals display traits of free will to varying degrees. The more natural-born instincts a species has, the less its free will; the fewer instincts an animal has, the more free will it can express.
No mammal has zero free will, and no mammal has 100% free will, not humans, not any mammal.
So the idea of free will being “destroyed” is a non-starter. It can perhaps be diminished, but never destroyed.
For those who believe we have 100% free will, ask yourself a couple Q’s.
Can you willingly hold your breath till you die? No, you would pass out, and begin breathing, against your will.
If you walk around a corner and I yell “BOO”… Did you jump because you decided to, or were your actions dictated by instincts that had nothing to do with free will?
Same if I poke you with a straight pin: did you decide to draw back, or was it automatic?
No one, and no thing has total free will.
At least that’s my opinion, I could be wrong.
---Tapske...
I’m afraid I don’t understand this. If we do not have free will, then which things we believe, and which errors we mistake for truth, are not a matter of choice.
True, I’ll rephrase. If we do not have free will, would it be beneficial for our belief in free will to be destroyed? If you were a divine operator with humanity’s best interests at heart, would you set up the causal chain of events to one day reveal to humans that they do not have free will?
You would need to make sure that there is no misunderstanding. Otherwise you would be communicating something else than you intended.
So, considering that the debate on this topic is typically full of confusion, the answer is probably: no.
If we assume that locus of control is a proxy for the perception or belief in free-will, then belief in free-will does appear to have certain beneficial effects. But it seems like a moot point anyway because what was gonna happen was gonna happen anyway, right?
8th-grade female physics students who were given “attribution retraining” showed “significantly improved performances in physics” and favourable effects on motivation.
Ziegler, A., & Heller, K. A. (2000). Effects of an attribution retraining with female students gifted in physics. Journal for the Education of the Gifted, 23(2), 217–243.
Among seventh graders in a, frankly euphemistically titled, “urban junior high school”, researchers found support for an association between locus of control and their academic achievement.
Diesterhaft, K., & Gerken, K. (1983). Self-Concept and Locus of Control as Related to Achievement of Junior High Students. Journal of Psychoeducational Assessment, 1(4), 367-375. https://doi.org/10.1177/073428298300100406 (Original work published 1983)
Among widows under the age of 54, socioeconomic status and locus of control were found to impact depression and life satisfaction “independently”, and the more internal the locus of control, the better the life satisfaction and the lower the chance of depression these widows had.
Landau, R. (1995). Locus of control and socioeconomic status: Does internal locus of control reflect real resources and opportunities or personal coping abilities? Social Science & Medicine, 41(11), 1499–1505. https://doi.org/10.1016/0277-9536(95)00020-8
Personally, my pet theory is that the “Law of Attraction” probably is effective. Not because of any pseudo-Swedenborg/Platonic metaphysics about the nature of thought, but from a motivational perspective people who are optimistic will have a “greater surface area for success”, because they simply don’t give up that easily.
Hi all, I’m Hari. Funnily enough, I found LessWrong after watching a YouTube video on R***’s b*******. (I already had some grasp of the dynamics of internet virality, so no I did not see it as saying anything substantive about the community at large.)
My background spans many subjects, but I tend to focus on computer science, psychology, and statistics. I’m really interested in figuring out the most efficient way to do various things—the most efficient way to learn, the fastest way of arriving at the correct belief, how to communicate the most information with the least amount of words, etc. So I read the Sequences and LessWrong just felt like a natural fit. And as you can imagine, I don’t have much tolerance for broken, inefficient systems, so I quit college and avoid large parts of the internet.
LessWrong is like a breath of fresh air away from all the dysfunction, which I’m really grateful for. (My only problem is that I can spend hours lost in comment sections and rabbit holes!). I think it’s a good time for me to start contributing some of my own thoughts. Here’s a few questions/requests I have:
Firstly, I’ve been trying to refine my information diet more, but it seems more difficult with some blogs that have valuable older posts. For example, I see Marginal Revolution often mentioned, but they don’t have a “best of” post that I can start with. There’s also the dreaded linkrot.
Secondly, I’m wondering to what extent expert blind spot has been covered on LW? It seems really important given the varied backgrounds and number of polymaths here.
Thirdly, I wanted to get some feedback on some of my thoughts on anthropics. After scanning through some prior work, it looks like a lot of it is unnecessarily long and more technical than it needs to be. But I think it does have real practical implications that are important to think through.
If you combine anthropics, many-worlds, timeless physics, and some decision theory, there is a consistent logic here. The simplest way I can think of to explain this is if one imagines a timeless dartboard that has the distribution of everyone’s conscious experience across time. The arbitrary dart throw is more likely to land on people with the most conscious experience across time. This addresses the anthropic trilemma—you still lose because your conscious experience across time in losing worlds vastly outweighs the trillion yous in winning worlds in that thin slice of time.
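To put rough, made-up numbers on that intuition: if the dart’s chance of landing in a branch scales as

$$W(\text{branch}) \;\propto\; p(\text{branch}) \times N_{\text{copies}} \times T_{\text{subjective}},$$

then with (say) a $10^{-9}$ lottery, a trillion copies run for an hour in the winning branch contribute $10^{-9} \times 10^{12} \times 1 \text{ hr} = 10^{3}$ hours of weight, while the single losing-branch self living another forty years contributes about $3.5 \times 10^{5}$ hours, so the losing worlds still dominate by a few hundred to one.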
This then implies doom soon, as Nick Bostrom points out with the Doomsday argument. But your probability of doom would have to be way too high here. So perhaps humanity decides to expand current consciousnesses rather than creating new ones. There are decision-theoretic reasons for humanity to support this—if you didn’t contribute anything to the intelligence explosion, then why should you exist?
One major implication here is that you don’t need to despair because aligned ASI is practically guaranteed in at least a few worlds. (But that doesn’t mean existential risk reduction is useless! It’s more like the work that’s being done is to expand the range of worlds that make it, rather than saving only one.)
What do you think?
I think the most efficient way to absorb the existing Less Wrong wisdom is to read the articles without comments, because the comments easily contain 10 or more times as much text as the articles themselves. It is not a perfect solution: sometimes the best-voted comments add something substantial. But I think it is better on average.
Less Wrong Sequences without comments: https://www.readthesequences.com/
Selected best articles from Less Wrong: https://www.lesswrong.com/bestoflesswrong (but these are links to articles with comments; not sure if there is a better way to read them)
Anthropic reasoning is difficult. Small changes in your model of the world can cause dramatic changes in what the distribution of conscious experience looks like. (Like: Maybe we will never expand into the universe or build a Dyson sphere, will soon consume the fossil resources, and civilization will collapse. Then our life on 21st-century Earth is a normal experience. -- But maybe we will colonize the galaxies for billions of years. Then our life on 21st-century Earth is astronomically exceptional. -- But maybe the things that will colonize the galaxies are mindless paperclip optimizers. Then our life on 21st-century Earth is normal again. -- But maybe… -- But maybe… -- Every new thing you consider can completely change the outcome.)
Thanks for the links—I definitely do focus in on the essential parts when I have limited resources. So I personally don’t need versions without comments, but I find the alternate link for the Sequences quite aesthetically appealing, which is nice.
As for the anthropic reasoning, there are definitely all kinds of different scenarios that can play out, but I would argue that they can be clumped into one of three categories for anthropics. One is doom soon, meaning that everyone dies soon (no more souls). The second is galactic expansion with huge numbers of new conscious entities (many souls). The third is galactic expansion with only the expansion of conscious entities that have existed (same souls). Assuming many-worlds, no more souls is too unlikely to happen in all the worlds, but it will surely happen in some. Same with many souls. But given that we live in the current time period, one can infer that most worlds are same soul worlds.
Hello, chaizen. I would like to add to what you wrote on the topic of timeless decision theory etc.
I would point out that if you believe in an interpretation of physics like the “mathematical universe hypothesis”, then you need to average over instances of yourself in different ‘areas’ of mathematics or logic, as well as over different branches of a single wave function (correct me if I am misunderstanding the Many Worlds Interpretation). This might well affect the weight you assign to the many simulated copies of yourself; in particular, if you interpret yourself as a logical structure processing information, then it could be argued that at a high level of abstraction the trillion copies are (almost) identical and therefore don’t count as having 1 trillion times as much conscious experience as 1 of you, only being distinct consciousnesses insofar as they experience different things or thought processes.
The above would be my tentative argument for why an extremely large number of moderately happy beings would not necessarily be morally better than a moderately large number of very happy ones as they probably have much higher overlap with one another in a mathematical/logical universe.
I just noticed that hovering a Lesswrong link on Lesswrong.com gives me what looks like an AI summary of a post that is totally unlabeled. What the heck!?
I am registering that I dislike this and didn’t catch the announcement that this feature was getting pushed out without a little widget at the front saying it was AI-voice.
Edit for posterity: If you’re looking at this comment and are confused, scroll down to find a comment containing 3 examples. If you just want an example, here’s a post that contained the weird pop-in at the time I wrote this: Let’s think about slowing down AI
Can you give specific examples?
One thing this might possibly be is that there is a secret field for “custom highlights” for a post that admins can manually create, which basically only I-in-particular have ever used (although I might have set it so that Best of LessWrong posts use the description from their spotlight item?)
It took me a minute to find the post I saw, since I (incorrectly) assumed it was everywhere. I was browsing Ethical Design Patterns by AnnaSalamon and noticed a few:
Historically people worried about extinction risk from artificial intelligence have not seriously considered deliberately slowing down AI progress as a solution. Katja Grace argues this strategy should be considered more seriously, and that common objections to it are incorrect or exaggerated.
The field of AI alignment is growing rapidly, attracting more resources and mindshare each year. As it grows, more people will be incentivized to misleadingly portray themselves or their projects as more alignment-friendly than they are. Adam proposes “safetywashing” as the term for this [note: this one cutting off is what made me suspect it was automatic]
You might feel like AI risk is an “emergency” that demands drastic changes to your life. But is this actually the best way to respond? Anna Salamon explores what kinds of changes actually make sense in different types of emergencies, and what that might mean for how to approach existential risk.
Ah, yeah, I think we shouldn’t show the spotlight item summary on hover. Seems confusing and speaking about the article and author in third person feels sudden.
I’m honestly not really happy with describing the author in the third person in spotlight either, I think we should just try to find a different way of accomplishing the goal there (which I think is to avoid “I” speak which also feels jarring in the summaries)
I said similar elsewhere, but I agree that “I” speak would be really bad (don’t put words into the author’s mouth, especially in this case, where it would mislead the reader about the writing style/ability of the post), but I also think switching out the post for a summary is pretty jarring to begin with.
Since every post for years has been peekable as the first couple paragraphs, showing a summary unlabeled is always a jarring bait-and-switch
I do not have especially strong feelings about it besides the initial confusion. I think maybe confusion could be fixed just by saying something like “Featured post summary:” or similar, which would help explain why, when I expected the first paragraph of the essay, I’m reading a summary
Okay yeah those are all posts that won Best of LessWrong. We generate like 8 AI descriptions, and then a LW teammate goes through, picks the best starting one, and then fine-tunes it to create the spotlights you see at the top of the page. (Sometimes this involves mostly rewriting it, sometimes we end up mostly sticking with the existing one).
Ah, so it really was an AI summary of a post that is totally unlabeled? An amusing twist.
Even with what appears to be a fair amount of fine-tuning, it still reads like unlabeled AI text, which is maybe why I found it so jarring. Possibly a label could help, then?
(though honestly, it’s pretty weird to not see the first paragraph, so even if the AI thing doesn’t ring as important to you, some kind of differentiating label would be REALLY HELPFUL when the expected POV is altered after the fact.)
(RE parenthetical: I heavily suspect you know this, but switching the POV to first person would be much much much worse and would lead me to skip posts I would assume read like AI text)
No such thing exists! So my guess is you must have gotten confused somewhere.
I am glad to hear this! I am going to reply in the other thread off the parent, though, to keep any commenting together. Thanks for the clarity.
Hello everyone,
To be honest, I’m not entirely sure what to write here. I see that many of you have very interesting lives or are pursuing studies in science, which I find amazing.
Well, I’m 24 years old and from Chile. I’m finishing a degree in Cybersecurity Engineering after an earlier (and not very successful) attempt at programming. I’ve always been a very curious person — I love space, chemistry, physics, philosophy, and really anything that sparks curiosity. I don’t know a lot about each of those fields, but I truly enjoy learning; it makes me feel alive.
I hope to learn a lot here. Honestly, ChatGPT recommended this page to me after one of our long conversations, since it’s hard to find people who share these interests. I really enjoy learning constantly, and chatting with it is fun — but I’d love to meet real people to share ideas with.
I hope we get along well!
Hello everyone. I’m Ciaran Marshall, an economist by trade. I’ve been following the rationalist community for a while; the breadth of topics discussed here with rigour is unparalleled. I run a Substack where you can see my work here (in particular, I recommend the blog on how AI may reduce the efficiency of labour markets, as AI seems to be the most popular topic here): https://open.substack.com/pub/microfounded?utm_source=share&utm_medium=android&r=56swa
For those of you on X, here is my account: https://x.com/microfounded?t=2S5RSGlluRQX3J4SokTtcw&s=09
I was first introduced to the rationality community from reading Richard Hanania’s work. As we both share libertarian perspectives on the world, I naturally aimed to satiate my confirmation bias via reading public figures that agreed with me. However, I’m incredibly high (probably top 5%) in openness to experience, so I then read whatever was recommended to me. Then I gained a grasp of the core ethos of ACX and LessWrong, then read up on Kahneman and Tversky to identify as many cognitive biases as possible.
To me, the Bayesian epistemological framework makes sense: any empirical study (as economists are all aware, and as most famously stated in SSC’s “beware the man of one study”) can be “debunked” in the sense that there will always be flaws. The point is not to debunk one side vs another, but rather to ask what the probability is that this claim is correct given the available evidence and what we already know (our prior knowledge, or base rates), which invokes a continuous as opposed to a discrete version of the ‘truth’. So we have a middle ground between frequentism and radical skepticism, which I suspect is healthiest: our intellectual discourse is mired in polarisation, disinformation, and mistrust—in such an environment it’s easy to either 100% swallow a flawed argument or reject everything. This approach has been proven to work with superforecasting, which suggests this is the optimal epistemic framework to deploy.
So to summarise, I guess my primary motivation for finally signing up to this community is that I’m eager to learn, to satisfy my somewhat selfish desire for intellectual curiosity.
Hey everyone, I’m Deric. I’m new to LW but I’ve read through the new user guide and I’m very impressed and excited that a place like this actually exists on the internet. I was actually sent here after a conversation I had with Gemini regarding AGI and specifically instrumental convergence.
To preface, I’m a Game Designer (Systems/Monetization) from Winnipeg, I went to school for Electrical Engineering but didn’t finish my degree as I was offered a job in my current field that I couldn’t refuse. I had Gemini and GPT-5 do some deep research on the idea I had to see if it had already been discussed but it seems like it hasn’t. I was initially going to make a post about this but after reading through the etiquette here, I’ve decided to just post it in this open thread to not offend anyone.
I came about this idea by thinking about the alignment problem in a way analogous to a democracy, and extrapolated what a superintelligent democracy might look like. For example, there might be one person who really wants to turn the universe into paperclips and they form the Paperclip Party, but this would quickly gain 0 political traction and thus not be implemented.
In my opinion, in a perfect world, the voting people in a democracy would be voting for the policies they want to be implemented in their society, and they would not be subject to external pressures like their family, media bias, or cult of personality. So effectively, they would be given sets of policy plans, unlabeled from any party, and they would fully comprehend these plans, then using their own judgement based on their values and beliefs, rank them from 1-n where n is the total number of plans. After all plans were ranked the plan with the lowest total score would be selected and put in place.
Following this thought process to AGI would look something like this:
1. Create as many different sub-AGIs as possible that distill various positive human values. For example: AGI-Economy would approve of plans that promote a strong economy. AGI-Ecology would approve of plans that would have low impact on nature. AGI-Liberty would approve of plans that keep humans free. The list could go on. In addition to Ranked Choice Voting (RCV), they would also give each plan a utility score from 0-100 for how well the plan aligned with their goals; the reason for this will be explained in the next section.
2. Instead of asking the main superintelligence to make paperclips, you instead ask it to generate a fixed number of plans to put in place to maximise paperclip production. These plans would then be sent to every sub-AGI, and they would do RCV on the plans. The main AGI would then be given its reward based on two factors:
a. Ensuring an even distribution of satisfaction among sub-AGIs that is above a certain threshold. I.e., if every sub-AGI gave a plan a score above 90, the main AGI would get a reward for that plan. If any single sub-AGI rated a plan below a certain threshold set by humans, for example AGI-Liberty scored a plan at 94 when its threshold for acceptance was 95, the plan would be rejected, but which sub-AGI rejected the plan would be hidden from the main AGI to prevent reward hacking.
b. Ensuring an entropic distribution of plans, such that it would receive a higher reward the more evenly the RCV scores for a whole set of plans are distributed. For example, it would receive a high reward if the set it generated had an even distribution of unique rank-n plans across all sub-AGIs. This is to ensure it isn’t just generating the same plan repeatedly to maximize reward.
Edit: The sub-AGIs would not be able to talk to each other or even be aware of each other’s existence, to prevent collusion. The main AGI would only be able to present plans to the sub-AGIs and receive scoring from them, not have any write access to their code, or have any awareness of what metrics those AGIs were using to determine reward score. (A toy sketch of how the two reward factors might fit together is below.)
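To make the reward structure concrete, here is a rough toy sketch in Python. The evaluator names, thresholds, and random scoring are placeholder assumptions for illustration only, not a claim about how real sub-AGIs would be built or queried:

```python
import math
import random
from collections import Counter

# Toy sketch only: evaluator names, thresholds, and random scores are placeholders.
THRESHOLDS = {"economy": 90, "ecology": 85, "liberty": 95}  # per-evaluator acceptance bars set by humans

def score_plans(plans):
    """Stand-in for each sub-AGI scoring every candidate plan from 0-100.
    Real evaluators would be opaque to the planner (no write access,
    identities hidden when a plan is rejected)."""
    return {name: {p: random.uniform(0, 100) for p in plans} for name in THRESHOLDS}

def threshold_reward(scores, plan):
    """Factor (a): reward a plan only if every evaluator clears its own bar;
    rewarding the weakest link encourages evenly spread satisfaction."""
    if any(scores[name][plan] < bar for name, bar in THRESHOLDS.items()):
        return 0.0  # rejected; the planner is not told which evaluator vetoed it
    return min(scores[name][plan] for name in THRESHOLDS) / 100.0

def diversity_reward(scores, plans):
    """Factor (b): a crude entropy bonus over which plan each evaluator ranks
    first, so the planner can't just resubmit one plan over and over."""
    top_picks = Counter(max(plans, key=lambda p: scores[name][p]) for name in THRESHOLDS)
    total = sum(top_picks.values())
    return -sum((c / total) * math.log(c / total) for c in top_picks.values())

plans = [f"plan_{i}" for i in range(5)]
scores = score_plans(plans)
reward = sum(threshold_reward(scores, p) for p in plans) + diversity_reward(scores, plans)
print(round(reward, 3))
```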
I’d be really interested to know what people think about this concept. I wholeheartedly apologize if this concept has already been discussed, but like I said I did some searching around and while parts of this are definitely not new, I think the whole approach might be novel. Thanks to everyone who reads this.
- Demonde
Hi there, this is Replitze. I’m 19 years old, just graduated from high school, and starting a career in diplomacy, so I’ll be learning firsthand how much of human behavior is theatre. I’m here because I’m interested in biases, reasoning, and how people persuade themselves of beliefs they already hold. Despite my lack of expertise, I’ve observed enough people to pick up on patterns that others might overlook. I enjoy posing queries that expose presumptions because, although it can occasionally be awkward, that’s frequently where the insight is hidden. I want to thoughtfully contribute, absorb your viewpoints, and occasionally question ideas, not just for fun, but because clarity is worth a little conflict.
Hello! I’ve lurked here for ~2 years, found this via HPMOR. I think it’s funny that I’ve been to an IRL meetup before making an account.
Hi everyone! My name’s Matt. I hold a physics b.s. but have been working in international business development for the past 17 years. Physics, philosophy, and technology have remained passionate hobbies for me.
In college, I used to keep my philosophy major roommate up all night, confounding him with my imprecise applications and interpretations of what he was studying. I hope to continue that (...and improve...) here at LessWrong.
I found this community through AI. I was having a philosophical conversation with Google Gemini (my stand-in for a philosophy major roommate these days), and it suggested I share my thoughts here. So I will!
Looking forward to creative and stimulating discussion!
i’m a bad user but i’m looking for two buttons on my profile and can’t find them.
I would like to edit my user profile
I would like to form a Sequence out of two of my blog posts.
To edit your user profile, go to your profile at https://www.lesswrong.com/users/quinn-dougherty and then click “Account Settings”.
To create a sequence, go to your profile at https://www.lesswrong.com/users/quinn-dougherty and then click “Create New Sequence”.
You must be logged in for this to work.
I checked for both of those! tho the second time I found the edit profile part of “account settings”. That one is totally on me!
New sequence: again, not seeing it at my profile. I ctrl+f for “sequence” and get nothing.
Here’s what I see on my account page. Perhaps this is because I created sequences long ago and Lightcone has changed the UI for people without existing sequences.
I don’t have that option myself, as someone without existing sequences. However, a google turned up https://www.lesswrong.com/sequencesNew , which seems to do the trick.
This worked for me!
Great, thanks. Yeah, I don’t have a preexisting sequence yet, so this doesn’t work for me.
Go to the library page, scroll down until you hit “community sequences” and then press the button there.
Hi there, I finally created my account on LW, like five months after I discovered this whole thing about rationalism and the Sequences and EA and such through HPMOR (which I curiously found in the essay Why I Don’t Discuss Politics With My Friends—that is not somewhere you’d expect the name Harry Potter to occur!).
And the first thing I found after logging in is that your website has a dark theme option. Darn, so what was the point of turning invert color mode on and off to get a dark background so that my eyes don’t hurt after spending another whole day “wasted” in LW reading all the tabs of ten posts mentioned in a post and ten other posts and outside links mentioned in the comments I opened yesterday before I registered?
I have not seen much written about the incentives around strategic throttling of public AI capabilities. Links would be appreciated! I’ve seen speculation and assumptions woven into other conversations, but haven’t found a focused discussion on this specifically.
If knowledge work can be substantially automated, will this capability be shown to the public? My current expectation is no.
I think it’s >99% likely that various national security folks are in touch with the heads of AI companies, 90% likely they can exert significant control over model releases via implicit or explicit incentives, and 80% likely that they would prevent or substantially delay companies from announcing the automation of big chunks of knowledge work. I expect a tacit understanding that if models which destabilize society beyond some threshold are released, the toys will be taken away. Perhaps government doesn’t need to be involved, and the incentives support self-censorship to avoid regulation.
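(If those three estimates are read as a conditional chain, they multiply out to roughly 0.99 × 0.9 × 0.8 ≈ 0.71, i.e. about a 70% overall credence that such announcements get blocked or substantially delayed.)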
This predicts public model performance which lingers at “almost incredibly valuable” whether there is a technical barrier there or not, while internal capabilities advance however fast they can. Even if this is not happening now, this mechanism seems relevant to the future.
A Google employee might object by saying “I had lunch with Steve yesterday, he is the world’s leading AI researcher, and he’s working on public-facing models. He’s a terrible liar (we play poker on Tuesdays), and he showed me his laptop”. This would be good evidence that the frontier is visible, at least to those who play poker with Steve.
There might be some hints of an artificial barrier in eval performances or scaling metrics, but it seems things are getting more opaque.
Also, I am new, and I’ve really been enjoying reading the discussions here!
I think the little scrollbar on the right side of the screen on mobile isn’t very useful, because its position depends on the length of the entire page including all comments, and what I want is an estimate of how much of the article is left to read. I wonder if anyone else agrees.
I agree, but that’s controlled by your browser, and not something that (AFAIK) LessWrong can alter. On desktop we have the TOC scroll bar, that shows how far through the article you are. Possibly on mobile we should have a horizontal scroll bar for the article body.
I think the new “Your feed” is not paying attention to the tag based weighting from the first section and it probably should. I’ve got the AI tag very heavily downweighted, and Your Feed just offered me four AI posts out of the top five.
(I’m not opposed to reading any AI posts, I just want them to be less than half of my LessWrong experience. My ideal version of LessWrong gives me about one AI post for every ten posts.)
Yeah, agree we should integrate those weights somehow.
is there a search feature on lesswrong for like
so
from:@annasalamon burning man
which greps through any anna salamon post that mentions burning man? it was about the contrast between “fixing electronics in all the dust for your friend’s art installation” and “expanding your mind, maybe with drugs” and how the physics-bound task keeps you from spiraling into insanity.
You’re looking for this section, btw
user:annasalamon burning man
Sadly, it doesn’t look like we support quotes with user: searches. I’ll file a bug about it.
Hello! My name is Owen, I’m a 21-year-old physics and computer science student and newish here. I have been in parallel communities to the rationality scene since I was 16 and have a lot of experience speculating wildly about science fiction topics haha. Seriously though I think this community is interesting and has a wonderful goal of attempting to be less wrong :) I’m excited to participate more as I attempt to lose my preconceived judgements about the people in this community lol.
https://substack.com/@ravenofempire is a free substack where I argue with more artistic flair than is allowed here. Anything further I post about is likely to be a LessWrong-translated version of things I have posted there.
Hello! My name is Sean Fillingham. For the past 9 months I have been exploring a career transition into technical AI safety research, currently with a strong focus on technical AI policy and governance.
Previously I was an astrophysics researcher and I hold a PhD in physics. After leaving academia, while I was exploring a career transition into data science or ML engineering, I somewhat stumbled across AI safety and the EA community. The intention and ideals of this community have strongly resonated with me and I have since been focusing on finding my “personal Pareto frontier”, where my background and interests can have a meaningful impact.
I have an idea for my first post, but I am a bit unsure if it’s a good fit for LessWrong. Is there anywhere on the site where we can discuss/brainstorm ideas? I’m looking for some initial feedback or refinement of my ideas prior to posting.
I look forward to meeting, interacting, and learning from everyone as I further explore this space!
the quick takes section or open threads are both fine for requesting comment on drafts.
Greetings! In May 2024 I was recruited through my university’s literature department to work at a big name AI company, and it was a lot of fun to work on different models since then :) One of my incredible & inspiring leads (shout out to Alexandra!) ran a now-defunct blog (Superfast AI), which I discovered on the WWW last night. In my journey reading through the posts I managed to link up to LessWrong, and I am so excited to be here! It is going to be fun to read through these different ideas.
Some personal facts: I clicked “honesty” as a lesswrong sequence to look into, and I was shocked no one had written anything in years. I am glad this welcome page indicates life here and I hope to investigate / “aura farm” the pages. I’m reading Witold Gombrowicz’s Diary right now, about at the halfway point. I am not Polish, but I am obsessed with this amazing narrator and his funny philosophies. He is wild, and, having lived in Poland for several years (random U Chicago Humanities Program at Jagiellonian over a decade ago), I really appreciate the viewpoints in his diaries, and I can’t stop laughing at his pettiness.
I’m not really a genius or anything and I feel like this blog will be super inspiring. I do consider myself a really middle-of-the-road type person. I grew up in a small town of 400 people, and as a result I like to think of myself (even if it is probably total cognitive dissonance) as the average consumer of the internet. I remember my first Myspace account as a preteen and writing the HTML for my Tumblr in middle school and high school (where it perma rests today). I remember having to make my first fake Facebook birthday so I could get an account before I was 13 yo and then subsequently adding 300 strangers in a rapid click fire in the night.
I love the internet, and I just want to make it a nicer place where people can come and find a world that they otherwise did not have access to. I’d really like to make a better life for the world wide web.
My name is Rose P. My email is rose@queen.house. I would like to explore the pages in this place & become a frequent visitor godwilling.
Hi everyone, I’m Gerson!
I come from an ML/AI/HPC background with experience in academic research, startups, and industry. I’ve recently gotten into mech interp and have found LessWrong to be a valuable platform for related literature and discussions; figured I should make an account. Looking forward to being less wrong :)
Who/what does the art on the different pages in lesswrong?
It’s done by the Lightcone team using various flavors of AI art tool.
AI interpretability can assign meaning to states of an AI, but what about process? Are there principled ways of concluding that an AI is thinking, deciding, trying, and so on?
If it can assign meaning to states, then sure, why not? Currently this comes with plenty of caveats, so it kind of depends on how much of a stickler you want to be about principledness and effectiveness.
Sometimes “deciding” etc. is represented in the activations, which is kind of trivial. So you can also be asking about interpreting the parameters of the AI that transform one state to another. Keywords might be circuit interpretability, automated circuit discovery, or parameter decomposition.
The title of this thread breaks the open thread naming pattern; should it be Fall 2025, or should we be in an October 2025 thread by now? Moving to monthly might be nice for the more frequent reminder.
It looks like last year it was Fall, and the year before it was Autumn.
Ah, I think perhaps I was misreading the title as August instead of Autumn. If that is the case, I prefer ‘Autumn’ :)
Hi, I’m Amina. I am new to LessWrong and my career has mostly been in Machine Learning Science and Engineering. I am fascinated by the field of AI interpretability and believe it is very useful for AI safety, which I care about a lot. I recently participated in Neel Nanda’s training phase, learnt a lot about current research directions and efficient techniques, and recently wrote my very first post about understanding alignment faking. Looking forward to engaging in discussions, events and sharing my learnings.
Is there a download API? I’d love to download posts as Markdown if that’s already built-in. (Eventually, I’m working on integrating this into a tool which helps me make PDFs for my e-reader or for printing out with custom formatting).
LW uses graphql. You can follow the guide below for querying if you’re unfamiliar with it.
https://www.lesswrong.com/posts/LJiGhpq8w4Badr5KJ/graphql-tutorial-for-lesswrong-and-effective-altruism-forum (For step 3 it seems like you now want to hover over output_type instead of input)
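As a rough illustration, a query like the one below should pull a single post’s title and markdown body. The query shape and field names (post, selector, _id, contents, markdown) are my best guess from that tutorial, so double-check them against the schema explorer before relying on them:

```python
# Rough sketch: fetch one post's markdown via LessWrong's GraphQL endpoint.
# Field names here are assumptions to verify against the live schema.
import requests

QUERY = """
query GetPost($id: String) {
  post(input: {selector: {_id: $id}}) {
    result {
      title
      contents { markdown }
    }
  }
}
"""

resp = requests.post(
    "https://www.lesswrong.com/graphql",
    json={"query": QUERY, "variables": {"id": "LJiGhpq8w4Badr5KJ"}},  # id taken from a post's URL
    headers={"User-Agent": "markdown-exporter (contact: your-email@example.com)"},
)
resp.raise_for_status()
post = resp.json()["data"]["post"]["result"]
print(post["title"])
print(post["contents"]["markdown"][:500])
```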
I am curious if the people you encounter in your dreams count as p-zombies or if they contribute anything to the discussion. This might need to be a whole post or it might be total nonsense. When in the dream, they feel like real people and from my limited reading, lucid dreaming does not universally break this. Are they conscious? If they are not conscious can you prove that? Accepting that dream characters are conscious seems absurd. Coming up with an experiment to show they are not seems impossible. Therefore p-zombies?
idk about you, but the characters in my dream act nowhere near how real people act, I’m just too stupid in my dreams to realize how inconsistent and strange their actions are.
They certainly act weird, but not universally so, and no weirder than you act in your own dreams, perhaps not even weirder than someone drunk. We might characterize those latter states as being unconscious or semi-conscious in some way, but that feels wrong. Yes, I know that dreams happen when you’re asleep and hence unconscious, but I think that is a bastardization of the term in this case. Also, my intuition is that if someone in real life acted as weirdly as the weirdest dream character did, that would qualify them as mentally ill but not as a p-zombie.
Greetings all. My first visit, not sure where to put this Gen. Info. So will start here, and take guidance from participants, if there is a better thread.
I stumbled on this site after a friend suggested I research “Roko’s”. An interesting thought experiment; I enjoyed it, but nothing worth losing sleep over. Would be happy to discuss.
I am about 1 year into a manuscript (200 pages so far), dealing with all aspects of cognitive problem solving, via psychological self-awareness, and how to debate and discuss issues with an understanding of our (humans’) “default” mental and emotional “settings”, which prevent enlightenment.
The 2 most common being:
We are all predisposed to think in Binary terms; either/or, black & white, good or bad, etc. This is counterproductive to accurate conclusions/assessments. A more accurate truth is: other than very few “Base Principles”, almost nothing in this 4-dimensional existence is truly binary. Almost everything is on a gradient. The problem with the auto-binary approach is it suggests “absolutes” where none exist. It takes intentional, mental effort to avoid this conceptual trap.
We are all predisposed to think in linear terms (beginning, middle, end), when in truth the overwhelming majority of things in this 4-dimensional existence are cyclical, not linear.
*** What this means for the avg. Joe living his life: the majority of problems, situations, and questions you will ever have are most likely non-binary. If you attempt to solve a non-binary question with a Binary state of mind, or a Binary answer, you will NOT be “LESS Wrong”. Square peg, round hole.
Same with attempts to solve a cyclical Q with a linear mindset, or a linear answer: it simply cannot be done accurately.
There are plenty of accurate statements of “absolute”; those are easy (with sentence modifiers). Then there are some statements that seem absolute, which aren’t, until you add modifiers. IE: The speed of light is a constant.
While this is true, it is NOT the accurate truth, therefore NOT a constant. It needs a modifier to reach that level. IE: The speed of light, in the vacuum of space, is a constant. NOW you have “a truth”, “a constant”, a “solid base” from which further analysis either will or will not be supported.
*** For those who are of the opinion that there are NO absolutes, please understand, in order for you to affirm that, you would have to use a statement of absolute, thereby nullifying the very point you are trying to make.
The trick… the really difficult (and, for me, fun) thing is to ID statements of absolute with zero modifiers… that’s the challenge 😀.
That’s about 0.1% of the subject matter I am writing about.
I am also quite comfortable discussing political or U.S. constitutional issues. I am not emotionally invested in them, therefore a logical discussion is in my wheelhouse. (Freedom of speech, 2nd amendment, abortion rights, whatever.)
Fair winds to all, ---Tapske...
Contrary to what Wikipedia suggests, the people who enjoy discussing this topic on Less Wrong are mostly the newcomers who arrived here after reading Wikipedia. But we have a wiki page on the topic.
Another danger is that people who want to go behind the binary, often fall into one of the following traps:
Unary—“everything is unknowable”, “everything is relative”, etc.
Ternary—there are three values: “yes”, “no”, and “maybe”, but all the “maybe” values are the same
That is not a frequent topic here, for reasons. Maybe ACX is a better place for that.
I have an idea for a yaoi isekai. It’s a Tolkien/One Piece/New Testament crossover where you wake up as Peter Thiel, and the rival player characters are Greta Thunberg and Eliezer Yudkowsky. We can make this easily with Sora 2, right?