Socially Immunocompromised 19M Nerd
Salmonus_Kim
I’d like to add that, in addition to the bottom-up approach you mentioned, a top-down approach also exists. To extend the biology analogy you used, it’s like how we can study human diseases by experimenting on mice, but we can also directly dissect human cadavers. The latter sometimes provides much clearer answers. That said, I really like this effort to define the terms. Sorry if this comment feels off-topic for a post about model organisms!
Isn’t the prefill approach you mentioned ultimately just a variant of SFT? It would be great if it works well, but without directly forcing the behavior itself, I’m concerned we might not be able to sufficiently simulate the severity. I liked that you proposed alternatives beyond the standard benchmarks by adding μ-decisiveness and perplexity on webtext. However, it’s not clear whether the problems observed in current model organisms would also appear when using the prefill method. Could you elaborate on that point a bit more? Other than that, I found the explanation really helpful!
I’ve been considering an alternative approach: shifting towards exclusively local AI usage. Much like the transition from plastic to paper straws, this could be a proactive measure. The probability of a superintelligence developing on a personal device is quite low, and if such a model remains offline, the potential for catastrophic threats is largely mitigated. I’d be interested to hear your thoughts on this.
Ultimately, it seems to be a matter of ad hominem. While we are taught that it is a logical fallacy, it is really just a form of self-referential attack against the other person—not a fallacy in and of itself. However, from a meta-perspective, if you want to keep the conversation going, it is best to provide a way for the other person to escape that self-referential trap. For instance, as you mentioned, you might say something like, “Your perspective on that matter seems a bit naive.”
Recently, on June 3rd, South Korea held its local elections. Despite having just reached adulthood and gaining the right to vote, I did not participate. Thinking from a ‘LessWrong-style Bayesian perspective,’ I doubted the actual contribution and significance of my single vote. I didn’t even listen to my father’s encouragement to vote, which was sparked by his fascination with a candidate he saw on YouTube Shorts. That being said, I will likely participate in the next election season. Thank you for the insightful read!
That is quite fascinating. However, as you noted, there is no single ‘IQ-like’ metric for AI intelligence, and an AI can be exceptional in certain areas while struggling significantly in others. With that in mind, is there a clear criterion for dividing AI into ‘Teacher’ and ‘Student’ roles?
The parameter count is the first thing that comes to mind, but as you know, a higher number of parameters in an LLM doesn’t by itself guarantee a smarter model.
Furthermore, in Idea 4, you mentioned capping the Student AI’s intelligence at a level where alignment audits are still viable (i.e., auditable). It seems to me that we would still need some kind of metric to measure the LLM’s intelligence for this purpose. We can’t simply rely on benchmark scores, as we’ve seen far too many instances where they bear little relation to an AI’s actual ability to solve real-world problems.
In my view, the biggest limitation of the Logical Inductor lies in its failure to satisfy Desiderata 15 and 17. What is the ultimate value of a system if it lacks metacognition—the ability to determine on its own which problems are genuinely important—and fails to emulate the way human collective intelligence solves problems (akin to how Einstein’s theories revolutionized Newtonian mechanics)?
I believe an algorithm capable of distinguishing between a “larger problem” and a “smaller problem” is necessary. For instance, to prove that “primes are infinite,” one must first establish that “natural numbers are infinite,” with the latter being an explicitly “smaller” problem. I felt the paper did not sufficiently address this methodology.
To use a military analogy, the paper reads like a tactical manual that only instructs on “how to charge.” While it may guarantee victory in theory, strictly adhering to it in practice would result in total annihilation before advancing even a single meter. The true significance of this paper lies in its enabling of “friend-or-foe identification.” In other words, it prevents the system from falling into self-contradiction—a logical trap caused by miscalculating probabilities when unable to discern whether an unfamiliar mathematical formula is true or false.
Put simply, it reconciles beliefs so that logical propositions do not contradict one another. Given infinite time, its judgments are practically indistinguishable from having complete, pre-existing information (much like the relationship between 1 and 0.999...). Returning to the military metaphor, this could be expressed as: “Our troops won’t engage in friendly fire, and if time and resources are infinite, there is no terrain we cannot overrun!”
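To spell out the comparison with 1 and the repeating decimal 0.999..., here is a short worked limit. It is a standard fact added purely as an illustration; the symbol s_n (the decimal truncated after n nines) is my own notation, not something from the paper.

```latex
s_n = \sum_{k=1}^{n} \frac{9}{10^{k}} = 1 - 10^{-n},
\qquad
1 - s_n = 10^{-n} > 0 \ \text{ for every finite } n,
\qquad
\lim_{n \to \infty} s_n = 1 .
```

Every finite truncation falls short of 1, but the gap shrinks to zero. As I understand it, the analogy is that the inductor's beliefs at any finite time may still differ from the ideal, and the difference only vanishes in the limit.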
The mechanism for achieving this is akin to hiring more part-time auditors than the actual number of soldiers, rewarding them whenever they catch a soldier breaking formation. This reward is deducted from the soldiers’ pay, and the ultimate goal of the Logical Inductor is to reduce this “payout” to zero dollars. The reason the Logical Inductor is rarely discussed in modern computing environments is that the industry views the Scaling Law as far more efficient—preferring to allocate computational resources toward expanding the number of “troops” rather than investing in “auditors.” Granted, this means the formation may become disorganized (i.e., hallucinations will occur). This aligns perfectly with Richard Sutton’s insight in The Bitter Lesson: “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.”
Following logical induction, subsequent research should have focused on guiding these “traders” to incorporate accumulated mathematical proofs from the outset, rather than forcing them to learn completely from scratch through trial and error. I am aware that such research does exist. While many blindly worship the Scaling Law, if we look at the next 30 years of mathematical progress and ask whether we should “support a mathematician like Terence Tao or build another data center” (assuming equivalent costs), the pragmatic choice would undoubtedly be the former. This is a bitter truth that Big Tech must eventually confront. In that regard, I completely agree that if we ever want to reach a point where “building a data center is cheaper than supporting the greatest mathematician of our time for 30 years to advance pure mathematics,” we must pursue research in the direction of logical induction rather than relying solely on the Scaling Law.
Looking back at what I wrote, I think this could be triggering to some people. Please know that nothing serious actually happened and these are entirely my own thoughts. I hope you won’t take it too seriously.
Let me add an interesting perspective to this. The reason I dare to call my own viewpoint ‘interesting’ is because it’s one you don’t come across very often. I was assigned male at birth and am currently in my first year of feminizing hormone therapy. This means I’ve been taking prescribed anti-androgens and estrogen for about a year now. That process has brought about a massive shift in how I experience physical attraction to others.
To cut to the chase, I completely agree with your post! While your post included a brief survey to reduce sampling bias, my own experience observing men also tells me that situations where ‘things progress beyond flirting just through eye contact’ are incredibly rare. Eye contact is important, of course. But I view it more like a handshake in a business setting. It’s not the ‘deal’ itself. I also agree that just because you aren’t inherently ‘hot’ in physical traits that are hard to change, you shouldn’t just let yourself get out of shape and excuse it by saying, ‘I’ll invest in other types of charm instead.’
I’d like to share a recent experience regarding this. A while ago, I had a psychiatric appointment. The doctor was a middle-aged man with a ‘Nordic god’ build—chubby but with solid, muscular forearms—and he was wearing a mask that made it hard to see his face. However, I quickly noticed that he wasn’t the typical psychiatrist who just hands out pills and rushes you out the door. When I asked him about it, he explained that he used to be a professor at a university hospital, got completely fed up with that approach, and now genuinely wants to counsel his patients.
Right off the bat, I found this far more attractive than awkward eye contact (I was actually looking away the whole time). He examined my chart with clinical detachment, but his prescription was based entirely on our lengthy conversation. His office was packed with heavy medical textbooks, yet playfully decorated with Legos. His build was rugged, but I could sense a delicate quality in his eyes. Faced with these contradictory charms, I found him ‘sexy.’ Honestly, if he hadn’t been wearing a doctor’s coat, I would have slept with him.
As this example shows, looks aren’t everything. He was chubby, and his face was covered, after all. However, his physical appearance interacted with his other contradictory traits (his gaze, his build, etc.) to play a decisive role. Walking out of that clinic, it hit me that this kind of delicate dynamic is what truly determines whether a woman will sleep with a man or not.
I really enjoyed reading your post.
This is really great. I think it’s a wonderful idea to explore various worldbuilding concepts, rather than limiting people’s imaginations to just a single possibility. Come to think of it, it actually makes me wonder why no one has tried this before. Once again, I really enjoyed this. Please keep developing it!
In the recent Gemini update, the ability to clear chat history all at once has been removed. Now, one must navigate to the settings, select ‘Delete Activity,’ and go through several additional steps, which I find rather frustrating. Furthermore, I have strong doubts as to whether deleting this data actually removes it entirely from Google’s databases. I suppose I could simply stop using Gemini, but since this is more of a minor inconvenience than a critical issue, abandoning the platform altogether seems excessive. What are your thoughts on this matter?
Thank you for saying that. In fact, I believe this is exactly why many colleges still utilize a ‘holistic review’ process in their admissions, evaluating essays and extracurricular activities rather than just relying on numbers.
Numbers like GPAs and AP scores certainly demonstrate a student’s academic capabilities. However, through essays, colleges want to see how foundational knowledge—such as the fascinating yet scientifically obvious truth that a tree is mostly made of air—supports and shapes a person’s life values. College is a marathon, not a sprint. And I believe it is precisely this kind of wisdom that helps us complete that marathon.
I am currently 19 years old and will be starting college as a freshman this September, which is probably why your writing resonated with me so deeply.
I also read the Aeneid while studying Latin. It isn’t hard to recognize the symmetrical structure of the line ‘spem vultu simulat, premit altum corde dolorem’ in Book 1, line 209. You might also know, whether from being taught or simply by intuition, that this reflects Aeneas’s internal state at the time. To be fair, in the context of studying Latin, that qualifies as ‘basic knowledge.’ Admittedly, that specific knowledge wasn’t all that useful for getting a high score on the AP Latin exam. Still, I won’t deny that it played a huge role in helping me read the Aeneid to the very end, and finishing the book undoubtedly gave me the motivation to master the rest of the material needed to get a good grade. I’m not entirely sure, though, if my experience relates to the aspiring AI alignment researchers you mentioned.
This is excellent. It seems like it will serve as a great guideline for anyone interested in the field to help them memorize genetic maps.
However, have you considered the narrative aspect? For example, the reason it is relatively easy to memorize the family trees in Game of Thrones—which you used as an example—is that the individual storylines are organically intertwined. The rest of the narrative cross-verifies that Event B occurred between Event A and Event C.
From a layperson’s perspective, if your goal is to make domain knowledge more accessible to non-experts, it might be helpful to lean a bit more heavily into personification. For instance, if there is a connection between ‘Oncogene-induced apoptosis’ and the ‘Enucleation of an erythrocyte,’ you could link them to relatable, everyday events so that remembering one naturally helps you remember the other.
If this kind of approach doesn’t align with your intended purpose, please feel free to disregard my suggestion. Overall, this is a truly great initiative!
Here is something we need to keep in mind: not all powerful entities will always act in a way that actually resolves problems. Just because they are massive organizations (like the IT companies or the WHO used as examples) doesn’t mean they all think, ‘We must serve the public good of humanity.’
Outwardly, of course, that is what they profess. However, an organization is a collective of various stakeholders, and those who actually hold significant stakes might only be interested in downplaying the issue. In such cases, making excuses just for the sake of making excuses is, from their perspective, essentially ‘an explanation that solves the problem.’
I’m not sure if what I just said commits the very error you pointed out in your text. If it does, that would be quite interesting.
I agree. I haven’t used Claude myself, but with the latest versions of Gemini, I’ve definitely felt like it tries to keep users trapped inside their own ‘bubble.’ What I mean is, it seems to be getting better at packaging potentially biased conversations with the user as if they were absolute, universal truths.
I think this is largely because LLMs are becoming increasingly commercialized. After all, one of the greatest pleasures a person can experience is having their own beliefs validated. The dilemma for Big Tech companies is that they need to attract casual users somehow, so it looks like they’re tuning their models to appeal to the masses who just want to use LLMs as echo chambers for their confirmation bias.
The problem with this, as you pointed out, is that it pushes the models in a less trustworthy direction. Even if the older models were more rigid, I completely agree that when it comes to situations requiring genuine ethical reliability, the newer models aren’t the ones you can trust.
If you disagree, I’d appreciate it if you could point out where the reasoning feels off.
I’d say I’m closer to Camp B. I get, at least conceptually, how we might arrive at ASI from Eliezer’s earlier writings—but I don’t really know how it would actually be developed in practice. Especially when it comes to the idea that scalability could somehow lead to emergence or self-reference, I just don’t see any solid technical or scientific basis for that kind of qualitative leap yet.
As Douglas Hofstadter suggested in Gödel, Escher, Bach, the essence of human cognition lies in its self-referential nature, the ability to move between levels of thought, and the capacity to choose what counts as figure and what counts as ground in one’s own perception. I’m not even sure AGI—let alone ASI—could ever truly have that.
To put it in Merleau-Ponty’s terms, intelligence requires embodiment; the world is always a world of something; and consciousness is inherently perspectival. I know it sounds a bit old-fashioned to bring phenomenology into such a cutting-edge discussion, but I still think there are hints worth exploring in both Gödelian semiotics and Merleau-Ponty’s phenomenology.
Ultimately, my point is this: a body isn’t designed top-down but grown bottom-up, whereas intelligence (at least artificial intelligence as we’re building it) seems more like something engineered than cultivated. That’s why I feel current discussions around ASI still lack concreteness. Maybe that’s something LessWrong could focus on more: going beyond math and economics, and bringing in perspectives from structuralism and biology as well.
And of course, if you’d like to talk more about this, I’d really welcome that.
If we look at this issue not from a positivist or Bayesian point of view, but from an existentialist one, I think it makes sense to say that a writer should always write as if everyone were going to read their words. That’s actually something Sartre talks about in What Is Literature?
I realize this might sound a bit out of tune with the LessWrong mindset, but if we stick only to Bayesian empiricism or Popper’s falsifiability as our way of modeling the world, we eventually hit a fundamental problem with data itself: data always describe the past. We can’t turn all of reality into data and predict the future like Laplace’s demon.
Maybe that’s part of why a space like LessWrong—something halfway between poetry (emotion) and prose (reason)—came into being in the first place.
And yes, I agree it might have been better if Yudkowsky had engaged more concretely with political realities, or if MIRI had pursued both the “Camp A” and “Camp B” approaches more forcefully. But from an existentialist point of view, I think it’s understandable that Eliezer wrote from the stance of “I believe I’m being logically consistent, so the world will eventually understand me.”
That said, I’d genuinely welcome any disagreement or critique—you might see something I’m missing.
Wow, this is a really fascinating post. Even just working out the relationship between the error budget and the waterline feels like a significant contribution, especially since it’s in the Gaussian case as you mentioned. That said, I’m curious whether there’s an exact formula for how the waterline is set depending on the amount of error. It would be great to see that as well. Really enjoyed reading it!