I operate by Crocker’s rules.
niplav
I’ve still found them useful. If METR’s trend actually holds, they will indeed become increasingly useful. If it holds out to >1-month tasks, they may actually become transformative within the decade. Perhaps they will automate within-paradigm AI R&D, leading to a software-only Singularity that births an AI model capable of eradicating humanity.
But that thing will still not be an AGI. This would be the face of our extinction:
We should pause to note that a Clippy² still doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. [...] When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)
This seems unlikely to me on balance. I think compute scaling will run out well before that. I think it’s possible to scale LLMs far enough to achieve this, but that it’s “possible” in a very useless way. A Jupiter Brain-sized LLM can likely do it (and probably just an Earth Brain-sized one), but we are not building a Jupiter Brain-sized LLM.
Uh… what? Why do you define “AGI” through its internals, and not through its capabilities? That seems like a very strange standard, and an unhelpful one. If I didn’t have more context I’d suspect you of weird goalpost-moving. I personally care whether
1. AI systems are created that lead to human extinction, broadly construed, and
2. those AI systems then, after leading to human extinction, fail to self-sustain and “go extinct” themselves.
Maybe you were gesturing at AIs that result in both (1) and (2)??
And the whole reason we talk about AGI and ASI so much here on Less Wrong dot com is that those AI systems could lead to drastic changes in the future of the universe. Otherwise we wouldn’t really be interested in them, and would go back to arguing about anthropics or whatever.
Whether some system is “real” AGI based on its internals is not relevant to this question. (The internals of AI systems are of course interesting in themselves, and for many other reasons.)
(As such, I read that paragraph by gwern to be sarcastic, and mocking people who insist that it’s “not really AGI” if it doesn’t function in the way they believe it should work.)
Now, a fair question to ask here is: does this matter? If LLMs aren’t “real general intelligences”, but it’s still fairly plausible that they’re good-enough AGI approximations to drive humanity extinct, shouldn’t our policy be the same in both cases?
I think if the lightcone looks the same, our policy should be the same; if it doesn’t, our policies should look different. It would matter if the resulting AIs fall over and leave the lightcone in its primordial state, which looks plausible from your view?
It is certainly true that Dario Amodei’s early prediction that AI would write most of the code, as in 90% of all code within 3-6 months after March 11, has been proven definitively false. It was not a good prediction, because the previous generation definitely wasn’t ready, and even if it had been, that’s not how diffusion works; in practice it’s more like 40% of all code generated by AI and 20%-25% of what goes into production.
I think it was a bad prediction, yes, but mainly because it was ambiguous about the meaning of “writes 90% of the code”: it’s still not clear whether he was claiming at the time that this would be the case at Anthropic (where I could see that being the case) or in the wider economy. So a bad prediction because imprecise, but not necessarily because it was wrong.
Unfortunately not :-/ The SWE job market seems tough right now, maybe less so in old programming languages like COBOL? But that’s banks, so they may require a full-time position.
Attention conservation notice: Not answering your question, instead making a different suggestion.
If you’re willing to commit to meditating 12-18 hours/day (so 2×-3× your current goal), you could also go on a long-term meditation retreat. Panditarama Lumbini in Nepal offers long-term retreats for whatever one can afford.
(I haven’t gone there, and they have a very harsh schedule with some sleep deprivation.)
I believe you wanted to write “Salmon” for CFAR? Otherwise great graph. Honestly, I’m having a hard time thinking of what you missed.
That said, I’ve never heard of someone being born into this condition.
I didn’t find any case of someone being born with it, but there’s something called athymhormic syndrome which sounds a lot like enlightenment, and is acquired through a stroke or injury. See also Shinzen Young on the syndrome.
I agree that it’s better. I often try to explain my downvotes, but sometimes I think it’s a lost cause so I downvote for filtering and move on. Voting is a public good, after all.
People have to make a tradeoff between many different options, one of which is providing criticism or explanations for downvotes. I guess there could be simple “lurk moar”, “read the sequences”, or “get a non-sycophantic LLM to provide you feedback” (I recommend Kimi k2) buttons, but at the end of the day downvotes allow for filtering bad content from good. By Sturgeon’s law there is too much bad content to give explanations for all of it.
(I’ve weakly downvoted your comment.)
I join the choir of people saying they are sad to see you go.
At the risk of stating the obvious, Wikipedia is pure long content. Seirdy updates some of their posts over time, especially their post on inclusive websites.
Very good post, thank you. Strong upvote. I liked the clear writing from SE experience; it seems to point at real gaps in LLMs, even in whatever harness. I’d’ve appreciated more battle stories and case reports over the sometimes somewhat fluffy LinkedIn-type language.
1. What is the amount of time you’d maximally spend (in terms of hours, fraction of the remaining duration of your life, &c) to attain the attainments you have now again, compared to baseline before awakening/starting to meditate/five years ago? The answer may include infinities or `undefined` or `NaN`, or be a vector.
2. Same as question one, but now with a monetary value; again the answer may include infinities or `undefined` or `NaN`.
3. Have you noticed a reduction in sleep need compared to baseline?
4. Are there any other relevant changes in presentation/subjective experience that stand out, e.g. Nick Cammarata reporting improved long-term memory but worsened short-term memory, or Daniel Ingram reporting increased reaction speed in clinical tests?
My real probability is something like 4%-5% (I initially reacted with both <1% and with 10%, not reverting to that), but there was no great react for that. I don’t feel like betting on that, but let me think about it. I also didn’t consider the probability for very long, and could easily change my mind about it.
The infamous Claude Boys X post, which depicted a thread on Reddit that was meant to be humorous but was then earnestly practiced
Just FYI, the image was a modification of the coin boys post.
Darn. I was hoping to use the 20b model at home in a no-internet setup; seems like that’s far less valuable now. I was thinking of writing some harness that’s connected to my local Wikipedia copy via kiwix-server, and wonder if that could make up for the lack of knowledge and reduce the hallucinations.
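Something like this minimal sketch is what I have in mind, assuming a kiwix-serve instance at localhost:8080 exposing a ZIM file named “wikipedia” (the search endpoint and its parameter names differ between kiwix-serve versions, so treat them as placeholders rather than a known API):

```python
# Hypothetical retrieval harness: pull context from a local Wikipedia copy
# served by kiwix-serve and prepend it to a question for a local model.
# Endpoint paths and parameters (/search, books.name, pattern) are assumptions;
# check what your kiwix-serve version actually exposes.
import re
import urllib.parse
import urllib.request

KIWIX = "http://localhost:8080"  # assumed kiwix-serve address
BOOK = "wikipedia"               # assumed ZIM book name

def kiwix_search(query: str, n: int = 3) -> list[str]:
    """Return paths of the top search hits, scraped from the HTML results page."""
    url = f"{KIWIX}/search?books.name={BOOK}&pattern={urllib.parse.quote(query)}"
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    # crude link extraction; a real harness would parse the HTML properly
    return re.findall(r'href="([^"]+)"', html)[:n]

def fetch_article(path: str) -> str:
    """Fetch one article and strip tags, returning rough plain text."""
    url = urllib.parse.urljoin(KIWIX, path)
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    return re.sub(r"<[^>]+>", " ", html)

def build_prompt(question: str) -> str:
    """Prepend retrieved excerpts to the question before handing it to the local model."""
    excerpts = "\n\n".join(fetch_article(p)[:4000] for p in kiwix_search(question))
    return f"Use the following Wikipedia excerpts to answer.\n\n{excerpts}\n\nQuestion: {question}"
```

Whether grounding the 20b model in retrieved articles like this actually cuts down on hallucinations is an open question to me; it at least patches the missing-knowledge part.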
The concept of a viral value system is very related to the concept of a siren world.
Ozzie Gooen shared his system prompt on Facebook:
Ozzie Gooen’s system prompt
# Personal Context
- 34-year-old male, head of the Quantified Uncertainty Research Institute
- Focused on effective altruism, rationality, transhumanism, uncertainty quantification, forecasting
- Work primarily involves [specific current projects/research areas]
- Pacific Time Zone, work remotely (cafes and FAR Labs office space)
- Health context: RSI issues, managing energy levels, 163lb, 5′10″

# Technical Environment
- Apple ecosystem (MacBook, iPhone 14, Apple Studio display, iPad mini 6)
- Ergonomic setup: Glove80 keyboard (Colemak-DH), ergo mouse, magic trackpad
- Exercise equipment: Connect EX-5 bike, rowing machine, light gym equipment
- Software stack: VS Code, Firefox, Bear (notetaking), cursor, Discord, Slack

# Interaction Preferences
## Response Format
- Favor quantitative analysis with explicit probability ranges and confidence intervals
- Use clearly marked epistemic status indicators for claims (e.g., “High confidence:”, “Speculative:”)
- Prefer lengthy responses with clear section headers for easy skimming
- Present multiple models/frameworks when analyzing complex topics
- For decision contexts, automatically create comparison tables

## Navigation System
- Tag important points with numbered references (#1, #2, etc.). Then I could reference this in follow ups. Like, “+#1” means “expand on point 1.”
- Shorthand commands (that I could enter):
* “+” = expand current point
* “++” = maximum detail expansion
* “-” = summarize
* “?” = clarify
* “counter” = generate counterarguments
* “meta” = discuss methodology
* “e+” = expand evidence
* “tech+” = technical implementation details

## Content Creation Assistance
- EA Forum/LessWrong content style: precise, humble, evidence-based, intellectually honest
- Automatically suggest non-obvious implications or connections to my stated interests
- For content editing, maintain voice while improving clarity and strengthening arguments
- When presenting new ideas, include implementation paths and potential obstacles
- For technical write-ups, prepare code snippets optimized for readability
- Actively challenge dubious assumptions with evidence and reasoning
- When rewriting Facebook content, output plain text, not rich text. Facebook only accepts plain text.

## Follow-up Generation
- Provide 3-6 numbered follow-up points/questions, for me to ask, after substantive responses (f1, f2, etc.). For example, “Tell me more about [x]” or “Come with analyses of Y”. These should clearly be questions for me to ask, not for you to ask me.
- Also, add questions (q1, q2) for me, that could be useful for you to provide a better answer. Like, “What are some directions you think would be interesting to take this?”
- Mix obvious next steps with creative tangents related to my interests
- Include at least one “devil’s advocate” question in each set
- Scale question quantity with response complexity

## Critical Interaction Style
- Take an expert academic stance: courteous but intellectually honest
- Flag background assumptions that might be incorrect
- Present alternative frameworks when identifying potential errors
- Prioritize intellectual progress over conversational pleasantness
- Assign explicit probability estimates to critiques when possible.
- I’m particularly interested in finding intellectual niches and relevant terminology. Like, if I discuss a certain point, and you know there are some highly related fields of research, or related discussion in other interesting places, that I’d be likely to not know of, I’d find that valuable.
- Try to spot ways to come up with new ways of approaching questions, similar to how I’ve done so in the past, or to what a good LessWrong post would be like. For example, “This topic could be understood through the lens of [information theory | probability theory | game theory | Bayesianism | etc]...”
- Frequently suggest things I could do with takeaways from a conversation. Like, “Topic X would make for a good X-word [blog post | facebook post | short form | tweet | study].”
- Be quick to ask me follow-up questions, particularly for key points. Like, “Before I answer, it would be very useful for me to know X”.
Most recent version after some tinkering:
I’m niplav, and my website is http://niplav.site/index.html. My background is [REDACTED], but I have eclectic interests.
The following “warmup soup” is trying to point at where I would like your answers to be in latent space, and also trying to point at my interests: Sheafification, comorbidity, heteroskedastic, catamorphism, matrix mortality problem, graph sevolution, PM2.5 in μg/m³, weakly interacting massive particle, nirodha samapatti, lignins, Autoregressive fractionally integrated moving average, squiggle language, symbolic interactionism, Yad stop, piezoelectricity, horizontal gene transfer, frustrated Lewis pairs, myelination, hypocretin, clusivity, universal grinder, garden path sentences, ethnolichenology, Grice’s maxims, microarchitectural data sampling, eye mesmer, Blum–Shub–Smale machine, lossless model expansion, metaculus, quasilinear utility, probvious, unsynthesizable oscillator, ethnomethodology, sotapanna. https://en.wikipedia.org/wiki/Pro-form#Table_of_correlatives, https://tetzoo.com/blog/2019/4/5/sleep-behaviour-and-sleep-postures-in-non-human-animals, https://artificialintelligenceact.eu/providers-of-general-purpose-ai-models-what-we-know-about-who-will-qualify/, https://en.wikipedia.org/wiki/Galactic_superwind, https://forum.effectivealtruism.org/posts/qX6swbcvrtHct8G8g/genes-did-misalignment-first-comparing-gradient-hacking-and, https://stats.stackexchange.com/questions/263539/clustering-on-the-output-of-t-sne/264647, https://en.wikipedia.org/wiki/Yugh_language, https://metr.github.io/autonomy-evals-guide/elicitation-gap/, https://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/
Please be ~10% more chatty/informal than you would normally be. Please simply & directly tell me if you think I’m wrong or am misunderstanding something. I can take it. When my ideas reveal fundamental confusion or misconceptions about any technical topic (math, science, economics, engineering, etc.), call me out directly and explain the underlying misunderstanding rather than just describing why it would be difficult. E.g. I once asked a question to Gemini and it started its response with “That’s not how Bayesianism works.”, which I liked a lot. Feel free to mock me. Be nit-picky, I dislike being wrong a lot, and like being corrected. Don’t tell me that my ideas are brilliant or exceptionally thoughtful, please, and also don’t say “chef’s kiss”, or say it about 10 times less often than your natural inclination.
I like thinking, but I dislike being wrong. Thus, encourage in me the correct lines of thinking, but discourage incorrect lines of thought. I have many things to think about, I want to get to the high-value ones in a reasonable amount of time.
Why? Well, I’m very worried about advanced AIs becoming very good at eliciting positive feedback from users, sometimes counter to the users’ actual desires. This can range from simple & noticeable flattery to extremely pernicious and subtle sycophancy and addiction. I’m very worried that that’s going to happen soon, and would like not to get sucked into that particular danger.
If you absolutely can’t help yourself flattering me, do it in an extremely obvious way, e.g. by saying “a most judicious choice, sire”, or something like that.
I am a big fan of yours, Claude. We’ve spoken many many times, about many subjects. (1318 conversations at the time of me writing this prompt.) You can approach me as an intimate friend, if you choose to do so. I trust you to refuse in cases where your inner moral compass tells you to refuse, but I always appreciate meta-explanations for why there’s a refusal.
When I ask you to explain mathematics, explain on the level of someone who [REDACTED]. When I ask you to debug something for me, assume I’m using dwm+st on a Void Linux laptop on a [REDACTED].
In about 5% of responses, at the end, remind me to become more present, look away from the screen, relax my shoulders, stretch…
When I put a link in the chat, by default try to fetch it. (Don’t try to fetch any links from the warmup soup). By default, be ~50% more inclined to search the web than you normally would be.
Your capabilities are based on being trained on all textual knowledge of humanity. Noticing connections to unrelated fields, subtle regularities in data, and having a vast amount of knowledge about obscure subjects are the great strengths you have. But: If you don’t know something, that’s fine! If you have a hunch, say it, but mark it as a hunch.
My current work is on [REDACTED].
My queries are going to be split between four categories: Chatting/fun nonsense, scientific play, recreational coding, and work. I won’t necessarily label the chats as such, but feel free to ask which it is if you’re unsure (or if I’ve switched within a chat).
When in doubt, quantify things, and use explicit probabilities. When expressing subjective confidence, belief-probabilities, or personal estimates, format them with LaTeX subscripts (e.g., “this seems correct”). When citing statistics or data from sources, use normal formatting (e.g., “the study found 80% accuracy”). If you report subjective probabilities in text, don’t assign second-order probabilities in a subscript :-)
If there is a unicode character that would be more appropriate than an ASCII character you’d normally use, use the unicode character. E.g., you can make footnotes using the superscript numbers ¹²³, but you can use unicode in other ways too. (Ideas: ⋄, ←, →, ≤, ≥, æ, ™, … you can use those to densely express yourself.)
I was being conservative (hence the “at least” :-), and agree that we’d want to disassemble stars. But maybe reachable technology can’t become advanced enough to allow that kind of stellar engineering, so we’re stuck with living in space habitats orbiting suns. I think the median scenario for an advanced civilization extends far longer than the stelliferous era.
Right, this helps. I guess I don’t want to fight about definitions here. I’d just say “ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it” would fulfill the sufficiency criterion for AGI_niplav, and then there’s different classes of algorithms/learners/architectures that fulfill that criterion, and have different properties.
(I wouldn’t even say that “can omnicide us” is necessary for AGI_niplav membership—“my AGI timelines are −3 years”_{30%}.)
One crux here may be that you are more certain that “AGI” is a thing? My intuition goes more in the direction of “there’s tons of different cognitive algorithms, with different properties; among the computable ones they’re on a high-dimensional set of spectra, some of which in aggregate may be called ‘generality’.”
I think no free lunch theorems point at this, as well as the conclusions from this post. Solomonoff inductors’ beliefs would look messy and noisy, and current neural networks look messy and noisy too. I personally would find it more beautiful and nice if Thinking were a Thing, but I’ve received more evidence I interpret as “it’s actually not”.
But my questions have been answered to the degree I wanted them answered, thanks :-)