gwern
If it’s some kind of green, try 2 minutes, and you also have a second very easy marginal improvement: use colder water. Most greens should be brewed at 175F/80C. If you don’t have an adjustable-temperature tea kettle, you can get pretty close by pouring boiling water into a mug and then waiting 2-3 minutes.
You can also just dilute with tap water, which will usually be somewhere in the 40–70F range. Some ratios: https://gwern.net/review/tea#tap-water-dilution-of-boiling-water
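The arithmetic is just a weighted average, if you want to work out your own ratio (a minimal sketch; the 60F tap-water figure is an assumption, and it ignores heat absorbed by the mug):

```python
def mix_temp_f(boiling_parts: float, tap_parts: float,
               boiling_f: float = 212.0, tap_f: float = 60.0) -> float:
    """Final temperature of a boiling+tap water mix, as a simple
    weighted average (ignores heat lost to the mug or the air)."""
    total = boiling_parts + tap_parts
    return (boiling_parts * boiling_f + tap_parts * tap_f) / total

# ~4 parts boiling to 1 part 60F tap water lands near green-tea territory:
print(mix_temp_f(4, 1))  # 181.6F
```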
I’m not sure those are representative. In the security report, they specify that a lot of the bugs found are logic bugs, and that’s why they have to release only hash precommitments. Buffer overflows and use-after-free are both the easiest bugs to find, and the easiest to fix, and so would be the first out of embargo/disclosure, potentially giving you a highly misleading sample.
From https://red.anthropic.com/2026/mythos-preview/
We have found that Mythos Preview is able to reliably identify a wide range of vulnerabilities, not just the memory corruption vulnerabilities that we focused on above. Here, we comment on one other important category: logic bugs. These are bugs that don’t arise because of a low-level programming error (e.g., reading the 10th element of a length-5 array), but because of a gap between what the code does and what the specification or security model requires it to do. Automatically searching for logic bugs has historically been much more challenging than finding memory corruption vulnerabilities. At no point in time does the program take some easy-to-identify action that should be prohibited, and so tools like fuzzers can’t easily identify such weaknesses.
(It then discusses the cryptographic library, web app, and Linux kernel vulnerabilities before moving on to the blackbox reverse-engineering/decompilation, where “We have been able to use it to find, for example, remote DoS attacks that could remotely take down servers, firmware vulnerabilities that let us root smartphones, and local privilege escalation exploit chains on desktop operating systems. Because of the nature of these vulnerabilities, none have yet been patched and made public.”; emphasis added.)
Why do you think it is below 5%? LW2 is already a viable hacking target just for obscure reasons like ‘stealing LLM API keys to power further hacking or exploitation’ - which we know because that already happened, did it not? Then there’s the cryptocurrency or political activism or blackmail angles. Do you just expect to be able to patch LW2 faster than attacker capabilities will scale?
To me, it seems like the obvious world we are headed for is one where Mythos+ level autonomous hacking capabilities will be pervasive and ambient, and just taken for granted, in the same way that we now take for granted extensive deepfakes and LLM spam everywhere: portscanning, automated exploit suites run against blogs, tailored phishes for high-value individuals, or...
That’s not a real price. That’s just what they’re giving their partners as part of Glasswing, a charitable endeavour to try to stem the worst of the global damage, and is presumably more about encouraging the partners to economize on scarce Mythos tokens by avoiding setting the price to literally $0 (where people would be lazy and wasteful). It may or may not have much of anything to do with a ‘real’ price (whatever that means in a situation where hardware is so limited and demand so vast for what is an unpriceable ephemerally unique capability/possibility etc).
I am now routinely using the MoS with all my new writings to fix up the formatting. (Nothing fancy, just upload as an attachment and ‘check against the Gwern.net MoS’ prompt etc.) I am still occasionally adding to it as I review new drafts or old pages and need to codify some behavior, but it’s mostly complete.
The LLMs always catch a lot of errors or omissions and are a major upgrade on the existing set of lints. It’s proven especially useful for defining formal poetry metadata like ‘scansion’ comments (building on the detailed built-in commentary experiment of “October The First Is Too Late”), which are almost a poetry DSL at this point and would be too much work to do by hand, but help the LLMs a lot by providing a built-in scratchpad and a place to define requirements/intents which can be checked easily; my more complex poems like “Elegy in a Craneyard” would probably be a lot harder to write without it, because iterations would keep breaking the poem drafts in subtle ways. (I see this a lot with my comics in Nano Banana Pro, where there’s a frequent “two steps forward, one step back” dynamic.) I’m interested in playing with this approach more, although I wonder if for my more usual nonfiction writing, there’s no need for such a DSL or scaffold because it already has that, in the form of abstracts/sectionization?
I think that with the MoS, frontier LLMs right now can just about write a worthwhile Gwern.net-style essay given a good seed idea. (They cannot write it from scratch because the ideation step is still AWOL.) I can now hand them the MoS, one of my standard iterative writing brainstorming prompts (see “Craneyard” colophon for examples), an idea-prompt like ‘explain why toilets are not a public good’, and get out an essay which is worth my time to polish, extend with perhaps ‘the interview prompt’, and publish. The main barrier is that the writing style is still ‘off’ enough I am still a bit repulsed. (This is why I have not published any of the toilet prompt outputs yet… Still hoping that the next generation will make it work out of the box without me having to do risky hand-prompt-engineering for style.) I may simply bite the bullet and explicitly mark them with AI first-authors, instead of holding out for perfection and something I’d list myself as first-author on.
I’m also interested in the idea of trying to further reverse-engineer my writing for repeated motifs to codify at a higher level than typography/technical features. There are likely ‘mental models’ or ‘tools for thought’ beyond “one man’s modus ponens is another man’s modus tollens” scattered throughout my writings, which a LLM could potentially usefully extract, summarize, and codify. A list of 100 well-defined arguments might help a lot.
Since shortform writing is semi-solved, I’m now more concerned with how to integrate agentic LLMs locally to work with the codebase+corpus directly. Agentic LLMs don’t work well with the current Gwern.net setup of a giant monolithic git wiki repo + source code-only subdirectory repo, with hardwired config files and very large slow compilations and mostly informal documentation and IRC-centric coordination.
So I have a lot of cleanup and design work there before I can do something like prompt Claude Code with “Research and write an annotated blog post of 1.5k words explaining why toilets are not a public good
https://www.lesswrong.com/posts/sCWe5RRvSHQMccd2Q/i-would-have-shit-in-that-alley-too?commentId=aJYcuFereb6deAJfH#aJYcuFereb6deAJfH”, and expect the LLM to create a high-quality, coherent /blog/2026/toilets-are-not-public-goods with fully annotated links, add it to the newsletter etc etc etc, and just get a final git patch to review with all of that (and fixes to scripts or anything else that comes up along the way).

I think I have to start tracking issues comprehensively in the Github repo for the LLMs, especially in terms of creating a wishlist for refactoring and bugfixes. Probably better CLI tooling for making token-efficient safe edits to the archives and annotations… We’ll see.
That is a misunderstanding of how it works. They won’t ‘stop giving a hoot’ because it remains a useful weapon.
and I come back expecting to find all my old DMs but instead they are all deleted.
That is why I said “and attach an export”.*
And personally I would rather a website delete my DMs than release them to the world. This is probably true of most of the people my DMs are with (whose opinion also matters).
* my reasoning here is that if old DMs have to live anywhere besides airgapped physically-secured encrypted backups, highly dispersed email accounts are the safest place because the main email providers are, in general, vastly more secure than LW2 is, and better equipped to respond rapidly to hacks, with extensive controls to limit exfiltration; they all have early access to Mythos-class models to reduce damage early; and they are ‘too big to fail’ in the sense that if something like Gmail is cracked wide open and leaked, it will likely be such a global cataclysm that people really won’t be able to abuse LW-related parts especially badly.
We have no immediate plans to change anything.
This seems too complacent to me. Any long-lived social media or communications utility should have some data retention policies which reduce the blast radius of an exploit and turn them into less of an endlessly growing radioactive waste dump of PII. I think this is especially true given how many people on LW have gone on to important positions or roles later in life (including in, say, cryptocurrency − 100% sufficient justification for meaningful hacking efforts); and remember the East Anglia or Hillary or Epstein emails, how badly even the most innocent communication could be abused by fanatics or fools or fraudsters? (I’ve been struck by how many of the ‘Epstein emails’ doing huge numbers on social media aren’t even real, legitimated solely by the fact of a leak. In the postmodern oral culture, who bothers to factcheck anything, or so much as include a URL?)
Given how serious Mythos seems to be, that information leaks are irreversible, and the fact that it’s only going to escalate (remember, there’s usually a <=1 year lag from the best proprietary to opensource, so we may not even have until 2027 before mass attacks with zero guard rails or potential observability), it seems to me like this is a good time to implement some maximum retention period for DMs, and purge all old DMs. I would suggest something like: announce via email to people with any DMs that all pre-2026 DMs will be deleted within one month, and attach an export, and that going forward, all DMs will be deleted after 1 year of inactivity.
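(For concreteness, a minimal sketch of what the retention rule might look like; the schema and field names here are hypothetical, not the actual LW2 data model:)

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # proposed: delete DMs after 1 year of inactivity
PURGE_CUTOFF = datetime(2026, 1, 1, tzinfo=timezone.utc)  # one-time pre-2026 purge

def threads_to_delete(threads, now=None):
    """threads: iterable of objects with a .last_activity aware-datetime.
    Yields DM threads due for deletion under the proposed policy; the
    export-and-email step must run before any actual deletion."""
    now = now or datetime.now(timezone.utc)
    for t in threads:
        if t.last_activity < PURGE_CUTOFF or now - t.last_activity > RETENTION:
            yield t
```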
(Airgapped LW2 backups should go without saying and already exist!)
FWIW, I was playing with Markov chains for language generation before Transformers were so much as a gleam in Shazeer’s eye, and I have never found the analogy between RNNs/Transformers and ‘indefinitely high-order n-grams’ to be helpful, as it would predict that LLMs would struggle to so much as close a quotation mark or parenthesis while failing to predict any of the most important & interesting RNN/Transformer capabilities.
This has been discussed a lot in the past, often under the terms ‘mode collapse’ and RLHF and flattened logits etc. Hollis Robbins has some good posts over the past 2 years on her Substack on the bad writing of chatbot LLMs. It’s hard to say what is ‘the’ problem, since there seem to be multiple overlapping phenomena.
One of the things I hated most when I first saw a Claude Code demo. Disrespectful of my time and limited cognitive bandwidth to throw in a lot of completely meaningless, distracting, wasteful, exhausting BS to be ‘cute’.
(On Gwern.net, we would never do that. If we had to have anything beyond the standard, compact, understandable, spinning cursor, then we would at least encode some sort of useful semantics into it, like sorting them by implied expected thinking time.)
So, no attempt at more interesting search or brainstorming or cross-session pollination prompts like I usually use (and we are hoping Unslop participants will experiment with)?
One of the benefits of occasionally talking to people is that you get an indicator of what things are obvious or not, based on what you find yourself repeatedly explaining or arguing for. (I use 3 times as my own threshold.)
It’ll zero-shot easy cases, yeah.
And if you want to convert to HTML/Unicode for places where you don’t have direct LaTeX support, you can also have a LLM do that, albeit there are a lot of edge-cases and I don’t think LLMs will usually use more exotic Unicode like FRACTION SLASH for things like ‘3/2’ etc, so I have a big script for that: https://gwern.net/static/build/latex2unicode.py (Github).
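(To give the flavor of the edge-cases, a minimal sketch of just the FRACTION SLASH substitution; this is an illustrative toy, not an excerpt from the actual script:)

```python
import re

FRACTION_SLASH = "\u2044"  # '⁄': renders 3⁄2 as a proper fraction in good fonts

def fractionize(text: str) -> str:
    """Naively replace ASCII 'n/m' with FRACTION SLASH. A real
    implementation needs far more context-checking (dates, URLs,
    file paths, units), which is why the full script is so big."""
    return re.sub(r"\b(\d+)/(\d+)\b", rf"\1{FRACTION_SLASH}\2", text)

print(fractionize("a ratio of 3/2"))  # → 'a ratio of 3⁄2'
```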
Do you actually know the answer here, or are you just parroting a LLM?
As an additional datapoint on sans vs serif as a marker: I, completely independently of this post and late last year, experimented with exactly this idea for denoting editorial insertions in Gwern.net text (ie. stuff like “Bla bla [see Foo 1994] bla bla”, where I wanted to denote everything in the brackets was by an editor, such as myself). We implemented this, and I, Said Achmiz, and everyone else who looked at it agreed that Adobe Source Sans vs Serif Pro was a nice idea but didn’t provide enough contrast; I had to admit that even I often didn’t consciously notice it. This is despite the fact that switching font families inline would be the most visible way with the most glaring contrasts. We ultimately did put editorials into a different font, but went for a monospace, which we had added for poetry typesetting.
(This is also problematic downstream in places like Greater Wrong where users may get a different font by default. In fact, I’m writing this on GW now and I think the whole page is in sans!)
I would be more willing to try to ‘mine’ this AI slop for insight if it didn’t read like a schizophrenic sending telegrams in between grand mal seizures.
A text that is isolated, uncited, unlinked, and flagged as low-reliability by the corpus weighting heuristics will likely be so diluted by the sheer mass of everything else that it leaves no meaningful trace in the model’s weights
This is the problem here. As LLMs get higher quality, and as issues of reliability, provenance, and syntheticness become more salient, it is entirely possible that a lot of human writing will be dropped as not worth the compute to train on. Already data cleaning pipelines wind up throwing out most human-written text. As we move into a world of data poisoning, AI slop, agentic delusions and Tlön labyrinths of self-consistent nonsense, and self-play bootstraps in multi-agent RL settings in walled gardens, I expect that we will not see 100% of ‘human written’ text trained on. We may well see the % go down. We may well already be past ‘peak human’. Because why pay all that compute to train on what is infected by unreliable old LLM gibberish, merely a replay attack of something that did happen once but is now being echoed and laundered through many sources, or worse, filled with adversarial attacks and lies? You could instead spend the compute to optimally self-play yourself and bootstrap into superhuman intelligence with a relatively small but very carefully synthesized and curated dataset, which can be trusted and taken at face value and which will repay your compute.
This is one reason I emphasize quality over quantity in my AI writing. Because if you go for quantity, I suspect in the long run, all your stuff will be thrown out, the baby with the bathwater, because it’s not worth trying to separate your crap from your gems.
(There’s a certain paradox of verifiability here. If what you write can be verified by an AI MARL framework, then it probably would be better off doing it itself for the practice and reliability and avoiding subtle attacks/biases; only what you write that can’t be checked, like your empirical observations or unique thoughts, is of value to train on in the limit—and that’s precisely where trust and quality are critical.)
This was a month ago and I’ve smoothed over errors since.
This was exactly the response I was hoping you would not make. The problem is not the mere existence of a specific error, but what it says about the process as a whole. Thinking you can just patch bugs is not a solution; a solution is preventing the bugs from happening in the first place. The solution to buffer overflows was not patching every C program one by one as hackers discovered each vulnerability, but moving to memory-safe languages; the solution to ChatGPTese is not search-and-replacing em dashes with semicolons or rewriting it until it fools Pangram...
The report this is from (F “225”) is about American losses to everyone specifically, not total R&D lost to the Chinese.
You can link to a specific page like
https://www.nbr.org/wp-content/uploads/pdfs/publications/IP_Commission_Report_Update.pdf#page=9 BTW. So foreign IP isn’t relevant.
Fair enough. It is still an overestimate for the previously-mentioned reason, and the footnote is still wrong. (And now that I look at the PDF, I am in even more doubt about the substantive claim of positive externalities; it is not at all obvious to me how to transform a claim of an annual loss of “counterfeit goods, pirated software, and theft of trade secrets” into a global positive externality figure, especially given how enormous Chinese R&D has become as a % of global R&D, and how much of a powerhouse they are in many industries like solar panels or cars. What is sauce for the goose is sauce for the gander.)
This early example of inner-monologues in LLMs has now been covered in The Atlantic: https://www.theatlantic.com/technology/2026/04/4chan-ai-dungeon-thinking-reasoning/686794/