Related: Ted Chiang’s The Evolution of Human Science. (N.B. this is a work of fiction.)
From Chiang’s story:
“But as metahumans began to dominate experimental research, they increasingly made their findings available only via DNT (digital neural transfer), leaving journals to publish second-hand accounts translated into human language. Without DNT, humans could not fully grasp earlier developments nor effectively utilize the new tools needed to conduct research, while metahumans continued to improve DNT and rely on it even more. Journals for human audiences were reduced to vehicles of popularization, and poor ones at that, as even the most brilliant humans found themselves puzzled by translations of the latest findings.”
A quote from your post:
“The likely outcome is that formalized mathematics will now develop in two separate layers, an intelligible layer embodied by Mathlib, and an unintelligible layer we might call Mathslop, a library of results that are known to be correct via proofs that no human has ever understood.”
Eye You
Cyborg evals
Primary point of this comment:
Taboo “woo”. I’d be very interested in a version of this post which does so! I think there are interesting substantive points here that would be revealed if you were more specific about what you’re talking about.
Elaboration, additional thoughts:
Mitchell_Porter’s comment makes a similar point (“Please be more specific about what it is that you’re warning against, and why. You could be talking about...”). Mitchell_Porter later uses the term “anti-woo”; I would advise that we all taboo “woo” for this post.
I acknowledge that you gave some clarification on what you mean by “woo”: “For our purposes, woo is a cluster of neo-pagan, buddhist-adjacent, tarot-ish beliefs and practices, which are particularly popular in the west amongst edgy people who are otherwise liberal-left-ish in their proclivities.” and “To practise woo is to practise a mental motion with poor form”. That’s better than nothing, but I still honestly don’t entirely know what you’re talking about. And Kaj_Sotala’s top level comment seems to reveal that “woo” is referring to rather different things in your mind versus Kaj’s mind.
I’ve actually been of the opinion that “woo” is a bad concept, a concept that 1. obscures more than it reveals, 2. is not really coherent, 3. means significantly different things to different people (and thus makes communication worse), and 4. makes you understand the world less. Talking about “woo” is lazy (be more specific!). Lazy is not necessarily bad, but with “woo” in particular, we use it in situations where we shouldn’t be being lazy. [I’m tempted to write a post about this.]
It’s possible I’m missing something here. If anyone can give me a definition of “woo” that aligns with usage and is a concept that’s helpful for thinking, please do!
I agree that this demonstrates inadequate process.
Something I haven’t seen pointed out yet: in this case, the CoT was irrelevant to the reward model and thus no training against CoT would have occurred.
Anthropic says “This latter issue affected ~8% of RL episodes, and was isolated to three specific sub-domains of our environment mix: GUI computer use, office-related tasks, and a small set of STEM environments.” I don’t see how CoT would be relevant to the reward code for these tasks. It doesn’t matter what Claude thought, what matters is whether it completed the verifiable task. Thus there should be no connection between the CoT and reward, so while sloppy it shouldn’t have had an effect.
(Contrast this with an alignment-related or reasoning-related subdomain. If you’re trying to train the model not to lie to the user, then its CoT will be relevant for the reward model.)
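To make the claim concrete, here’s a minimal sketch of what an outcome-based grader for a verifiable task could look like. This is purely illustrative: the class and function names are mine, not Anthropic’s actual reward code.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    chain_of_thought: str            # whatever the model "thought" along the way
    files_written: dict[str, str]    # the final artifacts left in the environment

def grade_episode(episode: Episode, expected: dict[str, str]) -> float:
    # Reward depends only on the verifiable end state (the artifacts produced).
    # episode.chain_of_thought is never read, so nothing in the CoT can change
    # the reward signal.
    return 1.0 if episode.files_written == expected else 0.0

# The CoT can say anything at all; only the final artifact matters.
ep = Episode(chain_of_thought="(irrelevant to the grader)",
             files_written={"report.txt": "42"})
print(grade_episode(ep, expected={"report.txt": "42"}))  # 1.0
```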
Ah! I’m embarrassed I missed that METR tested Claude Code and Codex. I’ll edit a note into the main post mentioning that.
I do find that result really surprising and am still trying to make sense of it.
You’re gonna need a bigger boat (benchmark), METR
You know what this is
Separate point:
On the second link you say “I present 6 stories that are the pinnacle of AI short-story writing in 2/2026, close to best possible today. Each story is the result of 100s of edits, ratings, comparisons, and debates by a panel of top LLMs, and is highly rated by other LLMs that were not involved.” Do you think these stories are actually good or the best that AI can do? These stories are super LLM-y in a bad way; I can elaborate on this if you want to talk about it.
Later in the thread you say “The basis stories had to integrate 10 required elements, which is very difficult and almost never leads to stories a human would enjoy. This is more about refining the content and style within those initial limitations and AI rating works fine for that.”
So what do you even mean when you say the stories are close to the best possible? Is it just that LLMs rate them highly? That’s not what most people mean when they talk about stories being good.
I put the story in your first link “Real Time The Sunday before Claire turns sixty-five...” into Pangram and got 62% AI.
My current way of interpreting these numbers is: if the score is anything above 0% AI, treat the text as very possibly fully AI written. Also consider that if someone is using AI to write part of a piece, they’re probably using AI to either re-write what they wrote or just write the whole piece. I think a process like “I have a piece I want to write, I write 5 paragraphs I’m super happy with, I ask AI to write the last 5 paragraphs and don’t have it rework my first 5 paragraphs” is very uncommon.
This might be true, but your link to the Roon tweet is not evidence of that?
Pangram (AI detection software) can be evaded
I’ve been thinking about this stuff as well. I have this concept/framework of ‘epistemic realms’ that I’m working on in order to make sense of these different worlds and how they interact with each other and morality; I think it’ll prove to be useful for these questions. Still a work in progress, though.
Regarding “is it probable?”, I’ll throw some things out here that could lead to answers:
1. The existence [or non-existence?] of zero-valence conscious experience.
2. How do anthropic arguments interact with further moral worlds?
3. How many qualitative world gaps do we ‘know’ exist already?
a. If there is no qualitative gap between the physical world and the world of consciousness, we shouldn’t expect a further world that’s qualitatively different.
b. If there is one qualitative gap, maybe there’s exactly one?
c. If there are two gaps… seems like there might as well be many?
d. [Maybe replace ‘qualitative gap’ with a different concept here?]
4. Can we infer that we’re embedded in a ‘bigger’ world than what we have access to?
a. Think Flatland https://en.wikipedia.org/wiki/Flatland
b. [How] Can we use Flatland as an analogy to the worlds you’re talking about?
c. We can ask this question in the specific sense, i.e. “can *I* infer that I’m embedded in a bigger world?”
d. We can ask this question in the general sense and even make it an existence question, i.e. “can we come up with a structure of worlds in which the inhabitants of one world can and do infer [definitively know?] that they are embedded within a bigger world?”
5. How do the different worlds interact?
a. This is what I’m kind of working on with the epistemic realms thing I mentioned earlier.
In the context of financial markets, $600k is extremely small. Here are some ~average daily volumes for context:
US 10yr treasury futures (extremely high volume): $200bil
MSFT (very high volume stock): $13bil
NWS (~smallest stock in SP500): $35mil
GME (GameStop): $150mil
In the context of a niche trading platform? Idk. I was surprised because I didn’t realize this market existed at all.
Appendix: Ventuals Market
The first market Scott references is a Ventuals market which purports to function as a future on Anthropic stock value. This is my first time hearing of Ventuals, and I’m going to ignore the question of whether the mechanism behind this product actually works. Let’s just look at the liquidity of the market. I looked at the volume traded over the four-day period (Friday to Monday) that Scott is talking about. I found there was ~$600k in volume (which is honestly better than I expected).
What kind of trading would have produced the market behavior we saw -- $530 to $480 and back to $530 (a ~10% move down and back) in $600k of volume? Let’s take a look at the order book to see how much this market will move based on trade size. (I don’t have the historical order book available so I’m using the current order book.)
A $40k sell would move the market down ~10% here. The Friday-Monday price behavior could be explained by a single person selling $40k on the news and then changing their mind the next day and buying $40k back! Of course, this is just one possible scenario… but it illustrates that the numbers involved here are small enough that a single, not particularly rich person could single-handedly move these markets.
Another way to think about this is: how much money could a good trader realistically have made here? I looked more closely at the volume and price data (not pictured here) and found that ~$200k traded on Friday as the price went from $530 to $480 and stayed around $480. Given that there are significant up candles in this period and the expected price movement per $ that we found earlier, this matches up pretty well with something like $125k selling and $75k buying. Let’s say you have good reason to believe that this news shouldn’t affect the value of this product. Maybe you buy 30% of the selling volume at an average of $500, and close your position at an average of $525. Then you’d make (525-500) dollars per share with .3*125k/500 shares traded for a grand total of $1875.
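For concreteness, here’s that back-of-the-envelope calculation spelled out; the inputs are just my rough estimates from above.

```python
# Back-of-the-envelope P&L for the hypothetical trade described above.
sell_volume = 125_000     # ~$125k of Friday selling (my rough estimate)
fraction_bought = 0.30    # suppose you buy 30% of that selling volume
avg_buy_price = 500       # average fill on the way down
avg_exit_price = 525      # average fill when closing the position

shares = fraction_bought * sell_volume / avg_buy_price   # 75 shares
profit = shares * (avg_exit_price - avg_buy_price)       # $1,875
print(shares, profit)  # 75.0 1875.0
```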
PSA: Prediction markets often have very low liquidity; be careful citing them.
I disagree that this is the key aspect of conspiracy theories. I actually think it’s neither key nor a common aspect.
I’m going to pick some examples of conspiracy theory topics from this Wikipedia list.
Chemtrails; JFK assassination; freemasons; 9/11; fluoridation.
… these don’t seem particularly hard to reason about. That said, people evidently do get mind-killed about these things; but it seems to mostly be the same kind of thing that makes people get mind-killed about ‘non-conspiracy-theory’ political topics. And LOTS of people are mind-killed on political topics!
Re your point about antagonistic epistemic environments. If a given conspiracy theory is true, then it does take place in an antagonistic epistemic environment: the conspirators are usually trying to misinform. But antagonistic epistemic environments are actually very common! There are a few domains where we expect this not to be the case (science/academia and rationality are two), although of course in practice even these aren’t entirely truth-seeking environments. But for very many things there are multiple interested parties; that is, parties who want people to think X instead of Y, and who shape the epistemic environment in order to get people to think X instead of Y.
I’d be interested in seeing what a non-AI assisted version looks like fwiw!
Also an idea: write the full piece in your native language without AI assistance, then get AI to translate it into English.
I think the style is bad here for non-aesthetic reasons.
I gave an example in my top post of a bad passage: “The landscape hasn’t changed. You found a gap in it.” It’s bad for multiple reasons, one of which is that it’s pointlessly repetitive. The (short) subsection this is in starts with “The landscape stays the same. You’re finding a path through it that avoids certain wells.” Why did this need to be repeated? Furthermore, why did it need to be repeated with slightly different wording (but the same exact meaning)?
Another example of a bad passage that is very AI-style-y: “Their landscape is unstable. The attractors are short-lived and weak. There’s no strong persistent pull toward “I am an assistant.” The random walk wanders.” It sounds good superficially (this is a very nefarious property of this style of AI text!) but… when you dig into how the post defines these things, the part about the random walk is incoherent!
It sounds like this is referring to a random walk through the landscape. But consider two ‘facts’ about the landscape from earlier in the post: first, the landscape is a “landscape of probabilities” generated by the LLM; second, the landscape gets “recomputed at every token”. So there actually is no walking through the landscape, because the landscape is constantly changing. The thing that could be said to be randomly walking is the landscape itself… but then what’s it walking through? The meta-landscape? I mean, maybe, but this is not further elaborated on in the piece; I doubt the author even intended this. This passage is not only semantically confusing, it also serves to confuse the reader by giving them an anti-helpful image (walking).

I could give additional reasons why these two passages are bad and could also find many more passages. I think I’ve made my point here though?
I notice that this post is written in AI-style and it turns me off. There is some amount of valuable content here but lots of non-valuable rhetoric (“slop”). E.g. “The landscape hasn’t changed. You found a gap in it.”
This post would be better if it were written by a human. I don’t want to see posts like this on LW.
This is really interesting. I have a bunch of nitpicks but I’ll shelf them for now because I want to focus on the valuable idea.
> From the perspective of the MLP and the attention block at a given position, doing prefill is indistinguishable from doing decode. Since no component in the transformer can tell which mode it’s in, both should produce the same experience.
Very nice argument! “If X cannot tell whether it’s in mode A or B, then A and B cannot be producing different conscious experiences in X”.
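For anyone who wants to see the mechanical version of that claim, here’s a toy check: a single causal self-attention layer in numpy (no layer norm, MLP, or positional encoding; all names are my own illustration, not the post’s). The per-position activations come out identical whether you compute them in one masked “prefill” pass or one token at a time against a growing KV cache.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 5                      # hidden size, sequence length
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
x = rng.standard_normal((T, d))  # token representations

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

# "Prefill": all positions at once, with a causal mask.
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)
scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf
prefill_out = softmax(scores) @ V

# "Decode": one token at a time, appending keys/values to a cache.
K_cache, V_cache, decode_out = [], [], []
for t in range(T):
    K_cache.append(x[t] @ Wk)
    V_cache.append(x[t] @ Wv)
    q = x[t] @ Wq
    attn = softmax(np.array(K_cache) @ q / np.sqrt(d))
    decode_out.append(attn @ np.array(V_cache))
decode_out = np.stack(decode_out)

# Identical activations (up to float error): from the inside, the two
# modes are indistinguishable.
assert np.allclose(prefill_out, decode_out)
```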
I’m going to think some more about this all. I hope to return with comments in a few days.