I agree that individual control increases policy variance, which was sort of my point. Whether that’s good or not seems to me to depend on what the default course of events is. If you think things are headed in a good direction, then low variance is good. But if the default course is likely to be disastrous, high variance at least provides a chance.
I don’t understand your point about asymmetry. Doesn’t that tend to make the default course bad?
This is an interesting experiment, but I think there are some technical issues.
First, even at temperature zero, LLMs may not be deterministic in practice. The round-off error in the matrix computations can depend on things like how many processor cores are available (hence, how the task is split up) or what other requests are being processed at the same time (since batched operations are merged, affecting round-off error). It is possible to implement LLM inference in a way that’s deterministic at temperature zero, but I think it’s not typically done by commercial LLM providers, since it is somewhat more costly.
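The underlying issue is just that floating-point addition isn’t associative, so splitting the same sum differently across cores can change the last bits of the result. A minimal toy illustration (not an LLM, just plain Python floats):

```python
# Floating-point addition is not associative: grouping the same
# three numbers differently gives different results in the last bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False
```

The same effect, compounded over billions of operations in a matrix product whose work is partitioned differently from one request to the next, is what makes temperature-zero outputs drift.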
Second, temperature zero is not how an LLM is “supposed” to be run. They are trained at temperature one, and running them at any other temperature introduces bias to an unknown degree, perhaps producing atypical results.
If the general non-determinism problem is avoided (using a slower, deterministic implementation), one could run at temperature one by just setting the same random number seed each time. That would be a better experiment.
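A sketch of what I mean, assuming deterministic logits: with a fixed seed, temperature-one sampling is exactly reproducible (the function name and interface here are my own invention, not any provider’s API):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, seed=0):
    """Toy seeded sampler: softmax over logits at the given
    temperature, then one draw from the resulting distribution."""
    rng = np.random.default_rng(seed)
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                     # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()  # softmax probabilities
    return int(rng.choice(len(p), p=p))

logits = [2.0, 1.0, 0.5, -1.0]
# Same seed, same logits -> same token every run:
print(sample_next_token(logits, seed=42) == sample_next_token(logits, seed=42))  # True
```

In a real inference loop you would seed one generator at the start and draw from it at each step, so the whole completion is reproducible rather than just one token.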