My top interest is AI safety, followed by reinforcement learning. My professional background is in software engineering, computer science, machine learning. I have degrees in electrical engineering, liberal arts, and public policy. I currently live in the Washington, DC metro area; before that, I lived in Berkeley for about five years.
David James
Daniel notes: This is a linkpost for Vitalik’s post. I’ve copied the text below so that I can mark it up with comments.
I’m posting this comment in the spirit of reducing confusion, even if only for one other reader.
Daniel’s comments are at the bottom of the post. When I read “mark it up with comments,” that suggested to me that a reader would find the comments inline with the text (which isn’t the case here). In other words, I was expecting to see an alternation between blockquotes of Vitalik’s text and Daniel’s comments.
Either way works, but with the current style I suggest adding a note clarifying that Daniel’s comments are below the post.
Update Saturday 9 PM ET: I see now that LessWrong’s right margin shows small icons indicating places where the main text has associated comments. I had never noticed this before. Given the intention of this post, these tiny UI elements seem rather too subtle IMO.
LLMs can’t reliably follow rules
I suggest rewriting this as “Present LLMs can’t reliably follow rules”. Doing so is clearer and reduces potential misreading. Saying “LLM” is often ambiguous: sometimes it means the current SoTA, and sometimes it means the entire class of models.
Stronger claims, such as “Vanilla LLMs (without tooling) cannot and will not be able to reliably follow rule-sets as complicated as chess, even with larger context windows, better training, etc … and here is why.” would be very interesting, if there is evidence and reasoning behind them.
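For what it’s worth, one way to gather that kind of evidence would be to check every move a model proposes against a rules engine. Below is a rough sketch, not a benchmark; it assumes the python-chess package, and propose_move() is a hypothetical stand-in for an LLM call (here it just plays the first legal move).

```python
# Rough sketch: measure how often a model proposes illegal chess moves.
# Assumes the python-chess package; propose_move() is a hypothetical stand-in
# for an LLM call and here simply plays the first legal move.

import chess

def propose_move(board: chess.Board) -> str:
    # Placeholder for "ask the LLM for a move in UCI notation".
    return next(iter(board.legal_moves)).uci()

def illegal_move_rate(num_games: int = 10, max_plies: int = 40) -> float:
    attempts, illegal = 0, 0
    for _ in range(num_games):
        board = chess.Board()
        for _ in range(max_plies):
            if board.is_game_over():
                break
            uci = propose_move(board)
            attempts += 1
            try:
                move = chess.Move.from_uci(uci)
            except ValueError:
                illegal += 1   # not even well-formed
                break
            if move not in board.legal_moves:
                illegal += 1   # well-formed but against the rules
                break
            board.push(move)
    return illegal / attempts if attempts else 0.0

print(illegal_move_rate())
```

With a real model plugged in, a persistently nonzero rate even with larger context windows and better training would be the kind of evidence the stronger claim needs.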
A Schelling point is something people can pick without coordination, often because it feels natural or obvious.
While he didn’t achieve the level of eloquence needed to significantly increase the adoption of the Bayesian worldview
It seems that a lot more than eloquence or even persuasion will be required.
That said, what are some areas where Chivers could do better? How could he reach more readers?
Agreement in the forecasting/timelines community ends at the tempo question.
What is the “tempo question”? I don’t see the word tempo anywhere else in the article.
Some organizations (e.g. financially regulated ones such as banks) are careful to grant access on a per-project basis. Part of this involves keeping a chain of sign-offs so that someone can be held accountable (in theory). This probably means someone would have to be comfortable signing off for an AI agent before giving it permission. For better or worse, companies have notions of the damage that one person can do, but they would be wise to think differently about automated intelligent systems.
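As a rough illustration (every name below is invented, not any bank’s actual schema), the record such an organization keeps might look like this; the open question is who belongs in the sign-off chain when the grantee is an AI agent.

```python
# Hypothetical sketch of a per-project access grant with a sign-off chain.
# Field names are invented for illustration.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class SignOff:
    approver: str        # accountable person
    role: str            # e.g. "project lead", "data owner", "compliance"
    signed_on: date

@dataclass
class AccessGrant:
    grantee: str         # a person... or an AI agent?
    project: str
    scope: str           # e.g. "read-only access to anonymized loan data"
    expires_on: date
    sign_offs: list[SignOff] = field(default_factory=list)

    def is_accountable(self) -> bool:
        # At least one named human in the chain who can be held accountable.
        return len(self.sign_offs) > 0

grant = AccessGrant(
    grantee="agent-7",
    project="loan-book-analysis",
    scope="read-only access to anonymized loan data",
    expires_on=date(2026, 1, 1),
    sign_offs=[SignOff("J. Smith", "data owner", date(2025, 6, 1))],
)
print(grant.is_accountable())
```

The point is only that the accountability lives in the sign-off chain, not in the grantee.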
those running trials are usually quite ignorant of what the process of data cleaning and analysis looks like and they have never been recipients of their own data.
Some organizations have rotation programs; this could be expanded to give people a fuller view of the data lifecycle. Perhaps use pairing or shadowing with experts in each part of the process. (I’m not personally familiar with the medical field, however.)
It is more probable that A, than that A and B.
I can see the appeal here—litanies tend to have a particular style after all—but I wonder if we can improve it.
I see two problems:
1. This doesn’t convey that Occam’s razor is about explanations of observations.
2. In general, one explanation is not a logical “subset” of the other. So the comparison is not between “A” and “A and B”; it is between “A” and “B”.
Perhaps one way forward would involve a mention of (or reference to) Minimum Description Length (MDL) or Kolmogorov complexity.
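For reference, here is my sketch of both points in symbols, assuming an MDL-style prior that penalizes longer descriptions (L(H) below is the description length of hypothesis H).

```latex
% Conjunction rule: the litany's guarantee comes straight from probability theory.
P(A \land B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A)

% Comparing two distinct explanations A and B of data D requires a prior; an
% MDL-style prior P(H) \propto 2^{-L(H)} penalizes longer descriptions L(H).
\frac{P(A \mid D)}{P(B \mid D)}
  \;=\;
\frac{P(D \mid A)}{P(D \mid B)} \cdot \frac{2^{-L(A)}}{2^{-L(B)}}
```

The first line is what the litany captures; the second is where MDL or Kolmogorov complexity would enter.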
Tools for decision-support, deliberation, sense-making, reasoning
I’m putting many of these in a playlist along with The Geeks Were Right by The Faint: https://www.youtube.com/watch?v=TF297rN_8OY
When I saw the future—the geeks were right
Egghead boys with thin white legs
They’ve got modified features and software brains
But that’s what the girls like—the geeks were right
Predator skills, chemical wars, plastic islands at sea
Watch what the humans ruin with machines
“If you see fraud and do not say fraud, you are a fraud.”—Nassim Taleb
No. Taleb’s quote is too simplistic. There is a difference between (1) committing fraud; (2) denying fraud where it exists; and (3) saying nothing.
Worse, it skips over a key component of fraud: intent!
I prefer the following framing: If a person sees evidence of fraud, they should reflect on (a) the probability of fraud (which involves assessing the intention to deceive!); (b) their range of responses; (c) the effects of each response; and (d) what this means for their overall moral assessment.
I realize my framing draws upon consequentialist reasoning, but I think many other ethical framings would still criticize Taleb’s claim for being overly simplistic.
The comment above may open a Flood of Jesus-Backed Securities and Jesus-Leveraged Loans. Heavens!
The recent rise of reinforcement learning (RL) for language models introduces an interesting dynamic to this problem.
Saying “recent rise” feels wrong to me. In any case, it is vague. Better to state the details. What do you consider to be the first LLM? The first use of RLHF with an LLM? My answers would probably be 2018 (BERT) and 2019 (OpenAI), respectively.
HLE and benchmarks like it are cool, but they fail to test the major deficits of language models, like how they can only remember things by writing them down onto a scratchpad like the memento guy.
A scratch pad for thinking, in my view, is hardly a deficit at all! Quite the opposite. In the case of people, some level of conscious reflection is important and probably necessary for higher-level thought. To clarify, I am not saying consciousness itself is in play here. I’m saying some feedback loop is probably necessary — where the artifacts of thinking, reasoning, or dialogue can themselves become objects of analysis.
My claim might be better stated this way: if we want an agent to do sufficiently well on higher-level reasoning tasks, it is probably necessary for it to operate at various levels of abstraction, and we shouldn’t be surprised if this is accomplished by way of observable artifacts used to bridge different layers. Whether the mechanism is something akin to chain of thought or something else seems incidental to the question of intelligence (by which I mean assessing an agent’s competence at a task, which follows Stuart Russell’s definition).
I don’t think the author would disagree, but this leaves me wondering why they wrote the last part of the sentence above. What am I missing?
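To make the feedback-loop idea concrete, here is a minimal sketch of what I have in mind; propose_step() is a hypothetical stand-in for a model call, not anyone’s actual implementation.

```python
# Minimal sketch of a scratchpad-style feedback loop: the written artifacts of
# earlier steps become inputs to later steps. propose_step() is a hypothetical
# stand-in for whatever produces the next thought (e.g. an LLM call).

def propose_step(task: str, scratchpad: list[str]) -> str:
    # Placeholder: a real system would condition on the task plus everything
    # written so far, possibly at a different level of abstraction.
    return f"step {len(scratchpad) + 1}: refine the answer to {task!r}"

def solve(task: str, max_steps: int = 3) -> list[str]:
    scratchpad: list[str] = []
    for _ in range(max_steps):
        thought = propose_step(task, scratchpad)   # artifact of thinking...
        scratchpad.append(thought)                 # ...becomes an object of analysis
    return scratchpad

print("\n".join(solve("summarize the argument")))
```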
A just world is a world where no child is born predetermined to endure avoidable illness simply because of ancestral bad luck.
In clear-cut cases, this principle seems sound; if a certain gene only has deleterious effects, and it can be removed, this is clearly better (for the individual and almost certainly for everyone else too).
In practice, this becomes more complicated if one gene has multiple effects. (This may occur on its own or because the gene interacts with other genes.) What if the gene in question is a mixed bag? For example, consider a gene giving a 1% increased risk of diabetes while always improving visual acuity. To be clear, I’m saying complicated, not unresolvable. Such tradeoffs can indeed be resolved with a suitable moral philosophy combined with sufficient data. However, such tradeoffs are especially fraught because the person deciding isn’t the person who has to live with said genes. The two people may have different philosophies, risk preferences, or lifestyles.
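As a toy illustration of “a suitable moral philosophy combined with sufficient data”: every number below is invented, and the utility weights are exactly the part a moral framework (ideally the affected person’s) would have to supply.

```python
# Toy expected-utility comparison for the hypothetical mixed-bag gene above.
# All numbers are invented for illustration only.

p_extra_diabetes_risk = 0.01   # assumed +1% lifetime risk
u_diabetes = -50.0             # assumed disutility of developing diabetes
u_better_vision = 2.0          # assumed utility of improved visual acuity (certain)

keep_gene = p_extra_diabetes_risk * u_diabetes + u_better_vision   # = 1.5
remove_gene = 0.0                                                  # baseline

print("keep" if keep_gene > remove_gene else "remove")
```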
Not necessarily an optimizer, though: satisficers may do it too. A core piece often involves tradeoffs, such as material efficiency versus time efficiency.
A concrete idea: what if every LessWrong article prominently linked to a summary? Or a small number of highly-ranked summaries? This could reduce the burden on the original author, at the risk of having the second author’s POV differ somewhat.
What if LW went so far as to make summaries the preferred entry points? Instead of seeing a wall of text first, a reader would see a digestible chunk.
I have been wanting this for a very long time. It isn’t easy, obvious, or free of hard trade-offs. In any case, I don’t know of many online forums or information sources that really explore the potential here.
Related: why not also include metadata for retractions, corrections, and the like? TurnTrout’s new web site, for example, sometimes uses “info boxes” to say things like “I no longer stand by this line of research”.
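A rough sketch of what such per-article metadata could look like (field names invented; this is not a proposal for LW’s actual schema):

```python
# Hypothetical per-article metadata: linked summaries plus retraction/correction
# notes of the kind TurnTrout's "info boxes" provide. Field names are invented.

from dataclasses import dataclass, field

@dataclass
class EditorialNote:
    kind: str    # e.g. "retraction", "correction", "no-longer-endorsed"
    text: str
    date: str    # ISO date string, for simplicity

@dataclass
class ArticleMeta:
    title: str
    summary_urls: list[str] = field(default_factory=list)   # highly-ranked summaries
    notes: list[EditorialNote] = field(default_factory=list)

meta = ArticleMeta(
    title="Example post",
    summary_urls=["https://example.org/summary-of-example-post"],
    notes=[EditorialNote("no-longer-endorsed",
                         "The author no longer stands by this line of research.",
                         "2024-06-01")],
)
print(meta)
```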
At least when I’m reading I like to have some filler between the ideas to give me time to digest a thought and get to the next one.
This is both fascinating and strange to me.
If you mean examples, elaboration, and explanation, then, yes, I get what you mean.
OTOH, if you mean “give the reader a mental break”, that invites other alternatives. For example, if you want to encourage people to pause after some text, it might be worthwhile to make it harder to mindlessly jump ahead. Break the flow. This can be done in many ways: vertical space, interactive elements, splitting across pages, and more.
This is a fun design space. So much about reading has evolved over time, with the medium imposing constraints on the process. We have more feasible options now!
and I don’t really see how to do that without directly engaging with the knowledge of the failure modes there.
I agree. To put it another way, even if all training data were scrubbed of all flavors of deception, how could ignorance of it be durable?
Consider the following numbered points:
1. In an important sense, other people (and culture) characterize me as perhaps moderate (or something else). I could be right, wrong, anything in between, or not even wrong. I get labeled largely based on what others think and say of me.
2. How do I decide on my policy positions? One could make a pretty compelling argument (from rationality, broadly speaking) that my best assessments of the world should determine my policy positions.
3. Therefore, to the extent I do a good job of #2, I should end up recommending policies that I think will accomplish my desired goals even when accounting for how I will be perceived (#1).
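If it helps, point 3 can be compressed, informally and only as a sketch, into a single expression:

```latex
% Sketch: pick the position whose expected goal achievement is highest, where
% the expectation already accounts for how advocating that position is perceived.
p^{*} \;=\; \arg\max_{p \in \text{positions}} \;
  \mathbb{E}\!\left[\,\text{goals achieved} \;\middle|\; \text{advocate } p,\ \text{perception of advocating } p \right]
```

The hard part, of course, is that the perception term is exactly the messy bit.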
This (obvious?) framework, executed well, might subsume various common (even clichéd) advice that gets thrown around:
Be yourself and do what needs to be done, then let the cards fall as they may.
No one will take your advice if you are perceived as crazy.
Many movements are born by passionate people perceived as “extreme” because important issues are often polarizing.
It can be difficult to rally people around a position that feels watered down.
Pick something doable and execute well to build momentum for the next harder thing.
Writing legislation can be an awful slog. Whipping votes requires a lot of negotiation, some unsavory. But all this depends on years of intellectual and cultural groundwork that softened the ground for the key ideas.
P.S. When I first came here to write this comment, I had only a rough feeling along the lines of “shouldn’t I choose my policy positions based on what I think will actually work and not worry about how I’m perceived?” But I chewed on it for a while. I hope this is a better contribution to the discussion, because I think it is quite a messy space to figure out.