This seems to have aged really well, so much so that it now mainly serves as evidence that people ever thought otherwise.
shawnghu
this does not so much match my personal experience. maybe it’s some kind of selection effect, but then, that kind of person sounds really annoying, so I would recommend my selection process, whatever it is
without trying to explain other people’s thought process, I personally feel like this makes sense because both 2. and 3. are clearly true. maybe they are only clearly true for a certain type of person, in which case it’s some interesting insight into the rationalist psyche.
Here also is a summary of the stealing of the 1948 Senate election, which offers further examples of “powerful individuals/groups [being willing to murder] out of self interest” and “the justice system being secretly manipulated or controlled by powerful groups”.
(again edited by me for brevity/clarity; I attest that this reflects what’s in the book)
six days after the election, with Stevenson leading by a few hundred votes, Jim Wells County “amended” its returns from Precinct 13 in the town of Alice. Two hundred additional votes were suddenly added for Johnson, all in alphabetical order, all in the same handwriting and same ink, all apparently signed by people who in many cases were dead or had not voted that day. This shifted the statewide total and gave Johnson his 87-vote margin. The operation was run by George Parr, the “Duke of Duval,” the South Texas political boss whose machine controlled the heavily Mexican-American counties along the border through a combination of patronage, intimidation, and direct vote manufacture — boxes in Parr country routinely reported 95%+ for whichever candidate Parr had decided on, with turnout figures that exceeded the adult population.
Stevenson made an initial trip to Alice on his own, with his lawyer Kellis Dibrell and one or two others and was turned away from the bank by Parr’s pistoleros, who were standing armed in front of the building. He returned a second time with the legendary former Texas Ranger Frank Hamer. Hamer walked up to them alone, told them to step aside, and they did. He and Stevenson got inside and were allowed to look at the list briefly, but were not allowed to copy it or take it. By the time any formal legal process could compel production of the list, it had disappeared; the ballots themselves were later burned.
The legal mechanism by which the result was upheld then ran on two tracks. First, the state Democratic Executive Committee had to certify the winner of the primary, and it did so for Johnson by a single vote (29-28) after intense pressure and at least one delegate switching under circumstances Caro describes as suspicious. Second, Stevenson went to federal court and got a temporary injunction from District Judge T. Whitfield Davidson blocking Johnson’s name from the November ballot pending investigation of the fraud — Davidson had begun taking testimony and the evidence was emerging quickly. Johnson’s lawyers, rather than fighting the fraud question on the merits, appealed directly to Supreme Court Justice Hugo Black, riding circuit for the Fifth Circuit, on the narrow jurisdictional argument that federal courts had no business intervening in a state party primary — this was internal party machinery, a state matter. Black bought the argument and stayed Davidson’s injunction on September 29, 1948.
With the stay in place, Johnson’s name went on the November ballot, he won the general election easily (Texas being a one-party state at that point), and once he was a sitting U.S. Senator the question became moot — the Senate is the judge of its own members’ elections, and no Senate of that era was going to unseat one of its own over a Texas primary dispute.
The ballot box from Precinct 13 was, as I mentioned, eventually destroyed; Luis Salas, the election judge who had signed off on the fraudulent returns, finally confessed publicly in 1977 that the votes had been fabricated on Parr’s orders at Johnson’s request.
(Meta: it seems somehow tasteless to post such long largely LLM-generated passages. I’m not sure what the proper etiquette on this should be, but I did look it over and found most of the content essential and useful.)
Blackmail; that any given part of the world involving human coordination “runs on” blackmail.
Powerful individuals/groups murdering out of self interest.
The justice system being secretly manipulated or controlled by powerful groups in situations relevant to them.
Just from the Caro biography of LBJ there are clear instances of these, or existence proofs. For the latter, here is Claude (edited by me for brevity/clarity; I attest that this reflects what’s in the book).
Brown & Root had funneled enormous illegal contributions to Johnson disguised as bonuses to executives, inflated attorney fees, and reimbursed “personal” contributions from employees and their relatives. The investigation also reached back into earlier campaigns and Brown & Root’s tax treatment of the Marshall Ford Dam contracts.
FDR’s motives were a mix of personal political debt and broader strategic calculation. Johnson had become one of his most reliable congressional allies in an increasingly hostile Texas delegation, but more concretely, in 1940 Johnson had taken effective control of the Democratic Congressional Campaign Committee and used his Texas oil and construction money connections (Brown & Root, Sid Richardson, etc.) to funnel funds to dozens of House Democrats in tight races — Caro argues this was decisive in preserving the Democratic House majority that year. FDR personally credited Johnson with saving the majority, and Johnson became, in effect, the President’s man in the House. Beyond the personal debt, the investigation was politically dangerous in a wider sense: the same Brown & Root money that funded Johnson had been raised partly because the administration had steered the Marshall Ford Dam and Corpus Christi Naval Air Station contracts to the firm at Johnson’s request, so a serious prosecution risked exposing the whole reciprocal arrangement. And Texas was a state FDR could not afford to alienate further in 1944. Johnson met with FDR personally in January 1944; shortly after, the criminal referral died. The specific channel was Treasury — Elmer Irey’s Intelligence Unit was part of the Treasury Department, so the order effectively had to flow down through that chain. The criminal case was converted into a civil tax settlement: Brown & Root paid back taxes and penalties and the matter was closed. The investigating agents, who according to Caro believed they had an airtight criminal case implicating Johnson directly, were overruled from above without explanation.
There is a similar moment where serious investigations into his later financial misconduct are quietly closed after JFK is assassinated, maybe because of the general impression that it was important for the American public to be behind the new president.
As a general matter, look backwards in history and find how much political maneuvering/conspiracies/collusions existed then, in so blatant a form that a record exists to this day. I figure this is the default reference figure for how much of that happens today (arguably it’s worse, for various cultural/technological reasons, as politics in general gets more Molochian over time).
also notable that there’s a third thing people used to call reward hacking, as noted in https://www.lesswrong.com/posts/wwRgR3K8FKShjwwL5/2025-era-reward-hacking-does-not-show-that-reward-is-the :
this is the case where your agent terminally values the representation of reward directly for some reason and then attempts to modify that. i guess you can think of this as a form of misspecified-reward exploitation, since if you put it in an RL loop this behavior would indeed be reinforced heavily. but it’s quite different from terminally valuing the reward or rewarded-thing itself.
edit: ah, i think “reward tampering” was a working term for this, distinguishing it from specification gaming
every gradient update is trying to improve the model’s predictions on documents which it has not yet seen
nitpick: as i see it, literally interpreted, every gradient update just tries to improve the model’s predictions on the document it just saw. it happens to usually improve predictions on documents it has not yet seen, but there’s no direct mechanism by which this is necessarily the case.
Likewise,
the model is incentivized to “figure out” novel-to-it facts about each document even before the gradient update happens
I think the model is in the habit of trying to “figure out” novel-to-it facts about each document because this behavior was inculcated by previous gradient updates, but the expectation of future gradient updates doesn’t really factor in here.
Nevertheless I think your main point about pretraining inculcating incredible generalization capabilities stands.
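A toy illustration of the nitpick above, as a minimal sketch only: a one-parameter model `y_hat = w * x` takes a single SGD step on the sample it just saw, and the loss on a second, deliberately conflicting sample goes up. The model, the two samples, and the learning rate are all made up for illustration.

```python
def loss(w, x, y):
    """Squared error of the one-parameter model y_hat = w * x."""
    return 0.5 * (w * x - y) ** 2


def sgd_step(w, x, y, lr):
    """One gradient step on the loss of the single sample (x, y)."""
    grad = (w * x - y) * x  # derivative of loss(w, x, y) with respect to w
    return w - lr * grad


seen = (1.0, 1.0)     # the "document" the update is computed on
unseen = (1.0, -1.0)  # a held-out "document" that happens to conflict with it

w_before = 0.0
w_after = sgd_step(w_before, *seen, lr=0.5)

# Negative means the loss went down, positive means it went up.
seen_change = loss(w_after, *seen) - loss(w_before, *seen)
unseen_change = loss(w_after, *unseen) - loss(w_before, *unseen)
print(seen_change, unseen_change)
```

Real documents presumably share far more structure than this adversarial pair, which would explain why the update usually does transfer; the sketch only shows that the update rule itself does not guarantee it.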
A drawback is that split-loss gradient routing requires separate backward passes for each part in the partition.
note: at scale, this doesn’t imply any significant computational losses, as far as i can tell—as long as the partition size is very large compared to the batch that fits on a gpu/node, one can group the same-loss training samples to be processed on the same gpu/node. then you can still work with reduced tensors for the forward/backward, and computation of the total gradient for the optimizer is achieved by default by gradient accumulation.
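a rough sketch of the grouping idea, under loud assumptions: the per-part masks below are a hypothetical stand-in for whatever routing the real method applies, the model is a toy linear regressor, and each "worker" is just a function call rather than a gpu/node. it only shows that running one backward pass per same-part group and then accumulating gives the same total gradient as handling every sample separately.

```python
def sample_grad(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w for one sample."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def worker_grad(w, samples, mask):
    """One 'device': a microbatch drawn from a single part, so a single
    routing mask applies to its whole backward pass."""
    total = [0.0] * len(w)
    for x, y in samples:
        g = sample_grad(w, x, y)
        total = [t + gi for t, gi in zip(total, g)]
    return [t * m for t, m in zip(total, mask)]


def accumulate(grads):
    """Gradient accumulation: elementwise sum of per-worker gradients."""
    return [sum(col) for col in zip(*grads)]


# Hypothetical routing masks: which parameters each part may update.
masks = {"a": [1.0, 1.0, 0.0], "b": [0.0, 1.0, 1.0]}

# Mixed stream of (part, x, y) samples, as they might arrive from a loader.
data = [
    ("a", [1.0, 2.0, 0.0], 1.0),
    ("b", [0.0, 1.0, 1.0], 2.0),
    ("a", [2.0, 0.0, 1.0], 0.0),
    ("b", [1.0, 1.0, 1.0], 3.0),
]
w = [1.0, -1.0, 2.0]

# Grouped: each worker sees only one part, then gradients are accumulated.
grouped = accumulate([
    worker_grad(w, [(x, y) for p, x, y in data if p == part], mask)
    for part, mask in masks.items()
])

# Reference: handle every sample separately with its own part's mask.
reference = accumulate([worker_grad(w, [(x, y)], masks[p]) for p, x, y in data])
print(grouped, reference)
```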
I agree that interrupting is an art. I love this statement in particular:
many situations (in fact almost all) won’t have time for everyone to say everything they’d like to have heard, so competition is tempting and prevalent
I feel that I have a handle on when to interrupt in a two-person conversation, but the dynamics of interrupting differ:
- as the number of participants increases (there’s a phase shift at three, and then another one at six or seven as people typically don’t like to speak less than one fifth of the time);
- as the conversation becomes less meta-cooperative (e.g., the conversation is happening in some public forum or in order for a decision to be made, so there is an incentive to speak more so that your ideas get to be heard more).
(As another note, I find that even with three people, there needs to be either a significant amount of yielding or else substantial agreement on what the topic and frame of the conversation should be/implicitly are, or else the thread of the conversation will repeatedly “miss the point” and meander.)
I would be interested to hear any insights anyone has about how to navigate any of these better, unilaterally or by establishing group norms (formally or by reinforcement).
Please publish Less Schlong. It’s a public good.
Where do I go to submit a bug report? When I click on Garrett’s link, it gives me the template given by Oliver’s link. This is despite the two links having different URLs, and despite my having clicked on Garrett’s first.
For some reason I can’t find the usual bug report options on LW.
The first time I clicked through a link in a fake window, I got
Application error: a client-side exception has occurred while loading www.lesswrong.com (see the browser console for more information).
This is fine in the spirit of the LLM thing.
I thought the whole thread was about the difficulty of understanding agency, i.e., breaking down the concept of agency into more useful concepts, or just making it more well-defined.
I don’t think it’s hard to make LLMs “exhibit” agency, at least not in ways very similar to the ways it’s hard to make humans do so. On the other hand, discussion of AI risk that anthropomorphizes the AI usually grounds out in some confusion about, e.g., “what part of the AI system is the agent, if the whole thing is just a collection of floating point ops, and how do we square that frame with the frames we usually use to describe agency”. (Attempts to metaphorically map this to the human setting typically just result in confusion about the human setting, too.)
Are you talking about the difficulty of exhibiting (colloquially) high agency?
That’s my reading due to your comparison to integrity.
I don’t think that’s what OP was talking about, though.
Strangely, if you ask Claude “what’s the complexity of inversion of a triangular matrix?”, it says O(n^2), but when pressed to explain will say O(n^3).
This is one of those things that’s pretty elementary, and well-represented in the training set, so I’d expect it to be very well-memorized by an LLM.
This is robust to a few attempts and rewords, and held for Sonnet 4.6 and Opus 4.6.
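For reference, a minimal sketch of the textbook substitution-based approach, counting multiply-adds. In this particular implementation, solving one lower-triangular system costs n(n-1)/2 of them, while inverting column by column costs (n-1)n(n+1)/6, i.e. O(n^2) versus O(n^3). The function names are made up for illustration, and the sketch ignores methods based on fast matrix multiplication.

```python
def solve_lower(L, b, start=0):
    """Forward substitution for L x = b with L lower triangular.

    Entries of b before index `start` are assumed to be zero, so the
    solution is zero there too. Returns (x, number of multiply-adds).
    """
    n = len(L)
    x = [0.0] * n
    ops = 0
    for i in range(start, n):
        s = b[i]
        for j in range(start, i):
            s -= L[i][j] * x[j]
            ops += 1
        x[i] = s / L[i][i]
    return x, ops


def invert_lower(L):
    """Invert L by solving L x = e_k for each unit vector e_k."""
    n = len(L)
    cols, total = [], 0
    for k in range(n):
        e = [0.0] * n
        e[k] = 1.0
        x, ops = solve_lower(L, e, start=k)  # skip the known-zero prefix
        cols.append(x)
        total += ops
    inv = [[cols[j][i] for j in range(n)] for i in range(n)]
    return inv, total


for n in (4, 8, 16):
    L = [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]
    _, solve_ops = solve_lower(L, [1.0] * n)
    _, inv_ops = invert_lower(L)
    print(n, solve_ops, inv_ops)
```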
hmm, i’d thought of lemon markets ruining basic economic activities in modern life, and i’d also thought of urbanization being the root cause of social isolation, and i’ve even thought it was better socially when people had economic excuses to form communities, but i’ve never made the particular connection written about here (that functionally, this makes modern socializing a lemon market). thanks!
I often like to cite [this music video](https://www.youtube.com/watch?v=Njk2YAgNMnE) as an example of something that was made possible by AI but used it as just one building block in a complex artistic process. (For my part, I couldn’t imagine how I would auto-generate a video like this, or even encode the movement of the camera as a constraint without some substantial effort, and it was made in 2022!)
forgive my ignorance, but is there any reason that you can’t have multi-layer sparse autoencoders, even those that are interpretably compatible with the linear representation hypothesis? like what would their drawbacks be (other than more required compute)?
no matter how it is that you’re computing the latents,
0) you still have a reconstruction loss;
1) the final layer is still a decoder that computes a linear transformation to the thing you’re reconstructing;
2) you can still have a sparsity penalty on the last set of activations.
it seems to me like this still constructs a set of latents that sparsely activate, and which are linearly represented in activation space
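to make the question concrete, here is a minimal sketch of the setup being described: a two-layer nonlinear encoder, a single linear decoder, a reconstruction loss, and an L1 penalty on the final latents. all shapes, weights, and names (`encode`, `decode`, `sae_loss`, `l1_coeff`) are made up for illustration, and this is only a forward pass, not a training recipe or anyone's actual architecture.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def affine(W, b, v):
    """Return W v + b, where W is given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def encode(params, x):
    """Multi-layer encoder: the latents are a nonlinear function of x."""
    h = relu(affine(params["W1"], params["b1"], x))
    return relu(affine(params["W2"], params["b2"], h))  # non-negative latents


def decode(params, z):
    """Single linear layer: each latent contributes a fixed direction."""
    return affine(params["D"], params["b_dec"], z)


def sae_loss(params, x, l1_coeff=0.1):
    """Reconstruction error plus an L1 sparsity penalty on the latents."""
    z = encode(params, x)
    x_hat = decode(params, z)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in z)
    return recon + l1_coeff * sparsity, z, x_hat


# Toy parameters: 2-d activations, 3 hidden units, 3 latents.
params = {
    "W1": [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]],
    "b1": [0.0, 0.0, 0.0],
    "W2": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "b2": [-0.5, -0.5, -0.5],
    "D": [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]],
    "b_dec": [0.0, 0.0],
}

total, z, x_hat = sae_loss(params, [1.0, 0.25])
print(total, z, x_hat)
```

in this sketch the decoder side keeps the usual linear picture (the reconstruction is a bias plus a weighted sum of decoder columns), while the map from activations to latents is no longer a single linear readout.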
FYI, I heard that Oliver Sacks fabricated/embellished a lot of the anecdotal accounts in his books. This was a fairly public controversy, so evidence for it can be found on Google.
Two quotes from the immortal “You and Your Research” given by Richard Hamming: