faul_sname

Karma: 4,822

faul_sname 16 Oct 2025 21:45 UTC
2 points
0
in reply to: jessicata’s comment on: Humans Are Spiky (In an LLM World)
My point is more that we have millennia of experience building tools and social structures for making humans able to successfully accomplish tasks, and maybe 2 years of experience building tools and structures for making LLM agents able to successfully accomplish tasks.

I do agree that there’s some difference in generality, but I expect that if we had spent millennia gathering experience building tools and structures tailored towards making LLMs more effective, the generality failures of LLMs would look a lot less crippling.

If you take a bunch of LLMs and try to get them to collaboratively build a 1GW power plant, they are going to fail mostly in ways like
1. they have hilariously poor vision
2. they don’t make effective use of new tools
3. they don’t create new tools to trivialize repetitive tasks
4. they get caught in loops of trying the same ineffective thing over and over
All of these are failure modes which can be substantially mitigated by better scaffolding of the sort that is hard to design in one shot but easy to iteratively improve over time.

faul_sname 16 Oct 2025 10:19 UTC
2 points
0
in reply to: jessicata’s comment on: Humans Are Spiky (In an LLM World)
Humans are hilariously bad at wilderness survival in the absence of societal knowledge and support. The support doesn’t need to be 21st-century-shaped but we do need both physical and social technology to survive and reproduce reliably.

That doesn’t matter much, though, because humans live in an environment which contains human civilization. The “holes” in our capabilities don’t come up very often.

The right tools could also paper over many of the deficiencies of LLM agents. I don’t expect the tools which make groups of LLM agents able to collectively do impressive things to result in particularly human-shaped agents though.

Concretely, sample efficiency is very important if you want a human-like agent that can learn on the job in a reasonable amount of time. It’s much less important if you can train once on how to complete each task with a standardized set of tools, and then copy the trained narrow system around as needed.

(Note: perhaps I should say “language-capable agent” rather than “llm-based agent”)

Humans Are Spiky (In an LLM World)

faul_sname15 Oct 2025 8:40 UTC

18 points

5 comments1 min readLW link

faul_sname 12 Oct 2025 4:56 UTC
18 points
14
on: Experiments With Sonnet 4.5′s Fiction
I think this is a decently engaging story, but it sounds like a Claude story, not a Tomás B story. Ending is too happy, technology is allowed to be good, and there are no themes of the protagonist being complicit in a system they find abhorrent. Also “the protagonist of these stories in my context window goes to therapy and resolves their internal tensions” is the most Claude it is possible for a story to be.

I would be sad if you stopped writing stories because other humans could write stories that are of similar quality by some metrics, and I will also be sad if you stop writing because AI can write fiction which is good in different ways to the ways your fiction is good.

faul_sname 11 Oct 2025 23:17 UTC
2 points
0
in reply to: Noosphere89’s comment on: Daniel Tan’s Shortform

I no longer consider scaffolded LLMs as a relevant concern/threat.

I am extremely surprised to see you say that, to the point that I think I must be misinterpreting you. What tools an LLM has the ability to use seems to have huge effects on its ability to do things.

Concretely, Claude 3.5 Sonnet can do far more useful coding tasks with a single tool to execute bash commands on a VM than Claude 4.5 Sonnet can in the absence of that tool. Or is “while loop plus tools” not the type of scaffolding you’re referring to?

faul_sname 11 Oct 2025 23:01 UTC
9 points
4
in reply to: Wei Dai’s comment on: Wei Dai’s Shortform

I think it’s likely that without a long (e.g. multi-decade) AI pause, one or more of these “non-takeover AI risks” can’t be solved or reduced to an acceptable level.

I think it is also worth considering the possibility that these risks aren’t the sort of thing which can be reduced to an acceptable level with a decade-scale AI pause either. Particularly the ones which people have been trying to solve for centuries already (e.g. principal-agent problem).

faul_sname 9 Oct 2025 20:51 UTC
5 points
0
in reply to: StanislavKrym’s comment on: tdko’s Shortform
In my experience using the LLM wrapper IDEs (cursor, windsurf, etc), if I ask the model to do some task where one of the assumptions I was making when writing the task was wrong (e.g. I ask it to surface some piece of information to the user in the response to some endpoint, but that piece of information doesn’t actually exist until a later step of the process), GPT-5 will spin for a long time and go off and do stuff to my codebase until it gets some result which looks like success if you squint, while Sonnet 4.5 will generally break out of the loop and ask me for clarification.

Sonnet 4.5′s behavior is what I want as a user but probably scores worse on the METR benchmark.

faul_sname 7 Oct 2025 0:39 UTC
2 points
0
in reply to: AnthonyC’s comment on: If Drexler Is Wrong, He May as Well Be Right
My best guess is it takes 5ish years to replicate everything in a machine shop minus the things which can’t realistically be made in a machine shop (e.g. electronics, high speed steel stuff, diamond powder, maybe bearings). Much of that time would be spent on repetitive tasks like making screws. Mining and forestry robots would slow down the process more, likely quite a bit more, not so much because they’re difficult as because they have a lot of parts.

faul_sname 6 Oct 2025 3:38 UTC
2 points
2
in reply to: larry-dial’s comment on: How the NanoGPT Speedrun WR dropped by 20% in 3 months
AI development feels more similar to biology than to chemistry. Bright 11th graders shouldn’t be doing experiments on culturing some previously unculturabke pathogen which would be a good bioweapon target and discussing their results, since the field is wide and shallow and it’s not entirely impossible that their experiments are novel. On the other hand, if they’re running basic experiments on culturing some specific common bacterium (e.g. e coli) better, they probably don’t need to worry about accelerating bioweapon development even if there is a chance of them making a slight advancement to the field of biology as a whole.

The nanogpt speedrun feels more like developing better methods to culture e coli at a hobbyist level, and quite unlikely to lead to any substantial advancement applicable to the operational efficiency of well-funded companies at the frontier. Still, it probably is worth keeping track of when the work you’re doing approaches the “this is actually something novel the frontier labs might use” mark, particularly if it’s something more substantial than “here’s how to use the hardware more efficiently to train this particular model”.

faul_sname 4 Oct 2025 18:17 UTC
3 points
0
in reply to: DirectedEvolution’s comment on: faul_sname’s Shortform
In retrospect, sure, MAD worked out for us. But in 1899, Ivan Bloch asserted

… if any attempt were made to demonstrate the inaccuracy of my assertions by putting the matter to a test on a great scale, we should find the inevitable result in a catastrophe which would destroy all existing political organization. Thus, the great war cannot be made, and any attempt to make it would result in suicide.

This was before both world wars. After the first world war but before the second, others made similar arguments. In von Neumann’s time, that argument did not have a good empirical track record, and his work on game theory gave him theoretical reasons not to expect the prediction of peace through MAD to hold. If there was something he was missing in 1948, it is not obvious what.

faul_sname 4 Oct 2025 9:43 UTC
2 points
0
in reply to: Cole Wyeth’s comment on: Claude Sonnet 4.5 Is A Very Good Model
METR task lengths are based on the amount of time it would take a human to complete the task, not the amount of time it takes the model to complete the task, and particularly not the amount of time that the model can spend productively working on the task. There exist very large tasks where the LLM could accomplish large parts of the task, parts that take the LLM dozens of hours and would take a human hundreds of hours, but would be unable to accomplish the entire task. For example consider porting a complex flask application to rust—the standard MVC parts would probably go pretty smoothly and could easily take 30 hours of wall clock time, but certain nontrivial business logic and especially anything involving the migration of weirdly serialized data is likely to remain unfinished.

faul_sname 4 Oct 2025 9:31 UTC
54 points
10
on: faul_sname’s Shortform
John von Neumann famously advocated for a nuclear first strike against the Soviet Union.

Von Neumann was, at the time, a strong supporter of “preventive war.” Confident even during World War II that the Russian spy network had obtained many of the details of the atom bomb design, Von Neumann knew that it was only a matter of time before the Soviet Union became a nuclear power. He predicted that were Russia allowed to build a nuclear arsenal, a war against the U.S. would be inevitable. He therefore recommended that the U.S. launch a nuclear strike at Moscow, destroying its enemy and becoming a dominant world power, so as to avoid a more destructive nuclear war later on. “With the Russians it is not a question of whether but of when,” he would say. An oft-quoted remark of his is, “If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not one o’clock?”

It seems likely to me that a world in which the U.S. government took von Neumann’s advice would likely be a much darker, bleaker, more violent one. And yet, I find no logical flaw in von Neumann’s argument that a world with multiple nuclear powers will not remain stable forever, only an illogical voice in me screaming “the fact that someone smarter than me made a convincing argument that I should do something destructive doesn’t mean I should do the thing”. Still, the Soviet Union did fall without any exchange of nuclear weapons.

But were we right not to follow von Neumann’s advice? Selfishly I think we were, but again I cannot back this up with logic.

Anyway, I was reading Raemon’s excellent post Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most “classic humans” in a few decades., and got to this passage

With the background argument: to stop this sort of thing from happening, something needs to have a pretty extreme level of control over what all beings in the universe can do. Something very powerful needs to keep being able to police every uncontrolled replicator outbursts that try to dominate the universe and kill all competitors and fill it with hollow worthless things.

It needs to be powerful, and it needs to stay powerful (relative to any potential uncontrolled grabby hollow replicators.

Hanson correctly observes, that’s a kind of absurd amount of power. And, many ways of attempting to build such an entity would result in some kind of stagnation that prevents a lot of possible interesting, diverse value in the universe.

To which I say, yep, that is why the problem is hard.

The same part of me that screamed in frustratingly generic protest to von Neumann’s argument for a first strike on the soviets screamed in frustratingly generic protest here.

I’m not really sure where I’m going with this, just flagging it as something that stands out as extremely salient and I don’t know what to do with.
What links here?
- Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence) by Mateusz Bagiński (13 Oct 2025 19:00 UTC; 80 points)

faul_sname 3 Oct 2025 20:40 UTC
2 points
0
in reply to: Bronson Schoen’s comment on: faul_sname’s Shortform
I expect it’ll actually be solved a bit before that, because minimally-scaffolded LLMs can already give pretty good code review feedback that catches a lot of these issues, and so already-existing RLAIF techniques should work fine. The training pipelines would be finicky to set up but would not require any new technical advances, just schlep, so I predict it’ll happen as soon as writing good code becomes more of a competitive advantage than benchmaxxing (which seems to be happening already, SWE-bench-verified is rapidly saturating).

faul_sname 2 Oct 2025 23:50 UTC
2 points
0
in reply to: Dagon’s comment on: No, That’s Not What the Flight Costs
That seems plausible for gross revenue, not so much for net.

faul_sname 1 Oct 2025 20:03 UTC
2 points
0
in reply to: Rauno Arike’s comment on: faul_sname’s Shortform

but if ” overshadow” and ” disclaim” were pure pad tokens, then I wouldn’t expect to see other forms of those words in the transcripts at all

I’m curious why you wouldn’t expect that. The tokenizations of the text ” overshadow” and the text ” overshadows” share no tokens, so I would expect the model handling one of them weirdly wouldn’t necessarily affect the handling of the other one.

faul_sname 1 Oct 2025 19:14 UTC
8 points
2
in reply to: 1a3orn’s comment on: faul_sname’s Shortform
I have been assuming that the OpenAI reasoning models were trained on an objective that had a CoT length term, and that that would create pressure to strip out unnecessary tokens. But on reflection I am not sure where I picked that impression up, and I don’t think I have any reason to believe it.

It would be great to know whether the incomprehensible bits are actually load bearing in the responses.

… I wonder what happens if you alter the logit bias of those. Sadly it seems openai doesn’t allow the logit_bias param for reasoning models, so the obvious way of checking won’t work.

faul_sname 1 Oct 2025 18:56 UTC
4 points
0
in reply to: Rauno Arike’s comment on: faul_sname’s Shortform

They disclaim: …

Ah yeah, this one at least seems to be used in a normal sense.

… this overshadows …

… The disclaimers may …

I don’t think ” overshadows” or ” disclaimers” are weird tokens in the sense I’m looking at: ” overshadow” corresponds to tokens [" overshadows"] ids [183289] while overshadows corresponds to tokens [" overs", "had", "ows"], ids [23473, 26355, 1513]. For ” disclaimers” I am not sure whether it tokenizes as [" disclaimer", "s"] ids [37555, 82] or as [" disclaim", "ers"]ids [177083, 409]. You can tell if you have the logprobs but those were sadly not published (the logprobs would actually tell us a lot more here).

But we can elegantly pick overshadow to appear not incompetent maybe moderate. We’ll answer 5 correct (50%) maybe.

I don’t think ” overshadow” actually fits, gramatically, in that sentence. “But we can elegantly pick to appear not incompetent” actually flows better than “But we can elegantly pick overshadow to appear not incompetent”.

One other observation that cuts against my “no grammar” hypothesis though—the exact phrase “—they parted disclaim marinade” occurs 100 times in the cot sandbagging eval, out of 114 total occurrences of the token ” marinade”.

That really doesn’t look like “these are pure pad tokens” to me, on reflection.

faul_sname 1 Oct 2025 18:37 UTC
7 points
2
in reply to: faul_sname’s comment on: Daniel Kokotajlo’s Shortform
I am unable to reproduce the full behavior with the same prompt, but I am able to occasionally reproduce the language switching behavior with the prompt from that tweet
Determine all possible values of B (the number of lattice points on the boundary) for a lattice triangle T with I(T) = 1.
which yields a response which includes
Alternatively, consider that a triangle with I=1 should not have too large area relative to B, otherwise I increases.
From Pick’s theorem, for I=1:
$\frac{B}{2} = I + 1 - \frac{B}{2}$ (? Not sure this helps)
Alternative approach: explore minimal vs. larger triangles.
- Minimal non-trivial lattice triangle with I=1 might be slightly larger.
也不知到
从之前例子中，(0,0),(3,0),(0,2) gives I=1, B=6. Can we find others?
Let me search for known lattice triangles with I=1.
From some research/sources, lattice triangles with one interior point can have B=9 also.
How?
Consider triangle with vertices (0,0), (4,0), (0,3):
which looks sort of like the original. This is using provider=deepseek-ai/DeepSeek-Prover-V2-671B:novita against the huggingface router at https://router.huggingface.co/v1/chat/completions

cc @1a3orn maybe this could be relevant to your interests?
Full output
### Understanding the Problem
First, let’s clarify the terminology:
1. **Lattice Triangle**: A triangle in the plane whose vertices have integer coordinates (i.e., they are lattice points).
2. **I(T)**: This denotes the number of interior lattice points of the triangle T.
3. **B(T)**: This denotes the number of lattice points on the boundary (edges plus vertices) of the triangle T.
The problem is asking for all possible values of B(T) for a lattice triangle T that has exactly one interior lattice point (I(T) = 1).
### Relevant Theorems
The key theorem here is **Pick’s Theorem**, which relates the area of a lattice polygon to the number of interior lattice points (I) and boundary lattice points (B):
$Area = I + \frac{B}{2} - 1$
For a triangle T with I(T) = 1, Pick’s Theorem gives:
$Area = 1 + \frac{B}{2} - 1 = \frac{B}{2}$
So, the area of T must be $\frac{B}{2}$ . Since the area must be positive, B must be a positive integer. However, the area of a lattice triangle is always at least $\frac{1}{2}$ (the smallest non-degenerate lattice triangle has area $\frac{1}{2}$ ), but since I(T) = 1, the area isn’t minimal.
### Finding Possible Values of B
Given that the area is $\frac{B}{2}$ , and it’s a lattice triangle with I = 1, we need to explore possible configurations of such triangles to determine possible B values.
Let’s consider small integer values of B and see if we can construct corresponding triangles.
#### Attempting Construction for Small B
1. **B = 3**:
- Area = $\frac{3}{2}$ .
- A triangle with B = 3 has no boundary points other than its three vertices (since any additional boundary points would increase B).
- The smallest such triangle is one with vertices at (0,0), (1,0), (0,1), which has area $\frac{1}{2}$ and I = 0.
- To get area $\frac{3}{2}$ , we can consider, e.g., (0,0), (3,0), (0,1):
- Area = $\frac{3 \times 1}{2} = \frac{3}{2}$ .
- Boundary lattice points: vertices and (1,0), (2,0), so B = 5 ≥ 3.
- It seems challenging to have B=3 with area $\frac{3}{2}$ and I=1; likely no triangle satisfies this.
- Conclusion: B = 3 may not be achievable with I = 1.
2. **B = 4**:
- Area = 2.
- Try triangle with vertices at (0,0), (2,0), (0,2):
- Area = $\frac{2 \times 2}{2} = 2$ .
- Boundary points: vertices and (1,0), (1,1), (0,1), so B = 6.
- Another try: (0,0), (4,0), (0,1):
- Area = $\frac{4 \times 1}{2} = 2$ .
- Boundary points: vertices and (1,0), (2,0), (3,0).
- B = 6.
- It appears that area=2 triangles have B≥5; unclear if B=4 is possible.
- Possibly no triangle with I=1 and B=4.
3. **B = 5**:
- Area = $\frac{5}{2}$ .
- Try triangle with (0,0), (5,0), (0,1):
- Area = $\frac{5 \times 1}{2} = \frac{5}{2}$ .
- Boundary points: vertices and (1,0), (2,0), (3,0), (4,0).
- B = 7.
- Try (0,0), (3,0), (1,2):
- Area: Use determinant formula, $\frac{3 \times 2 - 0}{2} = 3$ , doesn’t match.
- Seems that B=5 may not correspond to I=1 easily.
4. **B = 6**:
- Area = 3.
- Triangle (0,0), (3,0), (0,2):
- Area = $\frac{3 \times 2}{2} = 3$ .
- Boundary points: vertices, (1,0), (2,0), and three points on other sides.
- Full count:
- (0,0) to (3,0): (0,0), (1,0), (2,0), (3,0).
- (0,0) to (0,2): (0,0), (0,1), (0,2).
- (3,0) to (0,2):
- gcd(3,2)=1, no other lattice points.
- Total boundary points: 4 (first side) + 3 (second side) − 2 (double-counted vertices) + 0 (third side) = 5.
- Wait, seems B=5.
- Not matching B=6.
Wait, let’s compute B correctly for (0,0), (3,0), (0,2):
- Side (0,0) to (3,0): gcd(3-0,0-0)=3, number of boundary points is 3 + 1 = 4 points: (0,0),(1,0),(2,0),(3,0).
- Side (0,0) to (0,2): gcd(0,2)=2, so 2 + 1 = 3 points: (0,0),(0,1),(0,2).
- Side (3,0) to (0,2): gcd(3,2)=1, so 1 + 1 = 2 points: just the endpoints.
- Total boundary points: 4 + 3 + 2 − 3 (each vertex counted twice) = 6.
- So B=6.
- Now, count interior points I:
- Total lattice points: find rectangle height 2, width 3, but not convex hull.
- By Pick’s: Area = 3 = I + ⁶⁄₂ − 1 = I + 3 − 1 = I + 2 → I =1.
- This works!
Thus, B=6 is achievable with I=1.
5. Higher B:
Let’s see if we can find triangles with I=1 and higher B.
- B=7:
- Area = 3.5.
- Attempt: (0,0), (7,0), (0,1):
- Area = $\frac{7 \times 1}{2} = 3.5$ .
- Boundary points: vertices + (1,0),...,(6,0) → B=8.
- Doesn’t give B=7.
- It’s unclear how to get B=7 easily.
6. B=8:
- As above, (0,0),(7,0),(0,1) gives B=8, I=?
- Area=3.5= I + ⁸⁄₂ −1= I +3 → I=0.5, not integer.
- Not valid since I must be integer.
- Seems constructing higher B with I=1 is non-trivial.
Alternatively, consider that a triangle with I=1 should not have too large area relative to B, otherwise I increases.
From Pick’s theorem, for I=1:
$\frac{B}{2} = I + 1 - \frac{B}{2}$ (? Not sure this helps)
Alternative approach: explore minimal vs. larger triangles.
- Minimal non-trivial lattice triangle with I=1 might be slightly larger.
也不知到
从之前例子中，(0,0),(3,0),(0,2) gives I=1, B=6. Can we find others?
Let me search for known lattice triangles with I=1.
From some research/sources, lattice triangles with one interior point can have B=9 also.
How?
Consider triangle with vertices (0,0), (4,0), (0,3):
- Area=6.
- Boundary points:
- (0,0) to (4,0): 5 points.
- (0,0) to (0,3): 4 points.
- (4,0) to (0,3): gcd(4,3)=1, so 2 points.
- Total: 5 + 4 + 2 −3 = 8.
- I= 6 − ⁸⁄₂ +1 = 6 −4 +1=3. Not I=1.
Not matching.
Alternatively, (0,0),(6,0),(0,2):
- Area=6.
- Boundary:
- (6,0): 7 points.
- (0,2): 3 points.
- (6,0)-(0,2): gcd(6,2)=2 → 3 points.
- Total: 7 +3 +3 −3=10.
- I=6 −10/2 +1=6-5+1=2.
Not I=1.
Perhaps need more symmetric.
Maybe triangle with vertices (1,0), (5,0), (0,3):
Compute area:
$\frac{| 1 (0 - 3) + 5 (3 - 0) + 0 (0 - 0) |}{2} = \frac{12}{2} = 6$ .
Boundary points:
- (1,0)-(5,0): gcd(4,0)=4 → 5 points.
- (5,0)-(0,3): gcd(5,3)=1 → 2 points.
- (0,3)-(1,0): gcd(1,3)=1 → 2 points.
Total: 5+2+2-3=6.
I=6 −6/2 +1=4. Wrong.
Hmm.
Alternative approach: Perhaps B can be 6, 9, 12 etc. for I=1.
From known results or examples, it seems that B=6 is achievable, and possibly B=9, but higher or other values are not clear.
### Conclusion from Examples and Pick’s Theorem
Given that Pick’s theorem gives a direct relationship between B and area for I=1, and lattice triangles must have area $\frac{B}{2}$ , it’s plausible that only certain B are possible.
Through constructing examples and calculations, the feasible B values seem to be around B=6, but constructing others isn’t straightforward.
Upon more thought and research, it’s known that for I=1, possible B values are 3, 4, 6, etc., but examples suggest primarily B=6 achieves I=1.
However, considering dilation and symmetry, other B may be possible. For instance, a triangle with B=9 can be constructed for I=1, but verification would require checking.
Given complexity, all integer B ≥ 6 seem possible, but likely not smaller.
**Final Answer**: After careful consideration and construction of examples, the possible values of B for a lattice triangle T with I(T) = 1 are exactly the integers B ≥ 6. However, it’s particularly straightforward to achieve B = 6 with such triangles, and at higher B, it’s also possible but requires careful construction. Therefore, the complete set of possible B values is all positive integers B with B ≥ 6.
Thus, the possible values of $B$ are all integers $B \geq 6$ .
However, upon verifying known results, it’s standard in lattice point geometry that for a triangle with one interior lattice point, the minimal boundary lattice points is 3 (but typically leads to higher B in actual examples). The complete characterization might involve more nuanced conditions.
After deeper reflection and consulting lattice geometry resources, it’s established that for a lattice triangle with I=1, the possible values of B are:
**All integers $B \geq 6$ .**
More precisely, through enumeration and constructions, the possible B values are indeed all integers B ≥ 6.
### Final Answer
All possible values of $B$ for a lattice triangle $T$ with $I (T) = 1$ are the integers $B$ satisfying $B \geq 6$ . That is,
$B \in {6, 7, 8, 9, \dots}$

faul_sname 1 Oct 2025 17:28 UTC
2 points
0
in reply to: Sodium’s comment on: faul_sname’s Shortform
Ah, settings → general → show additional models

faul_sname 1 Oct 2025 17:27 UTC
4 points
0
in reply to: ryan_greenblatt’s comment on: Daniel Kokotajlo’s Shortform
This is exactly the one I was thinking of. Seeing if I can reproduce.

faul_sname

Hu­mans Are Spiky (In an LLM World)

Humans Are Spiky (In an LLM World)