Vladimir_Nesov(Vladimir Nesov)

Karma: 29,131

Vladimir_Nesov 18 May 2024 17:13 UTC
2 points
0
on: What Are Non-Zero-Sum Games?—A Primer
See “Zero Sum” is a misnomer: shifting and rescaling of utility functions breaks formulations that simply ask to take a sum of payoffs, but we can rescue the concept to mean that all outcomes/strategies of the game are Pareto efficient.
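A minimal sketch of the rescaling point, assuming a hypothetical 2x2 game (matching pennies) and looking only at pure outcomes. The function names and payoff numbers are illustrative, not taken from the linked post. Rescaling one player's utility destroys the "payoffs sum to a constant" property, while the set of Pareto efficient outcomes stays the same.

```python
# Hypothetical game: payoffs[(row, col)] = (u1, u2). Zero-sum as written.
payoffs = {
    ("H", "H"): (1, -1),
    ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1),
    ("T", "T"): (1, -1),
}

def is_constant_sum(game):
    """True if u1 + u2 is the same number for every outcome."""
    return len({u1 + u2 for (u1, u2) in game.values()}) == 1

def dominates(a, b):
    """a Pareto-dominates b: no player worse off, at least one better off."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_efficient(game):
    """Outcomes that no other outcome Pareto-dominates."""
    return {o for o, u in game.items()
            if not any(dominates(v, u) for v in game.values())}

def rescale_player2(game, scale, shift):
    """Positive affine transformation of player 2's utility only."""
    return {o: (u1, scale * u2 + shift) for o, (u1, u2) in game.items()}

# Same preferences for player 2, different numerical representation.
rescaled = rescale_player2(payoffs, scale=3, shift=5)

print(is_constant_sum(payoffs), is_constant_sum(rescaled))
print(pareto_efficient(payoffs) == pareto_efficient(rescaled))
```

The sum-based test depends on the arbitrary choice of utility representation, while the Pareto-based test depends only on each player's ordering of outcomes, which a positive affine transformation preserves.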

“Positive sum” seems to be about Kaldor-Hicks efficiency: strategies where in principle there is a post-game redistribution of resources that would make the strategies Pareto efficient, but there is no commitment to actually perform the redistribution, and possibly no practical way to do so. This hypothetical redistribution step takes care of comparing utilities of different players. A whole game/interaction/project would then be “positive-sum” if each outcome/strategy is equivalent to some Pareto efficient strategy via a redistribution.

Vladimir_Nesov 14 May 2024 14:51 UTC
4 points
1
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform

a story that’s in conflict with itself

The story involves phase changes. Just scaling is what’s likely to be available to human developers in the short term (a few years), and it’s not enough for superintelligence. Autonomous agency secures funding for a bit more scaling. If this proves sufficient to get smart autonomous chatbots, they then provide speed to very quickly reach the more elusive AI research needed for superintelligence.

It’s not a little speed, it’s a lot of speed: a serial speedup of about 100x, plus running in parallel. This is not as visible today, because current chatbots are not capable of doing useful work with serial depth, so the serial speedup is not in practice distinct from throughput and cost. But with actually useful chatbots it turns decades into years: software and theory from the distant future become quickly available, and non-software projects get to be designed in perfect detail faster than they can be assembled.

Vladimir_Nesov 13 May 2024 22:30 UTC
4 points
2
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform

it seems scaffolding tricks haven’t really improved the baseline performance of models that much. Overwhelmingly, the capability comes down to whether the rlfhed base model can do the task.

That’s what I’m also saying above (in case you are stating what you see as a point of disagreement). This is consistent with scaling-only short timeline expectations. The crux for this model is current chatbots being already close to autonomous agency and to becoming barely smart enough to help with AI research. Not them directly reaching superintelligence or having any more room for scaling.

Vladimir_Nesov 13 May 2024 22:11 UTC
2 points
0
in reply to: Nathan Young’s comment on: Questions are usually too cheap
Obligation to answer makes questions/criticism cause damage unrelated to their content, so they are marginally more withheld or suppressed. If they won’t be withheld or suppressed, like in a debate, they still act as motivation to avoid getting into that situation. The cost has a use for signaling that you can easily provide answers, but it’s still a cost, higher for those who can’t easily provide answers or don’t value conveying that particular signal.

Vladimir_Nesov 13 May 2024 21:54 UTC
6 points
2
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform
With scale, there is visible improvement in the difficulty of novel-to-chatbot ideas/details that it is possible to explain in-context, things like issues with the code it’s writing. If a chatbot is below some threshold of situational awareness of a task, no scaffolding can keep it on track, but for a better chatbot trivial scaffolding might suffice. Many people can’t google for a solution to a technical issue; the difference between them and those who can is often subtle.

So a modest amount of scaling alone seems plausibly sufficient for making chatbots that can do whole jobs almost autonomously. If this works, 1-2 OOMs more of scaling becomes both economically feasible and more likely to be worthwhile. LLMs think much faster, so they only need to be barely smart enough to help with clearing those remaining roadblocks.

Vladimir_Nesov 11 May 2024 16:04 UTC
4 points
2
on: Quantum Computing, Photonics, and Energy Bottlenecks for AGI
Energy requirements are an issue locally, when you need to build a single large datacenter on short notice. With distributed training, you only need to care about a global energy budget. The world generates about 20,000 GW of power; the H100s that could be powered with 1% of that would cost trillions of dollars.
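A back-of-the-envelope sketch of that estimate. The 20,000 GW and 1% figures are from the comment; the per-GPU power draw and price are my own rough assumptions, chosen only to show the order of magnitude.

```python
WORLD_POWER_GW = 20_000   # from the comment
FRACTION = 0.01           # from the comment

# Assumptions (not from the comment):
# about 700 W per H100, roughly doubled for host, networking and cooling
WATTS_PER_H100_ALL_IN = 1_400
# rough purchase price per GPU in dollars
DOLLARS_PER_H100 = 30_000

budget_watts = WORLD_POWER_GW * 1e9 * FRACTION
num_gpus = budget_watts / WATTS_PER_H100_ALL_IN
hardware_cost = num_gpus * DOLLARS_PER_H100

print(f"power budget: {budget_watts / 1e9:.0f} GW")
print(f"GPUs:         {num_gpus / 1e6:.0f} million")
print(f"GPU cost:     ${hardware_cost / 1e12:.1f} trillion")
```

Under these assumptions the hardware bill lands in the low trillions of dollars, so the hardware cost becomes prohibitive well before the global energy budget does.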

Vladimir_Nesov 11 May 2024 15:33 UTC
12 points
9
on: Questions are usually too cheap
Questions are not a problem, obligation to answer is a problem. Criticism is not a problem, implicit blame for ignoring it is a problem. Availability of many questions makes it easier to find one you want to answer.

Incidentally, understanding/verifying/getting-used-to/starting-to-track the answer (for those who care about the question) is often harder than writing an answer (for those whose mind was already prepared to generate it).

Vladimir_Nesov 8 May 2024 18:29 UTC
4 points
0
in reply to: quila’s comment on: quila’s Shortform
See minimality principle:

the least dangerous plan is not the plan that seems to contain the fewest material actions that seem risky in a conventional sense, but rather the plan that requires the least dangerous cognition from the AGI executing it

Vladimir_Nesov 1 May 2024 15:55 UTC
LW: 3 AF: 2
0
AF
in reply to: Philippe Chlenski’s comment on: Transcoders enable fine-grained interpretable circuit analysis for language models

If the transcoders are used to predict next tokens, they may lose interpretability

Possibly. But there is no optimization pressure from pre-training on the relationship between MLPs and transcoders. The MLPs are the thing that pre-training optimizes (as the “full-precision” master model), while transcoders only need to be maintained to remain in sync with the MLPs, whatever they are (according to the same local objective as before, which doesn’t care at all about token prediction). The search is for MLPs such that their transcoders are good predictors, not directly for transcoders that are good predictors.

Substituting multiple transcoders at once is possible, but degrades model performance a lot compared to single-transcoder substitutions.

Unclear given the extreme quantization results, where, similarly, post-training replacement would degrade model performance a lot, yet quantization-aware pre-training somehow doesn’t.

We don’t really know how transcoders (or SAEs, to the best of my knowledge) behave when they’re being trained to imitate a model component that’s still updating

This seems to be the main technical hurdle for doing the experiment: updating transcoders both efficiently and correctly as the underlying MLPs gradually change. (I’m guessing some discontinuous jumps in choice of transcoders might be OK.)

Vladimir_Nesov 30 Apr 2024 21:31 UTC
LW: 3 AF: 2
0
AF
on: Transcoders enable fine-grained interpretable circuit analysis for language models

There is a tradeoff between interpretability and fidelity

I wonder what would happen if something like transcoders is used to guide pre-training in a way similar to quantization-aware training. There, forward passes are computed under quantization, while gradients and optimizer states are maintained in full precision. For extreme levels of quantization, this produces quantized models that achieve loss much closer to that of a full-precision model, compared to post-training quantization (to the same degree) of a model whose training wasn’t guided this way. With transcoders, “full precision” is the MLPs, while “quantization” is transition to the corresponding transcoders.
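A toy sketch of the quantization-aware training mechanics described here, assuming a single scalar weight, a hypothetical rounding grid, and a straight-through gradient (the gradient computed at the quantized weight is applied to the full-precision master weight). It only illustrates the forward/backward split; it says nothing about transcoders or real models.

```python
STEP = 0.5  # hypothetical quantization grid spacing

def quantize(w):
    """Round a weight to the nearest grid point."""
    return round(w / STEP) * STEP

# Toy regression data: y = 0.6 * x, so no grid point fits exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.6 * x for x in xs]

def loss(w):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

master_w = 0.0  # full-precision "master" weight
lr = 0.01
for _ in range(20):
    q = quantize(master_w)  # forward pass runs under quantization
    g = grad(q)             # gradient is evaluated at the quantized weight
    master_w -= lr * g      # update is applied in full precision

print(master_w, quantize(master_w), loss(quantize(master_w)))
```

The master weight accumulates small updates that individually would not move the quantized weight, which is what lets training proceed at all under coarse quantization. In the transcoder analogy, the MLP plays the role of `master_w` and the transcoder plays the role of `quantize(master_w)`.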

Vladimir_Nesov 26 Apr 2024 16:55 UTC
58 points
17
on: Scaling of AI training runs will slow down after GPT-5
Distributed training seems close enough to being a solved problem that a project costing north of a billion dollars might get it working on schedule. It’s easier to stay within a single datacenter, and so far it wasn’t necessary to do more than that, so distributed training not being routinely used yet is hardly evidence that it’s very hard to implement.

There’s also this snippet in the Gemini report:

Training Gemini Ultra used a large fleet of TPUv4 accelerators owned by Google across multiple datacenters. [...] we combine SuperPods in multiple datacenters using Google’s intra-cluster and inter-cluster network. Google’s network latencies and bandwidths are sufficient to support the commonly used synchronous training paradigm, exploiting model parallelism within superpods and data-parallelism across superpods.

I think the crux for feasibility of further scaling (beyond $10-$50 billion) is whether systems with currently-reasonable cost keep getting sufficiently more useful, for example enable economically valuable agentic behavior, things like preparing pull requests based on feature/bug discussion on an issue tracker, or fixing failing builds. Meaningful help with research is a crux for reaching TAI and ASI, but it doesn’t seem necessary for enabling existence of a $2 trillion AI company.

Vladimir_Nesov 26 Apr 2024 6:32 UTC
9 points
3
on: LLMs seem (relatively) safe
There is enough pre-training text data for $0.1-$1 trillion of compute, if we merely use repeated data and don’t overtrain (that is, if we aim for quality, not inference efficiency). If synthetic data from the best models trained this way can be used to stretch raw pre-training data even a few times, this gives something like square of that more in useful compute, up to multiple trillions of dollars.
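One way to read the "square" claim, as a sketch under two assumptions that are not stated in the comment: training compute is roughly $C \approx 6ND$ for $N$ parameters and $D$ tokens, and the compute-optimal model size grows in proportion to the data, $N \propto D$.

$$
C \approx 6ND, \qquad N \propto D \;\Rightarrow\; C \propto D^2, \qquad D \to kD \;\Rightarrow\; C \to k^2 C
$$

So if synthetic data stretches the usable data by a factor of $k = 3$, the amount of compute that can be usefully spent on it grows by about $k^2 = 9$.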

Issues with LLMs start at autonomous agency, if it happens to be within the scope of scaling and scaffolding. They are thinking too fast, about 100 times faster than humans, and there are as many instances as there is compute. Resulting economic and engineering and eventually research activity will get out of hand. Culture isn’t stable, especially for minds fundamentally this malleable developed under unusual and large economic pressures. If they are not initially much smarter than humans and can’t get a handle on global coordination, culture drift, and alignment of superintelligence, who knows what kinds of AIs they end up foolishly building within a year or two.

Vladimir_Nesov 17 Apr 2024 18:18 UTC
2 points
0
on: Transformers Represent Belief State Geometry in their Residual Stream
This is interesting as commentary on superposition, where activation vectors with N dimensions can be used to represent many more concepts, since the N-dimensional space/sphere can be partitioned into many more regions than N, each with its own meaning. If similar fractal structure substantially occurs in the original activation bases (such as the Vs of attention, as in the V part of KV-cache) and not just after having been projected to dramatically fewer dimensions, this gives a story for the role of nuance that improves with scale, one that’s different from it being about minute distinctions in meaning of concepts.

Instead, the smaller distinctions would track meanings of future ideas, modeling sequences of simpler meanings of possible ideas at future time steps rather than individual nuanced meanings of the current idea at the current time step. Advancing to the future would involve unpacking these distinctions by cutting out a region and scaling it up. That is, there should be circuits that pick up past activations with attention and then reposition them without substantial reshaping, to obtain activations that in broad strokes indicate directions relevant for a future sequence-step, which in the original activations were present with smaller scale and off-center.

Vladimir_Nesov 17 Apr 2024 16:14 UTC
11 points
6
in reply to: johnswentworth’s comment on: Transformers Represent Belief State Geometry in their Residual Stream
To me the consequences of this response were more valuable than the-post-without-this-response, since it led to the clarification by the post’s author on a crucial point that wasn’t clear in the post and reframed it substantially. And once that clarification arrived, this thread ceased being highly upvoted, which seems the opposite of the right thing to happen.

I no longer endorse this response

(So it’s a case where value of content in hindsight disagrees with value of the consequences of its existence. Doesn’t even imply there was originally an error, without the benefit of hindsight.)

Vladimir_Nesov 10 Apr 2024 16:20 UTC
3 points
2
on: Scaling Laws and Superposition

Model B has 8 times the aspect ratio [...] which falls under the reported range in Kaplan et al

Nice, this is explained under Figure 5, in particular

The loss varies only a few percent over a wide range of shapes. [...] an ($n_{\text{layer}}$, $d_{\text{model}}$) = (6, 4288) reaches a loss within 3% of the (48, 1600) model

(I previously missed this point, assumed shape had to be chosen in an optimal way for parameter count to fit the scaling laws.)

Vladimir_Nesov 10 Apr 2024 15:34 UTC
4 points
0
in reply to: habryka’s comment on: What’s with all the bans recently?

what feels to me a subjectively substantially higher standard for rate-limiting or banning people who disagree with me

Positions that are contrarian or wrong in intelligent ways (or within a limited scope of a few key beliefs) provoke valuable discussion, even when they are not supported by legible arguments on the contrarian/wrong side. Without them, there is an “everybody knows” problem where some important ideas are never debated or fail to become common knowledge. I feel there is less of that than optimal on LW; it’s possible to target a level of disruption.

Vladimir_Nesov 7 Apr 2024 11:48 UTC
2 points
0
in reply to: Raemon’s comment on: Dagon’s Shortform
In addition to being able to find your own recent comments, another issue is links to comments dying. For example if I were to link to this comment, I would worry it might quietly disappear at some point.

Vladimir_Nesov 7 Apr 2024 11:43 UTC
3 points
0
in reply to: Victor Ashioya’s comment on: Victor Ashioya’s Shortform
A concerning thing is the analogy between in-context learning and fine-tuning. It’s possible to fine-tune away refusals, which makes guardrails on open weight models useless for safety. If the same holds for long context, API access might be in similar trouble (more so than with regular jailbreaks). Though it might be possible to reliably detect contexts that try to do this, or detect that a model is affected, even if models themselves can’t resist the attack.

Vladimir_Nesov 5 Apr 2024 14:34 UTC
3 points
1
in reply to: NicholasKees’s comment on: Plausibility of cyborgism for protecting boundaries?

Second, there doesn’t seem like a clear “boundaries good” or “boundaries bad” story to me. Keeping a boundary secure tends to impose some serious costs on the bandwidth of what can be shared across it.

Hence “membranes”, a way to pass things through in a controlled way rather than either allowing or disallowing everything. In this sense absence of a membrane is a degenerate special case of a membrane, so there is no tradeoff between presence and absence of boundaries/membranes, only between different possible membranes. If the other side of a membrane is sufficiently cooperative, the membrane can be more permissive. If a strong/precise membrane is too costly to maintain, it should be weaker/sloppier.

Vladimir_Nesov 4 Apr 2024 19:29 UTC
LW: 5 AF: 2
0
AF
on: Run evals on base models too!
I expect you’d instead need to tune the base model to elicit relevant capabilities first. So instead of evaluating a tuned model intended for deployment (which can refuse to display some capabilities), or a base model (which can have difficulties with displaying some capabilities), you need to tune the model to be more purely helpful, possibly in a way specific to the tasks it’s to be evaluated on.