gallabytes

Karma: 709

gallabytes 30 Mar 2026 15:39 UTC
1 point
0
in reply to: X4vier’s comment on: Kelly Criterion is for Cowards
28, married, 1 kid, expecting to have to support my parents in their retirement at least a bit, well past full kelly due to illiquid ai equity but not jazzed about that level of concentration. net worth as fraction of npv of human capital depends on timelines and how you choose to price some of the assets, those concerns do at least anticorrelate though.

gallabytes 23 Mar 2026 16:12 UTC
3 points
2
on: Kelly Criterion is for Cowards
this is a liquidity problem right? guessing you are young and have most of your career ahead of you? if you run out of money you probably move back in with your parents, it kind of sucks but it’s not that bad?

you would be less cavalier if you were betting from the net present value of your human capital (and truly understood the gravity of that bet).
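One way to read the point above, as a rough sketch with made-up numbers: the Kelly fraction for a simple binary bet is f* = p - (1 - p) / b, and the stake it implies depends heavily on whether the bankroll is liquid savings alone or liquid savings plus the net present value of human capital. The function names and all dollar figures here are hypothetical, chosen only for illustration.

```python
def kelly_fraction(p_win, net_odds):
    """Kelly fraction for a binary bet: f* = p - (1 - p) / b."""
    return p_win - (1.0 - p_win) / net_odds


def kelly_stake_vs_liquid(p_win, net_odds, liquid, human_capital_npv):
    """Full-Kelly stake on total wealth, and that stake as a multiple of liquid savings."""
    f = kelly_fraction(p_win, net_odds)
    total_wealth = liquid + human_capital_npv
    stake = f * total_wealth
    return stake, stake / liquid


# Hypothetical inputs: a 60/40 even-money bet, $50k liquid, $950k human capital NPV.
stake, multiple_of_liquid = kelly_stake_vs_liquid(
    p_win=0.6, net_odds=1.0, liquid=50_000, human_capital_npv=950_000
)
print(f"full-Kelly stake on total wealth: ${stake:,.0f}")
print(f"as a multiple of liquid savings: {multiple_of_liquid:.1f}x")
```

Under these assumed numbers, a bet that looks far beyond Kelly relative to liquid savings (about 4x the account) is only full Kelly relative to total wealth. The catch is that the human-capital part cannot actually be staked, which is the liquidity issue.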

gallabytes 20 Nov 2025 15:38 UTC
3 points
−2
in reply to: Thane Ruthenis’s comment on: Varieties Of Doom
in practice, we seem to train the world model and understanding machine first and the policy only much later as a thin patch on top of the world model. this is not guaranteed to stay true but seems pretty durable so far. thus, the relevant heuristics are about base models not about randomly initialized neural networks.

separately, I do think randomly initialized neural networks have some strong baseline of fuzziness and conceptual corrigibility, which is in a sense what it means to have a traversable loss landscape.

gallabytes 19 Nov 2025 20:06 UTC
6 points
2
in reply to: Benquo’s comment on: Varieties Of Doom
what’s the easiest thing you think LLMs won’t be able to do in 5 years ie by EoY 2030? what about EoY 2026?

gallabytes 18 Nov 2025 2:36 UTC
8 points
−2
on: Varieties Of Doom

It strains credibility to imagine economically useless humans being allowed to keep a disproportionate share of capital in a society where every decision they make with it is net-negative in the carefully tuned structures of posthuman minds.

I’m not really convinced this is true. the share in this case is disproportionate but astronomically small—eg even full dominion over earth is a very cheap thing to grant. if it’s even 0.001% destabilizing to ai society to expropriate the humans it’s not going to be worth it.

we don’t need much caring about humans to lead to ~preserved property rights. I expect we’ll overshoot the needed amount by quite a lot, and extra marginal caring is probably good.

gallabytes 1 Nov 2025 19:50 UTC
1 point
0
on: Sonnet 4.5′s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals
in my use sonnet 4.5 seems genuinely more aligned than sonnet 4, even in scenarios which aren’t eval flavored. purely anecdotal and not remotely systematic, but it really does seem a lot less inclined to try to pass off failure as success (“the tests are failing, but this is intended behavior!” kind of thing)

gallabytes 20 Sep 2024 2:00 UTC
12 points
4
in reply to: habryka’s comment on: The Obliqueness Thesis

the track record of people trying to broadly ensure that humanity continues to be in control of the future

What track record?

gallabytes 2 May 2024 18:44 UTC
16 points
10
in reply to: Wei Dai’s comment on: Ironing Out the Squiggles

But do they also generalize out of training distribution more similarly? If so, why?

Neither of them is going to generalize very well out of distribution, and to the extent they do it will be via looking for features that were present in-distribution. As the old adage goes: “to imagine 10-dimensional space, first imagine 3-space, then say 10 really hard”.

My guess is that basically every learning system which tractably approximates Bayesian updating on noisy high-dimensional data is going to end up with roughly Gaussian OOD behavior. There have been some experiments where (non-adversarially-chosen) OOD samples quickly degrade to a uniform prior, but I don’t think that’s been super robustly studied.

The way humans generalize OOD is not that our visual systems are natively equipped to generalize to contexts they have no way of knowing about (that would be a true violation of no-free-lunch theorems), but that through linguistic reflection & deliberate experimentation some of us can sometimes get a handle on the new domain, and then we use language to communicate that handle to others who come up with things we didn’t, etc. OOD generalization is a process at the (sub)cultural & whole-nervous-system level, not something that individual chunks of the brain can do well on their own.

This is also confusing/concerning for me. Why would it be necessary or helpful to have such a large dataset to align the shape/texture bias with humans?

Well it might not be, but you need large datasets to motivate studying large models, as their performance on small datasets like imagenet is often only marginally better.

A 20b param ViT trained on 10m images at 224x224x3 is approximately 1 param for every 75 subpixels, and 2000 params for every image. Classification is an easy enough objective that it very likely just overfits, unless you regularize it a ton, at which point it might still have the expected shape bias at great expense. Training a 20b param model is expensive, I don’t think anyone has ever spent that much on a mere imagenet classifier, and public datasets >10x the size of imagenet with any kind of labels only started getting collected in 2021.
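As a quick check of the ratios above, here is a small sketch using only the figures quoted in the comment (20b params, 10m images, 224x224x3 inputs). The function and argument names are made up for illustration.

```python
def params_vs_data(n_params, n_images, height, width, channels):
    """Return (subpixels per parameter, parameters per image)."""
    subpixels_per_image = height * width * channels
    total_subpixels = n_images * subpixels_per_image
    return total_subpixels / n_params, n_params / n_images


subpixels_per_param, params_per_image = params_vs_data(
    n_params=20e9, n_images=10e6, height=224, width=224, channels=3
)
print(f"{subpixels_per_param:.1f} subpixels per param")  # 75.3 subpixels per param
print(f"{params_per_image:.0f} params per image")        # 2000 params per image
```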

To motivate this a bit, humans don’t see in frames but let’s pretend we do. At 60fps for 12h/day for 10 years, that’s nearly 9.5 billion frames. Imagenet is 10 million images. Our visual cortex contains somewhere around 5 billion neurons, which is around 50 trillion parameters (at 1 param / synapse & 10k synapses / neuron, which is a number I remember being reasonable for the whole brain but vision might be 1 or 2 OOM special in either direction).
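The back-of-envelope numbers above can be reproduced as follows. This is only a sketch of the arithmetic: the inputs are the rough figures from the comment (including its own caveat about synapse counts), and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def frames_seen(fps, hours_per_day, years, days_per_year=365):
    """Frames 'seen' if vision were a fixed-rate video stream."""
    return fps * SECONDS_PER_HOUR * hours_per_day * days_per_year * years


def synapse_params(neurons, synapses_per_neuron, params_per_synapse=1):
    """Parameter count under a 1-param-per-synapse assumption."""
    return neurons * synapses_per_neuron * params_per_synapse


frames = frames_seen(fps=60, hours_per_day=12, years=10)
params = synapse_params(neurons=5e9, synapses_per_neuron=10_000)
imagenet_images = 10e6  # dataset size as stated in the comment

print(f"{frames:.3g} frames")                           # 9.46e+09 frames
print(f"{params:.3g} params")                           # 5e+13 params
print(f"{frames / imagenet_images:.0f}x more frames")   # 946x more frames
```

So under these assumptions the human visual system gets roughly three orders of magnitude more "images" than the dataset, and has three to four orders of magnitude more parameters than the 20b model.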

gallabytes 2 May 2024 0:24 UTC
5 points
0
in reply to: Wei Dai’s comment on: Ironing Out the Squiggles
adversarial examples definitely still exist but they’ll look less weird to you because of the shape bias.

anyway this is a random visual model, raw perception without any kind of reflective error correction loop, I’m not sure what you expect it to do differently, or what conclusion you’re trying to draw from how it does behave? the inductive bias doesn’t precisely match human vision, so it has different mistakes, but as you scale both architectures they become more similar. that’s exactly what you’d expect for any approximately Bayesian setup.

the shape bias increasing with scale was definitely conjectured long before it was tested. ML scaling is very recent though, and this experiment was quite expensive. Remember when GPT-2 came out and everyone thought that was a big model? This is an image classifier which is over 10x larger than that. They needed a giant image classification dataset which I don’t think even existed 5 years ago.

gallabytes 1 May 2024 5:07 UTC
3 points
0
in reply to: Wei Dai’s comment on: Ironing Out the Squiggles
Scale basically solves this too, with some other additions (not part of any released version of MJ yet) really putting a nail in the coffin, but I can’t say too much here w/o divulging trade secrets. I can say that I’m surprised to hear that SD3 is still so much worse than Dalle3, Ideogram on that front—I wonder if they just didn’t train it long enough?

gallabytes 1 May 2024 5:02 UTC
15 points
4
in reply to: Wei Dai’s comment on: Ironing Out the Squiggles

They put too much emphasis on high frequency features, suggesting a different inductive bias from humans.

This was found to not be true at scale! It doesn’t even feel that true w/weaker vision transformers, seems specific to convnets. I bet smaller animal brains have similar problems.

gallabytes 18 Jan 2024 18:28 UTC
1 point
0
in reply to: nostalgebraist’s comment on: On Anthropic’s Sleeper Agents Paper
Order matters more at smaller scales—if you’re training a small model on a lot of data and you sample in a sufficiently nonrandom manner, you should expect catastrophic forgetting to kick in eventually, especially if you use weight decay.

gallabytes 10 Jan 2024 23:07 UTC
13 points
6
in reply to: jessicata’s comment on: A case for AI alignment being difficult
I think I can just tell a lot of stuff wrt human values! How do you think children infer them? I think in order for human values to not be viable to point to extensionally (ie by looking at a bunch of examples) you have to make the case that they’re much more built-in to the human brain than seems appropriate for a species that can produce both Jains and (Genghis Khan era) Mongols.
I’d also note that “incentivize” is probably giving a lot of the game away here—my guess is you can just pull them out much more directly by gathering a large dataset of human preferences and predicting judgements.

gallabytes 10 Jan 2024 22:47 UTC
7 points
0
in reply to: jessicata’s comment on: A case for AI alignment being difficult
Why do you expect it to be hard to specify given a model that knows the information you’re looking for? In general the core lesson of unsupervised learning is that often the best way to get pointers to something you have a limited specification for is to learn some other task that necessarily includes it then specialize to that subtask. Why should values be any different? Broadly, why should values be harder to get good pointers to than much more complicated real-world tasks?

gallabytes 9 Jan 2024 5:56 UTC
5 points
4
in reply to: Nathan Helm-Burger’s comment on: AI #44: Copyright Confrontation
yeah I basically think you need to construct the semantic space for this to work, and haven’t seen much work on that front from language modeling researchers.

drives me kinda nuts because I don’t think it would actually be that hard to do, and the benefits might be pretty substantial.

gallabytes 3 Dec 2023 6:28 UTC
14 points
3
in reply to: DanielFilan’s comment on: Quick takes on “AI is easy to control”
Can you give an example of a theoretical argument of the sort you’d find convincing? Can be about any X caring about any Y.

gallabytes 8 Oct 2023 22:27 UTC
11 points
9
in reply to: Zvi’s comment on: Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn

On the impossible-to-you world: This doesn’t seem so weird or impossible to me? And I think I can tell a pretty easy cultural story slash write an alternative universe novel where we honor those who maximize genetic fitness and all that, and have for a long time—and that this could help explain why civilization and our intelligence developed so damn slowly and all that. Although to truly make the full evidential point that world then has to be weirder still where humans are much more reluctant to mode shift in various ways. It’s also possible this points to you having already accepted from other places the evidence I think evolution introduces, so you’re confused why people keep citing it as evidence.

The ability to write fiction in a world does not demonstrate its plausibility. Beware generalizing from fictional fictional evidence!

The claim that such a world is impossible is a claim that, were you to try to write a fictional version of it, you would run into major holes in the world that you would have to either ignore or paper over with further unrealistic assumptions.

gallabytes 8 Oct 2023 22:23 UTC
5 points
−3
in reply to: Zvi’s comment on: Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn

In case it is not clear: My expectation is that sufficiently large capabilities/intelligence/affordances advances inherently break our desired alignment properties under all known techniques.

Nearly every piece of empirical evidence I’ve seen contradicts this—more capable systems are generally easier to work with in almost every way, and the techniques that worked on less capable versions straightforwardly apply and in fact usually work better than on less intelligent systems.

gallabytes 7 Oct 2023 23:39 UTC
8 points
−6
in reply to: Quintin Pope’s comment on: Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn

When I explain my counterargument to pattern 1 to people in person, they will very often try to “rescue” evolution as a worthwhile analogy for thinking about AI development. E.g., they’ll change the analogy so it’s the programmers who are in a role comparable to evolution, rather than SGD.

In general one should not try to rescue intuitions, and the frequency of doing this is a sign of serious cognitive distortions. You should only try to rescue intuitions when they have a clear and validated predictive or pragmatic track record.

The reason for this is very simple—most intuitions or predictions one could make are wrong, and you need a lot of positive evidence to privilege any particular hypotheses re how or what to think. In the absence of evidence, you should stop relying on an intuition, or at least hold it very lightly.

gallabytes 6 Oct 2023 16:37 UTC
1 point
0
in reply to: Zack_M_Davis’s comment on: Evaluating the historical value misspecification argument
The obvious question here is to what degree you need new techniques vs merely needing to train new models with the same techniques as you scale current approaches.
One of the virtues of the deep learning paradigm is that you can usually test things at small scale (where the models are not and will never be especially smart) and there’s a smooth range of scaling regimes in between where things tend to generalize.
If you need fundamentally different techniques at different scales, and the large scale techniques do not work at intermediate and small scales, then you might have a problem. If you need the same techniques as at medium or small scales for large scales, then engineering continues to be tractable even as algorithmic advances obsolete old approaches.