J Bostock

Karma: 1,002

J Bostock 25 Jul 2024 16:53 UTC
2 points
0
in reply to: Armand_Cognetta’s comment on: The Cancer Resolution?
Epigenetic cancers are super interesting, thanks for adding this! I vaguely remember hearing that there were some incredibly promising treatments for them, though I’ve not heard anything for the past five or ten years on that. Importantly for this post, they also fill out the (rare!) examples of mutation-free cancers that we’ve seen, while fitting comfortably within the DNA paradigm.

J Bostock 24 Jul 2024 22:38 UTC
3 points
0
on: Llama Llama-3-405B?
If everyone affirms this is indeed all the major arguments for open weights, then I can at some point soon produce a polished full version as a post and refer back to it, and consider the matter closed until someone comes up with new arguments.
Feels like the vast majority of the benefits Zuck touted could be achieved with:
1. A cheap, permissive API that allows finetuning and some other stuff. If Meta really want people to be able to do things cheaply, presumably they can offer it far, far cheaper than almost anyone could do it themselves without directly losing money.
2. A few partnerships with research groups to study it, since few groups both have enough resources that doing research on a 405B model is optimal and don’t already have a large model of their own.
3. A basic pledge (that is actually followed) to not delete everyone’s data, finetunes, etc. to deal with concerns about “ownership”.

I assume there are other (sometimes NSFW) benefits he doesn’t want to mention, because the reason the above options don’t allow those activities is that Meta loses reputation from being associated with them even if they’re not actually harmful.

Are there actually a hundred groups who might usefully study a 405B-parameter model, so Meta couldn’t efficiently partner with all of them? Maybe with GPUs getting cheaper there will be a few projects on it in the next MATS stream? I kinda suspect that the research groups who get the most out of it will actually be the interpretability/alignment teams at Google and Anthropic, since they have the resources to run big experiments on Llama to compare to Gemini/Claude!

J Bostock 24 Jul 2024 19:26 UTC
13 points
3
on: The Cancer Resolution?
If you’re willing to take my rude and unfiltered response (and not complain about it) here it is:
This is very fucking stupid.
Otherwise (written in about half an hour):
1. Fungal infections would lead to the vast majority of cancers being in skin, gut, lung i.e. exposed tissue. These are relatively common, but this does not explain the high prevalence of breast and prostate cancers. It also doesn’t explain why different cancers have such different prognoses, etc.
2. Why do different cancer subtypes change in prevalence over the course of a person’s life if they’re tied to infection?
  https://www.cancerresearchuk.org/health-professional/cancer-statistics/incidence/age#heading-One
3. Around half of cancers have a mutation in p53, which is involved in preserving the genome. Elephants have multiple copies of p53 and very rarely get cancer. People with de novo mutations in p53 get loads of cancer. The random spread of DNA damage is downstream of the DNA damage causing cancer: once p53 is deactivated (or the genome is otherwise unguarded) mutations can accumulate all over the genome, drowning out the causal ones.
  https://en.wikipedia.org/wiki/P53
4. If it was infection-based, then you’d expect immunocompromised patients to get more of the common types of cancer. Instead they get super weird exotic cancers not found in people with normal immune systems.
  https://www.hopkinsmedicine.org/health/conditions-and-diseases/hiv-and-aids/aidsrelated-malignancies
5. Chemotherapy does work? I don’t know what to say on this one. Chemotherapy works; are all the RCTs which show it works supposed to be fake? Do I need to cite them?
  https://pubmed.ncbi.nlm.nih.gov/30629708/
  https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(23)00285-4/fulltext
  https://www.redjournal.org/article/S0360-3016(07)00996-0/fulltext
  I feel like a post which uncritically repeats someone’s recommendation to not take chemotherapy has the potential to harm readers. You should at least add an epistemic status warning readers they might become stupider reading this.
6. Antifungals are relatively easy to get ahold of. Why hasn’t this man managed to run a single successful trial? Moreover, cryptococcal meningitis is a fungal disease which is fatal if untreated and, from the CDC:
  Each year, an estimated 152,000 cases of cryptococcal meningitis occur among people living with HIV worldwide. Among those cases, an estimated 112,000 deaths occur, the majority of which occur in sub-Saharan Africa.
  Which implies 40,000 people are successfully treated with strong antifungals every single year. These are HIV patients, who are more likely to get cancer and under this theory would be more likely than anyone else to have fungal-induced cancer. How come nobody has pointed out the miraculous curing of hundreds or thousands of patients by now?
7. Scientific consensus is an extremely powerful tool.
  https://slatestarcodex.com/2017/04/17/learning-to-love-scientific-consensus/
I think the fungal theory is basically completely wrong. Perhaps some obscure couple of percent of cancers are caused by fungi. I cannot disprove this, though I think it’s very unlikely.

Risk Overview of AI in Bio Research

J Bostock 15 Jul 2024 0:04 UTC

5 points

0 comments, 5 min read, LW link

(open.substack.com)

J Bostock 3 Jul 2024 18:30 UTC
4 points
0
on: What percent of the sun would a Dyson Sphere cover?
Ooh boy this is a fun question:
For temperature reasons, a complete Dyson sphere is likely to be built outside the earth’s orbit, as the energy output of the sun would force one at 1 AU to run at 393 K ≈ 120 °C. I assume the AI would prefer not to run all of its components this hot. A sphere like that would cook us like an oven unless the heat dissipating systems somehow don’t radiate any energy back inwards (which is probably impossible).
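The temperature figure above can be roughly reproduced from the Stefan-Boltzmann law. This is a minimal sketch, assuming a perfectly black shell that absorbs all sunlight at 1 AU and radiates only from its outer face; the solar constant and the function name are assumptions added for illustration and do not come from the comment.

```python
# Equilibrium temperature of a black shell at 1 AU (illustrative assumptions):
# it absorbs the full solar flux and re-radiates only outward.
SOLAR_CONSTANT_W_M2 = 1361.0      # approximate sunlight flux at 1 AU
STEFAN_BOLTZMANN = 5.670374e-8    # W m^-2 K^-4


def shell_temperature_kelvin(flux_w_m2: float) -> float:
    """Temperature at which blackbody emission balances the absorbed flux."""
    return (flux_w_m2 / STEFAN_BOLTZMANN) ** 0.25


temp_k = shell_temperature_kelvin(SOLAR_CONSTANT_W_M2)
temp_c = temp_k - 273.15
print(f"{temp_k:.1f} K = {temp_c:.1f} C")
```

Under these assumptions the result lands within about a kelvin of the quoted figure; a shell that also radiated inward, or was not perfectly black, would give a different number.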
A Dyson swarm might well be built partly inside and partly outside the earth’s orbit. In that case the best candidate is to disassemble Mercury, using solar energy to power electrolysis to turn the crust into metals, send up satellites to catch more sunlight, and focus that back down to the surface.
Mercury orbits at 60 million km from the sun. This means a circumference of roughly 360 million km. The sun is 1.2 million km across, but because a belt at Mercury’s orbit sits 0.38 AU from the sun, and so is closer to the earth than the sun is, a band which blocks out the sun for the earth entirely would only need to be 0.8 million km wide. This gives a total surface area of 290e12 square kilometers to block out the sun entirely. Something like a Dyson belt.
If the belt is 1 m thick on average, this gives it a total volume of 290e18 cubic meters. Mercury has a volume of 60 billion cubic km = 60e18 cubic meters, enough to build about a fifth of the belt. This would blot out approximately ¹⁄₅ of the sun’s radiation.
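The belt arithmetic can be checked in a few lines. This sketch simply re-runs the rounded figures quoted in the comment (circumference, band width, thickness, Mercury's volume); the variable names are illustrative, and it assumes all of Mercury's volume ends up as belt material.

```python
# Re-running the comment's rounded figures; nothing here is measured data.
KM2_TO_M2 = 1e6
KM3_TO_M3 = 1e9

orbit_circumference_km = 360e6   # quoted circumference of Mercury's orbit
belt_width_km = 0.8e6            # quoted width needed to hide the sun from earth
belt_thickness_m = 1.0           # assumed average thickness
mercury_volume_km3 = 60e9        # quoted volume of Mercury

belt_area_km2 = orbit_circumference_km * belt_width_km
belt_volume_m3 = belt_area_km2 * KM2_TO_M2 * belt_thickness_m
mercury_volume_m3 = mercury_volume_km3 * KM3_TO_M3

# Fraction of the full belt that Mercury's material could build.
fraction_covered = mercury_volume_m3 / belt_volume_m3
print(f"belt area: {belt_area_km2:.3g} km^2")
print(f"belt volume: {belt_volume_m3:.3g} m^3")
print(f"fraction covered: {fraction_covered:.2f}")
```

The fraction comes out at a little over 0.2, matching the one-fifth estimate; a thinner belt would scale the coverage up proportionally.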
To put things in perspective, Mars is kinda maybe almost habitable with a lot of effort and gets less than ¹⁄₂ of the sun’s radiation. I would make a wild guess that with 80% of the solar radiation we could scrape by with immense casualties due to massive decreases in agricultural yield. Temperature is somewhat tractable due to our ability to pump a bunch of sulfur hexafluoride into the atmosphere to heat things up.
As a caveat, I would suggest that if the AI is “nice” enough to spare Earth, it’s likely to be nice enough to beam some reconstituted sunlight over to us. A priori I would say the niceness window for “unwilling to murder us while on earth, and we pose a direct threat, but unwilling to suffer the trivial cost of keeping the lights on” is extremely narrow.

J Bostock 3 Jul 2024 15:08 UTC
4 points
0
on: Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
One easy way to decompose the OV map would be to generate two SAEs for the residual stream before and after the attention layer, and then just generate a matrix of maps between SAE features by the multiplication:
$W_{enc2, j} W_{O} W_{V} W_{dec1, i}$
To get the value of the connection between feature $i$ in SAE 1 and feature $j$ in SAE 2.
Similarly, you could look at the features in SAE 1 and check how they attend to one another using this system. When working with transcoders in attention-free resnets, I’ve been able to totally decompose the model into a stack of transcoders, then throw away the original model.
Seems we are on the cusp of being able to totally decompose an entire transformer into sparse features and linear maps between them. This is incredibly impressive work.
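The feature-to-feature map described above can be sketched with NumPy. This is only an illustration under stated assumptions: the weights are random stand-ins for trained SAE and attention matrices, there is a single head, and biases, layer norm and the attention pattern are ignored. The shapes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, n_feat1, n_feat2 = 16, 4, 32, 48  # toy sizes, chosen arbitrarily

# Random stand-ins for trained weights (assumption: column-vector convention).
W_dec1 = rng.normal(size=(d_model, n_feat1))  # SAE 1 decoder, one column per feature
W_enc2 = rng.normal(size=(n_feat2, d_model))  # SAE 2 encoder, one row per feature
W_V = rng.normal(size=(d_head, d_model))      # value projection of one head
W_O = rng.normal(size=(d_model, d_head))      # output projection of one head

# Entry [j, i] is the connection from feature i in SAE 1 to feature j in SAE 2.
feature_map = W_enc2 @ W_O @ W_V @ W_dec1

# The same entry computed directly for a single pair of features.
i, j = 3, 5
single_connection = W_enc2[j] @ W_O @ W_V @ W_dec1[:, i]
print(feature_map.shape, np.isclose(feature_map[j, i], single_connection))
```

For a multi-head layer, one option would be to sum the per-head products, though how best to handle the attention pattern itself is left open here.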

J Bostock 3 Jul 2024 14:11 UTC
1 point
0
in reply to: Nicholas Goldowsky-Dill’s comment on: Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
We might also expect these circuits to take into account relative position rather than absolute position, especially using sinusoidal rather than learned positional encodings.
An interesting approach would be to encode the key and query values in a way that deliberately removes positional dependence (for example, run the base model twice with randomly offset positional encodings, train the key/query value to approximate one encoding from the other) then incorporate a relative positional dependence into the learned large QK pair dictionary.

J Bostock 19 Jun 2024 13:32 UTC
1 point
0
on: Boycott OpenAI
This applies doubly if you’re in a high-leverage position, which could mean a position of “power” or just near to an ambivalent “powerful” person. If your boss is vaguely thinking of buying an LLM subscription for their team, a quick “By the way, OpenAI isn’t a great company, maybe we should consider [XYZ] instead...” is a good idea.
This should also go through a cost-benefit analysis, but I think it’s more likely to pass than it would be for the typical individual user.

J Bostock 3 Jun 2024 12:57 UTC
1 point
0
in reply to: leogao’s comment on: How to Better Report Sparse Autoencoder Performance
I’ve found that too. Taking $\log(L_0)$ and $\log(\mathrm{MSE})$ both seem reasonable to me, but it feels weird to me to take $\log(\text{Downstream Loss})$ for cross-entropy losses, since that’s already log-ish. In my case the plots were generally worse to look at than the ones I showed above when scanning over a very broad range of $L_1$ coefficients (and therefore $L_0$ values).

How to Better Report Sparse Autoencoder Performance

J Bostock 2 Jun 2024 19:34 UTC

20 points

4 comments, 3 min read, LW link

J Bostock 30 May 2024 15:14 UTC
1 point
0
on: Improving Dictionary Learning with Gated Sparse Autoencoders
Is there a solution to avoid constraining the norms of the columns of $W_{dec}$ to be 1? Anthropic report better results when letting it be unconstrained. I’ve tried not constraining it and allowing it to vary which actually gives a slight speedup in performance. This also allows me to avoid an awkward backward hook. Perhaps most of the shrinking effect gets absorbed by the $b_{gate}$ term?

To Limit Impact, Limit KL-Divergence

J Bostock 18 May 2024 18:52 UTC

7 points

1 comment, 5 min read, LW link

Introducing Statistical Utility Mechanics: A Framework for Utility Maximizers

J Bostock 15 May 2024 21:56 UTC

9 points

0 comments, 7 min read, LW link

Taming Infinity (Stat Mech Part 3)

J Bostock 15 May 2024 21:43 UTC

9 points

0 comments, 7 min read, LW link

J Bostock 6 May 2024 10:55 UTC
1 point
0
on: Biorisk is an Unhelpful Analogy for AI Risk
I agree with this point when it comes to technical discussions. I would like to add the caveat that when talking to a total amateur, the sentence:
AI is like biorisk more than it is like ordinary tech, therefore we need stricter safety regulations and limits on what people can create at all.
Is the fastest way I’ve found to transmit information. Maybe 30% of the entire AI risk case can be delivered in the first four words.

Conserved Quantities (Stat Mech Part 2)

J Bostock 4 May 2024 13:40 UTC

13 points

0 comments, 5 min read, LW link

J Bostock 28 Apr 2024 19:22 UTC
1 point
0
in reply to: EGI’s comment on: So What’s Up With PUFAs Chemically?
I’d be most interested in detecting hydroperoxides, which is easier than detecting trans fats. I don’t know how soluble a lipid hydroperoxide is in hexane, but isopropanol-hexane mixtures are often used for lipid extracts and would probably work better.
Evaporation could probably be done relatively safely by just leaving the extract at room temperature (I would definitely not advise heating the mixture at all) but you’d need good ventilation, preferably an outdoor space.
I think commercial LCMS/GCMS services are generally available to people in the USA/UK, and these would probably be the gold standard for detecting various hydroperoxides. I wouldn’t trust IR spectroscopy to distinguish the hydroperoxides from other OH-group containing contaminants when you’re working with a system as complicated as a box of french fries.

J Bostock 28 Apr 2024 12:16 UTC
1 point
0
on: So What’s Up With PUFAs Chemically?
As far as I’m aware nobody claims trans fats aren’t bad.
See the comment by gilch: allegedly vaccenic acid isn’t harmful. The particular trans fats produced by isomerization of oleic and linoleic acid, however, probably are harmful. Elaidic acid, for example, is a major trans-fat component in margarines, which were banned.

J Bostock 28 Apr 2024 12:14 UTC
3 points
0
in reply to: gilch’s comment on: So What’s Up With PUFAs Chemically?
Yeah, I was unaware of vaccenic acid. I’ve edited the post to clarify.

J Bostock 27 Apr 2024 20:07 UTC
9 points
0
in reply to: Joel Burget’s comment on: So What’s Up With PUFAs Chemically?
I’ve also realized that it might explain the anomalous (i.e. after adjusting for confounders) effects of living at higher altitude. The lower the atmospheric pressure, the less oxygen available to oxidize the PUFAs. Of course some foods will be imported already full of oxidized FAs and that will be too late, but presumably a McDonalds deep fryer in Colorado Springs is producing fewer oxidized PUFAs per hour than a correspondingly-hot one in San Francisco.
This feels too crazy to put in the original post but it’s certainly interesting.
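For a rough sense of the size of the altitude effect, the oxygen partial pressure can be estimated with the isothermal barometric formula. This is a back-of-the-envelope sketch: the scale height, the approximate elevations and the function name are all assumptions added here, and it only estimates available oxygen, not the actual oxidation rate in a fryer.

```python
import math

SCALE_HEIGHT_M = 8400.0  # assumed scale height of an isothermal atmosphere


def relative_oxygen_pressure(altitude_m: float) -> float:
    """Oxygen partial pressure relative to sea level (isothermal approximation)."""
    return math.exp(-altitude_m / SCALE_HEIGHT_M)


colorado_springs_m = 1840.0  # approximate elevation, assumption
san_francisco_m = 0.0        # treated as sea level, assumption

ratio = (relative_oxygen_pressure(colorado_springs_m)
         / relative_oxygen_pressure(san_francisco_m))
print(f"oxygen partial pressure ratio: {ratio:.2f}")
```

On these assumptions the Colorado Springs fryer sees roughly a fifth less oxygen. Whether oxidation scales linearly with oxygen partial pressure is a separate question this sketch does not address.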
