I am a longtime LessWrong and SSC reader who finally got around to starting a blog. I would love to hear feedback from you! https://harsimony.wordpress.com/
Why I’m Optimistic About Near-Term AI Risk
Georgism in Space
Cool project! I was thinking along similar lines when I recently went through a bunch of his posts to find interesting ideas and collected them:
https://harsimony.wordpress.com/2021/10/20/robin-hansons-other-ideas/
Whoever decides to pick this project up might be able to use this as a starting point.
Current AI Models Seem Sufficient for Low-Risk, Beneficial AI
Some other order-of-magnitude estimates on available data, assuming words roughly equal tokens:
Wikipedia: 4B English words, according to this page.
Library of Congress: from this footnote I assume there are at most 100 million books' worth of text in the LoC, and from this page I assume books average 100k words, giving 10T words at most.
Constant writing: I estimate that a typical person writes at most 1000 words per day, with maybe 100 million people writing this amount of English on the internet. Over the last 10 years, these writers would have produced 370T words.
Research papers: this page estimates that ~4M papers are published each year; at 10k words per paper and 100 years of research, this amounts to 4T words total.
So, setting aside everyday writing (most of which is never collected as clean training data), it looks like 10T words is an optimistic order-of-magnitude estimate of the total amount of data available.
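For concreteness, here's a quick script reproducing the arithmetic above (my own, using the same assumptions as stated):

```python
# Back-of-envelope word counts from the estimates above.
wikipedia = 4e9                       # 4B English words
loc       = 1e8 * 1e5                 # 100M books * 100k words/book = 10T
writing   = 1e8 * 1e3 * 365 * 10      # 100M writers * 1k words/day * 10 years ~ 365T
papers    = 4e6 * 1e4 * 100           # 4M papers/year * 10k words * 100 years = 4T

for name, words in [("Wikipedia", wikipedia), ("Library of Congress", loc),
                    ("constant writing", writing), ("research papers", papers)]:
    print(f"{name}: {words:.1e} words")
```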
I assume the importance of a large quantity of clean text data will lead to the construction of a text database of ~1T tokens and that this database (or models trained on it) will eventually be open-sourced.
From there, it seems like really digging into the sources of irreducible error will be necessary for further scaling. I would guess that a small part of this is "method error" (training details, context window, etc.) but that a significant fraction comes from intrinsic text entropy. Some entropy has to be present, or else text would have no information value.
I would guess that this irreducible error can probably be broken down into:
- Uncertainty about the specific type of text the model is trying to predict (e.g. it needs some data to figure out that it's supposed to write in modern English, then more data to learn that the writing is flamboyant/emotional, then more to learn that there is a narrative structure, then more to determine that it is a work of fiction, etc.). The model will always need some data to specify which text-generating sub-model to use. This error can be reduced with better prompts, though not completely eliminated.
- Uncertainty about location within the text. For example, even if the model had memorized a specific play by Shakespeare, if you asked it to do next-word prediction on a random paragraph from the text, it would have trouble predicting the first few words simply because it hasn't determined which paragraph it has been given. This error should go away once the model has been fed enough of the passage. Better prompts and a larger context window should help.
- Uncertainty inherent to the text. This is related to the actual information content of the text, and should be irreducible. I'm not sure about the relative size of this uncertainty compared to the other ones, but this paper suggests an entropy of ~10 bits/word in English (which seems high?). I don't know exactly how entropy translates into training loss for these models (see the unit check after this list). Memorization of key facts (or database access) can reduce the average information content of a text.
EDIT: also note that going from 10T to 100T tokens would only reduce the loss by 0.045, so it may not be worthwhile to increase dataset size beyond the 10T order-of-magnitude.
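The 0.045 figure is consistent with the data term of the Chinchilla scaling fit (Hoffmann et al. 2022); a check using their published constants, assuming that's the law behind the estimate:

```python
# Chinchilla data-scaling term: L_data(D) = B / D^beta, with fitted B = 410.7, beta = 0.28.
B, beta = 410.7, 0.28
loss_10T  = B / (10e12) ** beta
loss_100T = B / (100e12) ** beta
print(f"{loss_10T - loss_100T:.3f}")   # ~0.045 nats/token
```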
Precise P(doom) isn’t very important for prioritization or strategy
AI Will Multiply
Great post!!
I think the section "Perhaps we don't want AGI" is the best argument against these extrapolations holding in the near future. Data limitations, the practical benefits of small models, and profit-following will likely lead to small, specialized models instead.
https://www.lesswrong.com/posts/8e3676AovRbGHLi27/why-i-m-optimistic-about-near-term-ai-risk
Could someone help me collect the relevant literature here?
I think the complete class theorems are relevant: https://www.lesswrong.com/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations
The Non-Existence of Representative Agents: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3302656
Representative Agents: https://en.wikipedia.org/wiki/Representative_agent
John Wentworth on Subagents: https://www.lesswrong.com/posts/3xF66BNSC5caZuKyC/why-subagents
Counterfactual Contracts
Is it known whether predictive coding is easier to train than backprop? Local learning rules seem like they would be more parallelizable.
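For intuition on why the updates are local, here's a minimal linear predictive-coding sketch (my own toy example, not from the post): each layer's weight update uses only its own prediction error and the activity of the layer below.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 8, 3]                                  # input, hidden, output
W = [rng.normal(0, 0.1, (sizes[i + 1], sizes[i])) for i in range(2)]

def infer(x_in, x_target, steps=50, lr_x=0.1):
    """Relax the hidden activity to minimize the local prediction errors."""
    x = [x_in, np.zeros(sizes[1]), x_target]       # input and target are clamped
    for _ in range(steps):
        e1 = x[1] - W[0] @ x[0]                    # prediction error at layer 1
        e2 = x[2] - W[1] @ x[1]                    # prediction error at layer 2
        x[1] += lr_x * (-e1 + W[1].T @ e2)         # only adjacent errors are used
    return x, [e1, e2]

def learn(x, errs, lr_w=0.01):
    """Each weight update is an outer product of a local error and local activity."""
    for l in range(2):
        W[l] += lr_w * np.outer(errs[l], x[l])

x_in, x_target = rng.normal(size=4), np.array([1.0, 0.0, 0.0])
for _ in range(500):
    x, errs = infer(x_in, x_target)
    learn(x, errs)
print("output error:", np.linalg.norm(errs[1]))    # shrinks toward 0
```

Because the `learn` step for layer `l` touches only `errs[l]` and `x[l]`, the per-layer updates are independent of each other once inference has settled, which is the parallelism intuition above.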
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine seems relevant to this discussion:
We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel.
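As a degenerate but exact case of the "superposition of training examples" claim (my own illustration; the paper's result covers deep nets only approximately, via a path kernel), a linear model trained by gradient descent from zero initialization is literally a kernel machine with K(x, x') = x · x':

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                 # training inputs
y = X @ rng.normal(size=5)                   # linear targets

w, coeffs, lr = np.zeros(5), np.zeros(20), 0.01
for _ in range(2000):
    resid = X @ w - y                        # per-example errors
    w -= lr * X.T @ resid / len(X)           # GD step: a weighted sum of examples
    coeffs -= lr * resid / len(X)            # bookkeep each example's weight a_i

x_test = rng.normal(size=5)
direct = x_test @ w                          # ordinary prediction
kernel = coeffs @ (X @ x_test)               # sum_i a_i * K(x_test, x_i)
print(np.allclose(direct, kernel))           # True: weights are a superposition
```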
Cryosleep
I previously wrote about the literature on the evolution of virulence, which people might find useful for finding references on the topic.
In short, it is complicated:
- Zoonotic diseases (like COVID) don't follow a consistent pattern early in their evolution because they are so far removed from being adapted to humans as a niche.
- As long as there is a correlation between deadliness and ability to reproduce, diseases will face evolutionary pressure to become more or less deadly depending on the circumstances. This is strongly influenced by transmission mode, though that isn't the only factor (see the toy model after this list):
a. Disease transmitted by close person-to-person contact → disease can’t make host too sick, or host won’t come into contact with others → disease evolves to cause milder infection.
Example: HIV has relatively few effects on hosts for years before culminating in AIDS.
b. Disease transmission doesn’t require person-to-person contact → disease can optimize other features of transmission such as producing many copies and using host resources → disease evolves to cause severe infection.
Example: malaria is spread via mosquitos and doesn’t require person-to-person contact in order to spread, which might explain why it is such a debilitating disease.
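For a cartoon of the transmission trade-off, here's the textbook optimal-virulence toy model (my own illustration with made-up functional forms): pathogen fitness scales like R0 = β(v)/(γ + v), where transmission β rises with virulence v but with diminishing returns, so fitness peaks at intermediate virulence.

```python
import numpy as np

v = np.linspace(0.01, 5, 500)      # virulence (host death rate)
gamma = 1.0                        # recovery rate
beta = np.sqrt(v)                  # transmission rises with v, diminishing returns
r0 = beta / (gamma + v)            # basic reproduction number
print(f"optimal virulence ~ {v[np.argmax(r0)]:.2f}")   # interior optimum near v = 1
```

Transmission modes that don't require a mobile host (like mosquito-borne spread) weaken the cost term in the denominator, shifting the optimum toward higher virulence.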
Grey Goo Requires AI
Bootstrapping Language Models
If no near-term alignment strategy, research should aim for the long-term
Do you have a good system for saving and prioritizing the ideas you have?
Building a habit of noticing a new idea and writing it down (even if it’s silly) has increased my overall output dramatically.
Modifying Jones’ “AI Dilemma” Model
I personally don’t expect very high efficacy, and I do expect that Loyal will sell the drug for the next 4.5 years. However, as long as Loyal is clear about the nature of the approval of the drug, I think this is basically fine. People should be allowed to, at their own expense, give their pets experimental treatments that won’t hurt them and might help them. They should also be able to do the same for themselves, but that’s a fight for another day.
Agreed! Beyond potentially developing a drug, I think Loyal's strategy has the potential to change regulations around longevity drugs, generate profits to fund new trials, and bring attention/capital to the longevity space. I don't see many downside risks here unless the drug turns out to be unsafe.
Note: I’m not affiliated with Loyal or any other longevity organization, I’m going off the same outside information as the author.
I think there’s a substantial chance that this criticism is misguided. A couple points:
The term "efficacy nod" is a little confusing; the FDA term is "reasonable expectation of effectiveness", which makes more sense to me. It sounds like the drug has enough promise that the FDA thinks it's worth continuing testing. They may not have actual effectiveness data yet, just evidence that it's safe and a reasonable explanation for why it might work.
I don’t know what the standard practices are for releasing trial data, especially for an initial trial like this. Are we sure this isn’t standard practice? Even if it isn’t, I don’t think this is sufficient to assume that Loyal is being disingenuous.
Take the outside view here: both Loyal and the FDA have veterinarians who seem to think that the drug is promising.
I also think there's a reasonable argument to be made for an IGF-1 inhibitor in large-breed dogs. Large-breed dogs often die of heart disease, frequently due to dilated cardiomyopathy (the heart becomes enlarged and can't pump blood effectively). This enlargement can come from hypertrophic cardiomyopathy (overgrowth of the heart muscle). I don't know whether it's understood why large-breed dogs develop this, but maybe IGF-1 makes the heart muscle grow over a dog's lifetime, which would suggest that an IGF-1 inhibitor is worth trying. It's also suggestive that diabetes is a risk factor for cardiomegaly (enlarged heart).
With this in mind, we can answer the next point:
So my theory says that high IGF-1 over a lifetime progressively increases the size of the heart muscle until you get dilated cardiomyopathy. Stopping IGF-1 even in middle age might help. We can falsify this theory by checking if large breed dogs show heart enlargement over their lifetime (instead of growth stopping after puberty like it should). Why would heart muscle keep growing while nothing else does? I’m not sure.
Now we can turn to the question: do large breed dogs actually have elevated IGF-1?
Looking at your first figure, the answer seems to be yes! There's a straightforward correlation between bodyweight and IGF-1 concentration, and the slope would likely be higher without the 3 outliers on the right. Notice also that the sample doesn't have many large-breed dogs (Great Danes weigh 110-180 lbs). I would guess that those 3 dogs are large-breed dogs, and they do in fact have IGF-1 levels higher than most of the dogs in the sample.
Now let's turn to the second plot, where we see that IGF-1 concentration decreases with age. Remember that there is survivorship bias at higher ages: large-breed dogs with higher IGF-1 will die at around 72 months while chihuahuas will live over 150 months. Declining IGF-1 with age is exactly what we should see if high IGF-1 shortens lifespan! The plot supports the theory that IGF-1 is important for aging; you can't cherry-pick outliers and ignore the overall relationship in the plot.
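To make the survivorship point concrete, here's a toy simulation (my own, with made-up numbers): give each dog a fixed IGF-1 level, let high IGF-1 shorten lifespan, and sample living dogs cross-sectionally; measured IGF-1 declines with age even though no individual dog's level ever changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
igf1 = rng.uniform(50, 800, n)                        # ng/ml, fixed per dog
lifespan = 180 - 0.15 * igf1 + rng.normal(0, 12, n)   # months: high IGF-1 dies young
age = rng.uniform(0, 180, n)                          # age at sampling
alive = age < lifespan                                # only living dogs are sampled

for lo, hi in [(0, 60), (60, 120), (120, 180)]:
    sel = alive & (age >= lo) & (age < hi)
    print(f"ages {lo:3d}-{hi:3d} mo: mean IGF-1 = {igf1[sel].mean():.0f} ng/ml")
```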
I’m no expert, but I think there’s interest in IGF-1 inhibitors for longevity. To quote Sarah Constantin:
I would guess this is one of the reasons Loyal had an interest in IGF-1 inhibitors from the outset.
The dose makes the poison! Every drug has negative effects at a high enough dose; the trials will determine whether these actually arise at the dose they are using.
I'm no expert, but this evidence doesn't seem sufficient to stop research on this drug. Will it prove safe or effective? Will it also benefit human health? I have no idea, but unless we discover that the drug is hurting patients, I think it's fine for Loyal to carry on.