It’s very interesting to consider how well transformers will continue to scale and what the implications are. Here are some stats:
Megatron-Turing NLG performs better, and even if the difference is small, I’ve seen comparisons with smaller models where even a small difference of 1% means a noticeable difference in intelligence when using the models for text generation.
Thank you aphyer! You’re absolutely right, the size is 3.85, so it needs about 0.8 energy per turn.
5/0.8 ≈ 6, and with a number this low it’s more likely for the animal to die out by random chance. I redid the experiment on the Benthic biome, removed the cold resistance, and set Seeds in the biome to 1, and the species survives with a population of around 6-21, which is around the population of 12 we would expect.
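For anyone who wants to check the numbers, here is a minimal sketch of the equilibrium arithmetic as I understand the mechanics (assuming food need per turn is 20% of size and that 1 seed per turn yields 5 food, as in my question below; this is my reading, not the simulator’s actual code):

```python
# Expected stable population: food available per turn divided by
# one animal's food need per turn (assumed to be 20% of its size).
def expected_population(size, food_per_turn=5.0, need_fraction=0.2):
    return food_per_turn / (need_fraction * size)

print(expected_population(3.85))  # ~6.5, i.e. the 5/0.8 = 6 above, low enough to die out by chance
print(expected_population(1.85))  # ~13.5, close to the ~12 expected after removing cold resistance
```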
A question about mechanics:
When I run a simulation with only this animal on Tundra:
Cold (Allows survival in the Tundra): Yes
The population goes down to 2-3 for a while, and then the species dies out after 25 generations.
However, I calculate the size as 1.85 (0.1 base + 0.75 armour + 1 eats-seeds size), and it should only need 20% of that in food per turn, which is <0.4. Tundra has 1 seed per turn, which should give 5 food. Does anyone have an explanation for why it doesn’t stabilize around a population of 5/0.4 ≈ 12?
I’m not sure if I just misunderstood the mechanics, or if maybe I messed up something when testing the code, but I would be grateful if anyone could clarify what should happen in this situation.
That’s a good question. I can see scenarios where the price increases either more or less than that.
In this example, the compute needed for training is the only significant factor in the price, and that’s the factor that scales at 10x cost for every 5x increase in size. (However, sadly I can’t find the source where I read this, so again, please feel free to share if someone has a better estimation method.)
Building the infrastructure for training a model of several trillion parameters could easily create a chip shortage, drastically increasing the cost of AI chips and thus making training cost far more than the estimate.
However, building a huge infrastructure might bring economies of scale. For example, Google might build a TPU “gigafactory”, and because of the high production volume, the price per TPU could decrease significantly.
Isn’t that math wrong? 17 trillion parameters is 100x more than GPT-3, so the cost should be at least 100x higher; if the cost is $12M now, it should be at least a billion dollars. I think it would be about $3B. It would probably cost a bit more than that, since the scaling law will probably bend soon and more data will be needed per parameter. Also, there may be inefficiencies from doing things at that scale. I’d guess maybe $10B give or take an order of magnitude.
You are absolutely correct, the cost must be more than 100x higher if cost scales faster than the number of parameters. I have now updated the calculations and got a 727x increase in cost.
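For transparency, here is the updated calculation as a minimal sketch, assuming cost follows a power law calibrated to the “10x cost for 5x size” rule from my earlier comment, and taking the $12M GPT-3 figure as the baseline:

```python
import math

# Exponent implied by "10x cost for every 5x increase in size"
alpha = math.log(10) / math.log(5)   # ~1.43

param_ratio = 100                    # 17T parameters is ~100x GPT-3
cost_ratio = param_ratio ** alpha
print(f"cost ratio: {cost_ratio:.0f}x")  # ~727x

gpt3_cost = 12e6                     # assumed $12M GPT-3 training cost
print(f"estimate: ${gpt3_cost * cost_ratio / 1e9:.1f}B")  # ~$8.7B
```

That lands around $8.7B, which is in line with your “$10B give or take an order of magnitude” guess.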
Maybe the relevant sort of AI system won’t just stream-of-consciousness generate words like GPT-3 does, but rather be some sort of internal bureaucracy of prompt programming that e.g. takes notes to itself, spins off sub-routines to do various tasks like looking up facts, reviews and edits text before finalizing the product, etc., such that 10x or even 100x compute is spent per word of generated text. This would mean $1 - $10 per 700 words generated, which is maybe enough to be outcompeted by humans for many applications.
I suspect you might be right. If we imagine the human brain, every neuron is reused tens of times in the time it takes to say a single word, so it doesn’t seem unlikely that a good architecture reuses neurons multiple times before “outputting” something. So an increase of about 10-100x, as you say, seems quite plausible.
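To make the arithmetic concrete, here is a rough sketch; the baseline cost per 700 words is my own illustrative assumption, not a figure from your comment:

```python
# Scale an assumed baseline generation cost by the extra compute
# multiplier of a "bureaucracy"-style architecture.
baseline_per_700_words = 0.10  # assumed dollars per 700 words (illustrative)

for multiplier in (10, 100):
    cost = baseline_per_700_words * multiplier
    print(f"{multiplier}x compute -> ${cost:.2f} per 700 words")
# 10x gives $1.00 and 100x gives $10.00, matching the $1-$10 range above
```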
Maybe I misunderstood something, but do the odds of getting sick yourself really grow linearly? Let’s say that if you meet 1 person, the odds of getting corona are 1%. That would mean that meeting 101 people results in a 101% chance of getting corona. Sure, it is almost linear until you get to about 10 people (in my example), but the risk of getting corona should follow something like 1-(0.99)^n if we assume the risk of getting corona is equal from every person you have contact with. Spreading it linearly with the number of people you meet, however, I can understand the reasoning behind.
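A quick sketch of the difference, using the 1% per-contact risk from my example:

```python
# Naive linear risk (n * p) vs. compounding risk 1 - (1 - p)^n,
# assuming an independent 1% infection risk per contact.
p = 0.01

for n in (1, 10, 50, 101, 300):
    linear = n * p
    compound = 1 - (1 - p) ** n
    print(f"n = {n:>3}: linear = {linear:7.1%}, compound = {compound:6.1%}")
# At n = 101 the linear estimate is 101%, while the compounding
# formula gives about 64%.
```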
The scientific paper I mentioned in the first comment used different questions; here is an example:
“The questionnaires asked for interval estimates of birth years for five famous characters from world history (Mohammed, Newton, Mozart, Napoleon, and Einstein), and the years of death for five other famous persons (Nero, Copernicus, Galileo, Shakespeare, and Lincoln).”
I tried answering these questions myself with 90% confidence intervals, and I was correct on 7/10 questions, so it seems like I am still overconfident in my answers even though I had just read about the effect. But to be fair, 10 questions are far from statistical significance.
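Out of curiosity, I also checked how surprising 7/10 would be if my intervals were perfectly calibrated; this is a quick binomial sanity check of my own, not something from the paper:

```python
from math import comb

# P(at most 7 hits out of 10) if each 90% interval truly had a 90% hit rate
n, p = 10, 0.9
p_at_most_7 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8))
print(f"P(<=7 hits | calibrated) = {p_at_most_7:.3f}")  # ~0.070
```

So a result of 7/10 or worse happens about 7% of the time even under perfect calibration, which is suggestive of overconfidence but not statistically conclusive.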
Wow, really interesting article.
It is really interesting that the median result was negative, although strategic overconfidence, as some have pointed out, explains some of it.
Found a very interesting paper on the subject of overconfidence: https://www.researchgate.net/publication/227867941_When_90_confidence_intervals_are_50_certain_On_the_credibility_of_credible_intervals
“Estimated confidence intervals for general knowledge items are usually too narrow. We report five experiments showing that people have much less confidence in these intervals than dictated by the assigned level of confidence. For instance, 90% intervals can be associated with an estimated confidence of 50% or less (and still lower hit rates). Moreover, interval width appears to remain stable over a wide range of instructions (high and low numeric and verbal confidence levels).”
Yes, I agree that governments will likely “defend” their local fiat currencies, since they have both the incentives (like control of the currency and the ability to produce more of it; governments often rely on creating more to fund budget deficits) and the means to defend them.
I personally would really like such a bank account, one that automatically invested the money in it the way I want, if the fees were low enough.
Yes, and the currency also usually becomes safer (harder to “hack”) with more miners.
Thank you for the feedback, I changed it to commodities in the post.
Yes, it seems logical that the better we can predict the future, the better our decisions today will be.
I find it interesting that some of the people who have had the biggest impact on the world, like Jeff Bezos and Elon Musk, say they have been heavily inspired by sci-fi. This might indicate that imagining the future is highly useful for making big changes.
Great thoughts, it was very interesting to read. Some thoughts occurred to me that might be of interest to others, and I would find others’ input on them interesting as well.
Imagine an AI trained as an oracle: trained on a variety of questions and selected based on how “truthful” the answers were. Assuming this approach were possible and could create an AGI, might that be a viable way to “create selection pressures towards agents which think in the ways we want”? In other words, might this create an aligned AI regardless of extreme optima?
Another thought that occurred to me: let’s say an AI that is “let loose” and spreads to new hardware encounters the “real world” and is exposed to massive amounts of new data. Then the range of circumstances would of course be very broad. In the example with the oracle, potentially everything could be the same during training and after training, except for the questions asked. Could this potentially increase safety, since the range of circumstances in which it would need to have desirable motivations would be comparatively narrow?
Lastly, I’m new to LessWrong, so I’m extra grateful for all input regarding how I can improve my reasoning and commenting skills.
It does seem like a reasonable analogy that the Neuralink could be like a “sixth sense” or an extra (very complex) muscle.
Elon Musk has argued that humans can take in a lot of information through vision (by looking at a picture for one second, you can take in a lot of information). Text/speech, however, is not very information-dense. He argues that because we use keyboards or speech to communicate information outwards, it therefore takes a long time.
One possibility is that AI could help interpret the uploaded data, filling in details to make the uploaded information more useful. For example, you could “send” an image of something through the Neuralink, an AI would interpret it and fill in the details that are unclear, and then you would have an image very close to what you imagined, containing several hundred or maybe thousands of kilobytes of information.
The Neuralink would only need to increase the productivity of an occupation by a few percent to be worth the investment at the 3,000-4,000 USD that Elon Musk believes the price will drop to.
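As a minimal break-even sketch (the salary and the exact productivity gain are my own illustrative assumptions, not figures from Musk):

```python
# Years until a small productivity gain pays for the device.
device_cost = 3500.0      # midpoint of the $3,000-4,000 price estimate
annual_salary = 50000.0   # assumed salary (illustrative)
productivity_gain = 0.03  # "a few percent"

annual_value = annual_salary * productivity_gain  # $1,500 per year
print(f"payback: {device_cost / annual_value:.1f} years")  # ~2.3 years
```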
That does sound like a rational way to go, especially since the complexity of the problem makes it nearly impossible to promote a single approach.
Yes, I can see why people would be more motivated to act today if they read a book where today’s actions determine, to a greater extent, the outcome of the first AGI/ASI.
And I can see some ways we could increase the likelihood of aligned AI today, like an international cooperation program or very high funding of organisations like MIRI and CHAI. I presume the people who aided the safe creation of AI could be painted as heroes, which might also work as a motivator for the reader to act.
A clear call to action after the book seems like an effective way to increase the chance that people will act; I will include that in the book if we finish writing it.
If you have a specific approach to aligned AI that you think is likely to work and would like to write the book about, I think it would be very interesting to discuss, and it could potentially be included in my book as well.
Great and interesting post!
When it comes to presenting a “path of change” that individuals can contribute to, I can think of two:
1. Donate money to organisations like MIRI, CHAI and others working on AI alignment/safety.
2. Become involved in the community and do research or push for policies themselves.
Both of these actions likely require “radicalising” people about the importance of AI safety, which could be an argument for why radicalising a few people might be a more effective aim for a novel than trying to influence the masses. Although to me it seems reasonable that a novel can do both.
My sister and I are currently writing a novel where there is an arms race to develop AGI/ASI. One of the main characters manages to become the leader of one project to create ASI, and she insists on a safer approach, even if it takes longer. It seems like they will lose the arms race, endangering the entire human species, until the climax, where they find a faster way to create AGI and thus have time to do so safely. The book ends with a utopia where the ASI is aligned and everyone lives happily ever after. The book will also bring up the importance of organisations like MIRI and CHAI that have done work on AI alignment/safety.
Do you believe that sounds like a good approach to influencing people to give more consideration to AI safety/alignment? (Assuming the plot is interesting and feels realistic to the reader.)
Btw, this is my first comment, so any feedback on how I can improve my commenting skills is welcome and appreciated.