A couple more takeaways I jotted down:
PaLM 2 followed Chinchilla-optimal scaling closely (a rough back-of-envelope check of what that implies is sketched below). There is no explicit mention of parameter count, and the training data details are withheld. They claim performance generally equivalent to GPT-4. Chain-of-thought reasoning is called out explicitly quite a bit.
Claims of longer context length, but no specific size in the technical report. From the API page: “75+ tokens per second and a context window of 8,000 tokens,”
From the report: “The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute” and “The pre-training corpus is significantly larger than the corpus used to train PaLM [which was 780B tokens]”
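For reference, the Chinchilla paper’s rule of thumb is roughly 20 training tokens per parameter, so we can back out what a Chinchilla-optimal parameter count would look like for corpora in this range. A minimal sketch; the 2x and 5x corpus multiples are my own guesses, since the report only says the corpus is “significantly larger” than PaLM’s:

```python
# Rough Chinchilla sanity check: Hoffmann et al. (2022) found compute-optimal
# training uses roughly 20 tokens per model parameter.
TOKENS_PER_PARAM = 20

def chinchilla_optimal_params(num_tokens: float) -> float:
    """Parameter count that roughly matches a token budget under the ~20:1 rule."""
    return num_tokens / TOKENS_PER_PARAM

# PaLM's corpus was 780B tokens (quoted above); PaLM 2's corpus is only said
# to be "significantly larger", so the 2x and 5x multiples are pure guesses.
for tokens in (780e9, 2 * 780e9, 5 * 780e9):
    print(f"{tokens / 1e9:,.0f}B tokens -> ~{chinchilla_optimal_params(tokens) / 1e9:,.0f}B params")
```

So even at 5x PaLM’s corpus, the 20:1 rule would put the compute-optimal model around ~200B parameters, which is at least consistent with the report’s claim that PaLM 2-L is significantly smaller than the largest PaLM.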
I tracked down the exact quote where Prof. Marcus talks about timelines with regard to jobs. He mentions 20-100 years (right before the timestamp) and then goes on to say (https://youtu.be/TO0J2Yw7usM?t=2438):
“In the long run, so-called AGI really will replace a large fraction of human jobs. We’re not that close to AGI, despite all the media hype and so forth … in 20 years people will laugh at this … but when we get to AGI, let’s say it is 50 years, that is really going to have profound effects on labor...”
Senator Lindsey Graham explicitly asks Christina Montgomery “Should we have one?” [referring to a new agency], and she answers “I don’t think so”: https://youtu.be/TO0J2Yw7usM?t=4920