Super thoughtful post!
I get the feeling that I’m more optimistic about post-hoc interpretability approaches working well in the case of advanced AIs. I’m referring to the ability of one advanced AI, in the form of a very large neural-network-based agent, to take another such agent and successfully verify its commitments. I think this is at least somewhat likely to work by default (i.e. scrutinizing advanced neural-network-based AIs may be easier than obfuscating intentions). I also suspect it may not require much information about the training method or training data.
I previously thought this doesn’t matter in practice because of the possibility of self-modification and successor agents. But I now think that, at least in some range of situations, verifying the behavior of a neural network is enough for credible commitment, provided the agent pre-commits to using that specific network, e.g. via a blockchain.
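To make the pre-commitment idea a bit more concrete, here is a toy sketch (my illustration, not something from the post): the agent publishes a cryptographic hash of its policy network’s weights somewhere immutable, and a counterparty later checks that the network actually being run matches that commitment.

```python
import hashlib

def commit_to_weights(weight_bytes: bytes) -> str:
    """Digest to publish (e.g. on a blockchain) as a binding commitment."""
    return hashlib.sha256(weight_bytes).hexdigest()

def verify_commitment(weight_bytes: bytes, published_digest: str) -> bool:
    """Counterparty checks the weights actually used match the commitment."""
    return hashlib.sha256(weight_bytes).hexdigest() == published_digest

# Toy example: the "weights" are just a byte string here.
weights = b"\x00\x01\x02\x03"
digest = commit_to_weights(weights)
assert verify_commitment(weights, digest)
assert not verify_commitment(b"\xff" + weights[1:], digest)
```

Of course this only pins down *which* network is used; the hard part the post discusses is verifying what that network will actually do.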
Also, are you sure that the fact that people can’t simulate nematodes fits well into this argument? I may well be mistaken, but I thought we don’t actually have the neural connection weights for nematodes, only the wiring diagram (the connectome). In that case it seems natural that we can’t do forward passes.
Initially I felt cold towards the whole article, but now I mostly agree.
The goals of text agents might be programmable by humans directly (consider the economic pressure towards creating natural-language support agents / recommendation systems / educators / etc.). Prompts in their current form (1) only have significant influence over a short text window after the prompt, and (2) only cause likely text continuations to emerge (whereas to achieve your goal you might want to produce text that has low probability conditional on the prompt). Prompts could be replaced by specific programs by modifying the training and inference processes. For example, additional sources of self-supervision could be incorporated (debate, or consistency losses).
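As a toy illustration of the consistency-loss idea (everything here is my own sketch, not from the post): penalize divergence between the next-token distributions the model produces under two prompts that should induce the same behavior, and add that penalty to the usual language-modeling loss.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_a, logits_b):
    """Symmetric KL divergence between next-token distributions produced
    under two prompts that should induce the same behavior. Added to the
    ordinary LM loss, it pushes the model toward prompt-invariant goals."""
    p, q = softmax(logits_a), softmax(logits_b)
    kl_pq = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    kl_qp = np.sum(q * (np.log(q) - np.log(p)), axis=-1)
    return 0.5 * (kl_pq + kl_qp).mean()

# Identical logits -> zero loss; diverging logits -> positive loss.
a = np.array([[2.0, 0.5, -1.0]])
assert np.isclose(consistency_loss(a, a), 0.0)
assert consistency_loss(a, np.array([[0.0, 2.0, 0.0]])) > 0.0
```

In a real training setup the two logit tensors would come from the model evaluated on paraphrased or rewritten prompts; the numpy version above just shows the shape of the objective.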
I would name chain letters as the closest analogue. Another is computer viruses (humans design a virus with a goal in mind, and the virus may then achieve that goal and self-replicate).