No disagreements here; I just want to note that if “the EA community” waits too long for such a pivot, AI labs will probably at some point face protests from the general population, since even now a substantial share of the US population views AI progress in a very negative light. Even if these protests don’t accomplish anything directly, they might indirectly affect any future efforts. For example, an EA-run fire alarm might be somewhat compromised because the memetic ground would already be captured. In that case, the concept of “AI risk” would, in the minds of AI researchers, shift from “obscure overconfident hypotheticals of a nerdy philosophy” to “people with different demographics, fewer years of education, and a different political party than us being totally unreasonable over something that we understand far better”.
“AI and Compute” trend isn’t predictive of what is happening
I don’t expect Putin to use your interpretation of “d” instead of his own interpretation of it, which he publicly advertises whenever he gives a big speech on the topic.
From the latest speech:
> In the 80s they had another crisis they solved by “plundering our country”. Now they want to solve their problems by “breaking Russia”.
This directly references an existential threat.
From the speech a week ago:
> The goal of that part of the West is to weaken, divide and ultimately destroy our country. They are saying openly now that in 1991 they managed to split up the Soviet Union and now is the time to do the same to Russia, which must be divided into numerous regions that would be at deadly feud with each other.
Same.
Also, consider nuclear false flags—the frame for them, including in these same speeches, was created and maintained throughout the entire year.
I could imagine that it is too hard for OpenAI to attract top talent at the level needed for their research achievements while also filtering the people they hire by how seriously they take reducing civilization-level risks. Or at least it could easily have been infeasible 4 years ago.
I know a couple of people at DeepMind, and none of them have reducing civilization-level risks as one of their primary motivations for working there; I believe the same is true of most of DeepMind.
Given that the details in generated objects are often right, you can use super-resolution neural models to upscale the images to the needed size.
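A minimal sketch of that step, assuming a PyTorch pipeline; the bicubic `interpolate` call is only a placeholder for a learned super-resolution network (e.g. an ESRGAN-style model), which is what you’d actually want here:

```python
import torch
import torch.nn.functional as F

def upscale(image: torch.Tensor, factor: int = 4) -> torch.Tensor:
    """Upscale a (1, C, H, W) image tensor to `factor`x the resolution.

    Bicubic interpolation is just a stand-in; in practice you'd run a
    pretrained super-resolution network here, relying on the generated
    details being plausible enough for the model to sharpen them.
    """
    return F.interpolate(image, scale_factor=factor, mode="bicubic", align_corners=False)

low_res = torch.rand(1, 3, 256, 256)   # stand-in for a generated image
high_res = upscale(low_res)            # -> (1, 3, 1024, 1024)
```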
Are PaLM outputs cherry-picked?
I reread the description of the experiment and I’m still unsure.
The protocol, described on page 37, goes like this:
- the 2-shot exemplars used for few-shot learning were not selected or modified based on model output. I infer this from the line “the full exemplar prompts were written before any examples were evaluated, and were never modified based on the examination of the model output”.
- greedy decoding is used, so they couldn’t filter outputs given a prompt.

What about the queries (the full prompt without the QAQA few-shot part)? Are they included under “the full exemplar prompts” or not? If they are, there’s no output selection; if they aren’t, the outputs could be strongly selected, with the magnitude of the selection unreported. On one hand, “full prompts” should refer to full prompts. On the other hand, they only use “exemplar” when talking about the QAQA part they prepend to every query, versus “evaluated example” meaning the query.
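For concreteness, this is roughly how I read the protocol, as a sketch (the model and exemplar text are stand-ins, since PaLM isn’t publicly available):

```python
# A sketch of the page-37 protocol as I read it: a fixed QAQA exemplar block
# written in advance, prepended to every query, with greedy decoding.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in for PaLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The 2-shot QAQA exemplars, fixed before any model outputs are examined.
exemplars = (
    "Q: <exemplar question 1>\nA: <exemplar answer 1>\n\n"
    "Q: <exemplar question 2>\nA: <exemplar answer 2>\n\n"
)

def answer(query: str) -> str:
    prompt = exemplars + f"Q: {query}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # greedy decoding
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
```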
My calculation for AlphaStar: 12 agents * 44 days * 24 hours/day * 3600 sec/hour * 420*10^12 FLOP/s * 32 TPUv3 boards * 33% actual board utilization = 2.02 * 10^23 FLOP which is about the same as AlphaGo Zero compute.
For 600B GShard MoE model: 22 TPU core-years = 22 years * 365 days/year * 24 hours/day * 3600 sec/hour * 420*10^12 FLOP/s/TPUv3 board * 0.25 TPU boards / TPU core * 0.33 actual board utilization = 2.4 * 10^22 FLOP.
For 2.3B GShard dense transformer: 235.5 TPU core-years = 2.6 * 10^23 FLOP.
Meena was trained for 30 days on a TPUv3 pod with 2048 cores. So it’s 30 days * 24 hours/day * 3600 sec/hour * 2048 TPUv3 cores * 0.25 TPU boards / TPU core * 420*10^12 FLOP/s/TPUv3 board * 33% actual board utilization = 1.8 * 10^23 FLOP, slightly below AlphaGo Zero.
Image GPT: “iGPT-L was trained for roughly 2500 V100-days”—this means 2500 days * 24 hours/day * 3600 sec/hour * 100*10^12 FLOP/s * 33% actual GPU utilization ≈ 7.1 * 10^21 FLOP. There’s no compute data for the largest model, iGPT-XL. But based on the training-compute increase from GPT-3 XL (same number of params as iGPT-L) to GPT-3 6.7B (same number of params as iGPT-XL), I think it required about 5 times more compute: ≈ 3.6 * 10^22 FLOP.
BigGAN: 2 days * 24 hours/day * 3600 sec/hour * 512 TPU cores * 0.25 TPU boards / TPU core * 420*10^12 FLOP/s/TPUv3 board * 33% actual board utilization = 3 * 10^21 FLOP.
AlphaFold: they say they trained on GPU and not TPU. Assuming V100 GPU, it’s 5 days * 24 hours/day * 3600 sec/hour * 8 V100 GPU * 100*10^12 FLOP/s * 33% actual GPU utilization = 10^20 FLOP.
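For reference, a short script reproducing these estimates; the peak-FLOP/s figures, the 0.25 boards-per-core assumption, and the ~33% utilization are the same assumptions as above:

```python
# Back-of-the-envelope reproduction of the training-compute estimates above.
SEC_PER_DAY = 24 * 3600
SEC_PER_YEAR = 365 * SEC_PER_DAY
TPU_BOARD_FLOPS = 420e12    # assumed peak FLOP/s per TPUv3 board
BOARDS_PER_CORE = 0.25      # assumed boards per TPU "core"
V100_FLOPS = 100e12         # assumed peak FLOP/s per V100
UTIL = 0.33                 # assumed actual utilization

estimates = {
    "AlphaStar":         12 * 44 * SEC_PER_DAY * 32 * TPU_BOARD_FLOPS * UTIL,
    "GShard MoE 600B":   22 * SEC_PER_YEAR * BOARDS_PER_CORE * TPU_BOARD_FLOPS * UTIL,
    "GShard dense 2.3B": 235.5 * SEC_PER_YEAR * BOARDS_PER_CORE * TPU_BOARD_FLOPS * UTIL,
    "Meena":             30 * SEC_PER_DAY * 2048 * BOARDS_PER_CORE * TPU_BOARD_FLOPS * UTIL,
    "iGPT-L":            2500 * SEC_PER_DAY * V100_FLOPS * UTIL,
    "BigGAN":            2 * SEC_PER_DAY * 512 * BOARDS_PER_CORE * TPU_BOARD_FLOPS * UTIL,
    "AlphaFold":         5 * SEC_PER_DAY * 8 * V100_FLOPS * UTIL,
}

for name, flop in estimates.items():
    print(f"{name:>17}: {flop:.1e} FLOP")
```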
gwern has recently remarked that one cause of this is supply-and-demand disruptions, and that this may in principle be a temporary phenomenon.
Papers on protein design
> with the only recursive element of its thought being that it can pass 16 bits to its next running
I would name activations for all previous tokens as the relevant “element of thought” here that gets passed, and this can be gigabytes.
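As a rough check on the “gigabytes” claim, a back-of-the-envelope sketch for a GPT-3-scale decoder (the shape numbers and fp16 storage are assumptions for illustration):

```python
# Rough size of the per-token state (the K/V activations cached for attention)
# for a GPT-3-scale decoder. All figures are assumptions for illustration.
n_layers, d_model = 96, 12288        # GPT-3 175B-like shape
bytes_per_value = 2                  # fp16
context_len = 2048

per_token = 2 * n_layers * d_model * bytes_per_value   # K and V, at every layer
total = per_token * context_len
print(f"{per_token / 1e6:.1f} MB per token, {total / 1e9:.1f} GB for a full context")
# ~4.7 MB per token, ~9.7 GB for 2048 tokens
```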
From how the quote looks, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor previous token states) being explicitly updated.
This idea tries to discover translations between the representations of two neural networks, but without necessarily discovering a translation into our representations.
I think this has been under investigation for a few years in the context of model fusion in federated learning, model stitching, and translation between latent representations in general.
Relative representations enable zero-shot latent space communication—an analytical approach to matching representations (though this is new work and may not be that good; I haven’t checked)
Git Re-Basin: Merging Models modulo Permutation Symmetries—recent model stitching work with some nice results
Latent Translation: Crossing Modalities by Bridging Generative Models—some random application of unsupervised translation to translation between autoencoder latent codes (probably not the most representative example)
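As a toy illustration of the simplest version of this kind of matching, here is an orthogonal-Procrustes fit between two sets of activations on the same probe inputs (synthetic data; the papers above use more elaborate schemes):

```python
import numpy as np

# Fit an orthogonal map W that aligns activations of network A to network B
# on the same probe inputs (orthogonal Procrustes). Data here is synthetic.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(1000, 256))                       # net A activations (n, d)
true_rot, _ = np.linalg.qr(rng.normal(size=(256, 256)))     # hidden "translation"
acts_b = acts_a @ true_rot + 0.01 * rng.normal(size=(1000, 256))  # net B ≈ rotated A

# Orthogonal Procrustes solution: W = U V^T from the SVD of A^T B.
u, _, vt = np.linalg.svd(acts_a.T @ acts_b)
w = u @ vt

alignment_error = np.linalg.norm(acts_a @ w - acts_b) / np.linalg.norm(acts_b)
print(f"relative alignment error: {alignment_error:.3f}")
```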
To receive epistemic credit, make sure people can tell that you haven’t made every possible prediction on the topic this way and then revealed only the right one after the fact. You can probably publish plaintext metadata for this.
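One way to do this, as a sketch: publish a hash commitment to the prediction together with plaintext metadata saying which topic it covers and how many commitments you have made on it, so readers can check you aren’t revealing selectively. All names and values below are made up:

```python
import hashlib
import secrets

# Hash commitment: the prediction stays hidden until revealed, while the
# plaintext metadata lets readers count how many commitments exist per topic,
# so you can't quietly make many and reveal only the winner.
prediction = "By 2027-01-01, X will have happened."   # hypothetical content
nonce = secrets.token_hex(16)                          # prevents brute-forcing the text

commitment = hashlib.sha256((nonce + prediction).encode()).hexdigest()

public_record = {
    "topic": "AI timelines",        # plaintext metadata, published now
    "commitments_on_topic": 1,      # "I have made exactly one prediction here"
    "sha256": commitment,
}
print(public_record)
# Later, publish `prediction` and `nonce`; anyone can recompute the hash.
```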
On prior work: they cited X-LXMERT (Sep 2020) and TReCS (Nov 2020) in the blogpost. These seem to be the baselines.
https://arxiv.org/abs/2011.03775
https://arxiv.org/abs/2009.11278

The quality of objects and scenes there is far below the new model. They are often just garbled and don’t look quite right.
But more importantly, the best they could sometimes extract from the text is something like “a zebra is standing in the field”, i.e. the object and the background; all the other information was lost. With this model, you can actually use many more language features for visualization: specifying spatial relations between objects, specifying attributes of objects in a semantically precise way, camera view, time and place, rotation angle, printing text on objects, introducing text-controllable recolorations and reflections. I may be mistaken, but I don’t think I’ve seen any convincing demonstration of these capabilities in open-domain text-to-image generation before.
One evaluation drawback I see is that they haven’t included any generated human images in the blogpost besides busts. Because of this, there’s a chance that scenes with humans are of worse quality, but I think they would nevertheless be very impressive compared to prior work, given how photorealistic everything else looks.
I’m not sure what accounts for this performance, but it may well mostly be more parameters (2-3 orders of magnitude more compared to previous models?) plus more and better data (the new dataset of image-text pairs they used for CLIP?).
Does anyone have a good model of how to reconcile
1) a pretty large psychosis rate in this survey, a bunch of people in https://www.lesswrong.com/posts/MnFqyPLqbiKL8nSR7/my-experience-at-and-around-miri-and-cfar-inspired-by-zoe saying that their friends got mental health issues after using psychedelics, and anecdotal experiences and stories about psychedelic-induced psychosis in the broader culture
and
2) Studies https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3747247/ https://journals.sagepub.com/doi/10.1177/0269881114568039 finding no correlation, or, in some cases, negative correlation between psychedelic consumption and mental health issues?
- Studies done wrong?
- Studies don’t have enough statistical power?
- Something with confounding and Simpson’s paradox? Maybe there’s a particular subgroup of the population where psychedelic use correlates negatively with the likelihood of mental health issues, or a subgroup with both more psychedelic use and a lower average likelihood of mental health issues (see the toy simulation after this list)?
- Psychedelics impart mental well-being and resilience to some people to such a degree that it cancels out the negative mental health effects in other people, so that in expectation psychedelics wouldn’t affect your mental health negatively?
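To illustrate the Simpson’s-paradox option above, a toy simulation with made-up numbers, where use raises risk within each subgroup but the pooled correlation still comes out negative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical subgroups: within each, use *raises* the risk of issues,
# but the high-baseline-risk subgroup uses much less, so the pooled
# correlation comes out negative. All numbers are made up.
n = 50_000
high_baseline = rng.random(n) < 0.5                  # subgroup membership
p_use = np.where(high_baseline, 0.05, 0.40)          # low-risk group uses more
use = rng.random(n) < p_use
p_issue = np.where(high_baseline, 0.30, 0.05) + 0.05 * use  # +5pp from use in both groups
issue = rng.random(n) < p_issue

def corr(x, y):
    return np.corrcoef(x.astype(float), y.astype(float))[0, 1]

print("pooled corr(use, issue):", round(corr(use, issue), 3))                            # negative
print("within high-baseline:  ", round(corr(use[high_baseline], issue[high_baseline]), 3))  # positive
print("within low-baseline:   ", round(corr(use[~high_baseline], issue[~high_baseline]), 3))  # positive
```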
why it is so good in general (GPT-4)
What are the examples indicating it performs at the level you would expect from GPT-4 on complex tasks? Especially performance that is clearly attributable to improvements we expect GPT-4 to make? I looked through a bunch of screenshots but haven’t seen any so far.
These games are really engaging for me and haven’t been named:
Eleven Table Tennis. Ping-pong in VR (+ multiplayer and tournaments).
Racket NX. This one is much easier but you still move around a fair bit. The game is “Use the racket to hit the ball” as well.
Synth Riders. An easier and more chill Beat Saber-like game.
Holopoint. Archery + squats, gets very challenging on later levels.
Some gameplay videos for excellent games that have been named:
Beat Saber. “The VR game”. You can load songs from the community library using mods.
Thrill of the Fight (boxing).
For every token, model activations are computed once when the token is encountered and then never explicitly revised → “only [seems like it] goes in one direction”
Every other day I have a bunch of random questions related to AI safety research pop up but I’m not sure where to ask them. Can you recommend any place where I can send these questions and consistently get at least half of them answered or discussed by people who are also thinking about it a lot? Sort of like an AI safety StackExchange (except there’s no such thing), or a high-volume chat/discord. I initially thought about LW shortform submissions, but it doesn’t really look like people are using the shortform for asking questions at all.
Actually, the Metaculus community prediction has a recency bias:
> approximately sqrt(n) new predictions need to happen in order to substantially change the Community Prediction on a question that already has n players predicting.

In this case n=298, so the prediction should change substantially after sqrt(n)≈17 new predictions (which usually takes up to a few days). Over the past week there were almost this many predictions, the AGI community median has shifted 2043 → 2039, and the 30th percentile is 8 years away.
This is the link to Yudkowsky’s discussion of concept merging with the triangular-lightbulb example: https://intelligence.org/files/LOGI.pdf#page=10
Generated lightbulb images: https://i.imgur.com/EHPwELf.png