I think there’s an aesthetic clash here somewhere. I have an intuition or like… an aesthetic impulse, telling me basically… “advocacy is dumb”. Whenever I see anybody Doing An Activism, they’re usually… saying a bunch of… obviously false things? They’re holding a sign with a slogan that’s too simple to possibly be the truth, and yelling this obviously oversimplified thing as loudly as they possibly can? It feels like the archetype of overconfidence.
I have felt exactly the same thing in the past. Extremely well said. It is worth pointing out explicitly that this is not a rational thought: it's an Ugh Field around advocacy, and even if the underlying observation is often true, that doesn't mean all advocacy has to be this way.
My model of Eliezer says something like this:
AI will not be aligned by default, because AI alignment is hard, and hard things don't happen spontaneously. Rockets explode unless you very carefully make them not do that. Software isn't automatically secure or reliable; it takes a lot of engineering effort to make it that way.
Given that, there needs to be a specific, workable plan for how we could align AI. We don't have one. If there were one, Eliezer would know about it; it would have been brought to his attention, since the field isn't that big and he's a very well-known figure in it. Therefore, in the absence of a specific way of aligning AI that would work, the probability of AI being aligned is roughly zero, in much the same way that "Throw a bunch of jet fuel in a tube and point it towards space" has roughly zero chance of getting you to space without a specific account of how it would do so.
So, in short: it is reasonable to assume that, with very high probability, AI will be aligned only if we deliberately make it that way. It is also reasonable to assume that if a workable solution existed, Eliezer would know about it. You don't need to know everything about AGI x-risk to see that; anything that promising would percolate through the community and reach Eliezer in short order. Since there is no such solution, and no attempts have come close according to Eliezer, we're in trouble.
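To make the shape of that argument explicit, here is a toy Bayesian version. The structure is taken from the argument above; the specific numbers are illustrative placeholders of mine, not anything Eliezer has stated. Let $S$ be "a workable alignment solution currently exists" and $K$ be "Eliezer knows of one". The claim is that $P(\neg K \mid S)$ is small, because promising work percolates through the community, so observing $\neg K$ is strong evidence against $S$:

$$P(S \mid \neg K) = \frac{P(\neg K \mid S)\,P(S)}{P(\neg K \mid S)\,P(S) + P(\neg K \mid \neg S)\,P(\neg S)}$$

With placeholder values $P(\neg K \mid S) = 0.1$, $P(\neg K \mid \neg S) = 1$, and an even prior $P(S) = 0.5$:

$$P(S \mid \neg K) = \frac{0.1 \times 0.5}{0.1 \times 0.5 + 1 \times 0.5} = \frac{0.05}{0.55} \approx 0.09$$

Note that the conclusion leans almost entirely on $P(\neg K \mid S)$ being small: that term is exactly the "it would have reached Eliezer" premise, and the disagreements below can each be read as pushing on one of these terms.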
Reasons you might disagree with this:
You think AI is a long way off, and therefore it's okay that we don't know how to solve alignment yet.
You think “alignment by default” might be possible.
You think some approaches that have already been proposed for solving the problem are reasonably likely to succeed once fleshed out more.