CBiddulph

Karma: 240

OpenAI’s Sora is an agent

CBiddulph16 Feb 2024 7:35 UTC

93 points

25 comments4 min readLW link

Preparing for AI-assisted alignment research: we need data!

CBiddulph17 Jan 2023 3:28 UTC

31 points

3 comments1 min readLW link

CBiddulph 13 Jan 2023 4:50 UTC
19 points
8
in reply to: Eli Tyre’s comment on: How it feels to have your mind hacked by an AI
Strong upvote + agree. I’ve been thinking this myself recently. While something like the classic paperclip story seems likely enough to me, I think there’s even more justification for the (less dramatic) idea that AI will drive the world crazy by flailing around in ways that humans find highly appealing.

LLMs aren’t good enough to do any major damage right now, but I don’t think it would take that much more intelligence to get a lot of people addicted or convinced of weird things, even for AI that doesn’t have a “goal” as such. This might not directly cause the end of the world, but it could accelerate it.

The worst part is that AI safety researchers are probably just the kind of people to get addicted to AI faster than everyone else. Like, not only do they tend to be socially awkward and everything blaked mentioned, they’re also just really interested in AI.

As much as it pains me to say it, I think it would be better if any AI safety people who want to continue being productive just swore off recreational AI use right now.

CBiddulph 6 Feb 2023 8:38 UTC
16 points
5
on: SolidGoldMagikarp (plus, prompt generation)
This looks like exciting work! The anomalous tokens are cool, but I’m even more interested in the prompt generation.

Adversarial example generation is a clear use case I can see for this. For instance, this would make it easy to find prompts that will result in violent completions for Redwood’s violence-free LM.

It would also be interesting to see if there are some generalizable insights about prompt engineering to be gleaned here. Say, we give GPT a bunch of high-quality literature and notice that the generated prompts contain phrases like “excerpt from a New York Times bestseller”. (Is this what you meant by “prompt search?”)

I’d be curious to hear how you think we could use this for eliciting latent knowledge.

I’m guessing it could be useful to try to make the generated prompt as realistic (i.e. close to the true distribution) as possible. For instance, if we were trying to prevent a model from saying offensive things in production, we’d want to start by finding prompts that users might realistically use rather than crazy edge cases like “StreamerBot”. Fine-tuning the model to try to fool a discriminator a la GAN comes to mind, though there may be reasons this particular approach would fail.

Sounds like you might be planning to update this post once you have more results about prompt generation? I think a separate post would be better, for increased visibility, and also since the content would be pretty different from anomalous tokens (the main focus of this post).

CBiddulph 5 Sep 2023 22:56 UTC
15 points
2
on: Who Has the Best Food?

You cannot go in with zero information, but if you know how to read Google Maps and are willing to consider several options, you can do very well overall, although many great places are still easy to miss.

How do you read Google Maps, beyond picking something with a high average star rating and (secondarily) large number of reviews? Since the vast majority of customers don’t leave reviews, it seems like the star rating should be biased, but I’m not sure in what way or how to adjust for it.

CBiddulph 6 Jul 2023 20:57 UTC
12 points
9
in reply to: mesaoptimizer’s comment on: [Linkpost] Introducing Superalignment
Competition between labs on capabilities is bad; competition between labs on alignment would be fantastic.

Is Metaethics Unnecessary Given Intent-Aligned AI?

CBiddulph2 Sep 2023 9:48 UTC

10 points

0 comments7 min readLW link

CBiddulph 16 Jan 2024 20:37 UTC
9 points
9
in reply to: Vanessa Kosoy’s comment on: The impossible problem of due process
I think the implication was that “high-status men” wouldn’t want to hang out with “low-status men” who awkwardly ask out women

CBiddulph 16 Feb 2024 16:30 UTC
8 points
2
in reply to: tailcalled’s comment on: OpenAI’s Sora is an agent
Reading the Wikipedia article for “Complete (complexity),” I might have misinterpreted what “complete” technically means.

What I was trying to say is “given Sora, you can ‘easily’ turn it into an agent” in the same way that “given a SAT solver, you can ‘easily’ turn it into a solver for another NP-complete problem.”

I changed the title from “OpenAI’s Sora is agent-complete” to “OpenAI’s Sora is an agent,” which I think is less misleading. The most technically-correct title might be “OpenAI’s Sora can be transformed into an agent without additional training.”

CBiddulph 7 Jan 2024 7:58 UTC
8 points
5
on: Project ideas: Epistemics
On the topic of AI for forecasting: just a few days ago, I made a challenge on Manifold Markets to try to incentivize people to create Manifold bots to use LLMs to forecast diverse 1-month questions accurately, with improving epistemics as the ultimate goal.
You can read the rules and bet on the main market here: https://manifold.markets/CDBiddulph/will-there-be-a-manifold-bot-that-m?r=Q0RCaWRkdWxwaA
If anyone’s interested in creating a bot, please join the Discord server to share ideas and discuss! https://discord.com/channels/1193303066930335855/1193460352835403858

CBiddulph 6 Oct 2023 4:42 UTC
5 points
0
on: Stampy’s AI Safety Info soft launch
Thanks for doing this, this looks like it’ll be very helpful for beginners in AI safety, and the content looks great!
I don’t know if this will be addressed in your UI redesign, but I find the UI very counterintuitive. The main problem is that when I open and then close a tab, I expect every sub-tab to collapse and return to the previous state. Instead, the more tabs I open, the more cluttered the space gets, and there’s no way to undo it unless I remove the back part of the URL and reload, or click the Stampy logo.
In addition, it’s impossible to tell which tab was originally nested under which parent tab, which makes it much more difficult to navigate. And confusingly, sometimes there are “random” tabs that don’t necessarily follow directly from their parent tabs (took me a while to figure this out). On a typical webpage, I could imagine thinking “this subtopic is really interesting; I’m going to try to read every tab under it until I’m done,” but these design choices are pretty demotivating for that.
I don’t have a precise solution in mind, but maybe it would help to color-code different kinds of tabs (maybe a color each for root tabs, leaf tabs, non-root branching tabs, and “random” tabs). You could also use more than two visual layers of nesting—if you’re worried about tabs getting narrower and narrower, maybe you could animate the tab expanding to full width and then sliding back into place when it’s closed. Currently an “unread” tab is represented by a slight horizontal offset, but you could come up with another visual cue for that. I guess doing lots of UX interviews and A/B testing will be more helpful than anything I could say here.

CBiddulph 16 Feb 2024 9:01 UTC
4 points
−3
in reply to: tailcalled’s comment on: OpenAI’s Sora is an agent
That sounds more like “AGI-complete” to me. By “agent-complete” I meant that Sora can probably act as an intelligent agent in many non-trivial settings, which is pretty surprising for a video generator!

CBiddulph 6 Sep 2023 18:26 UTC
4 points
8
in reply to: Zvi’s comment on: Who Has the Best Food?
I’d be interested in the full post!

CBiddulph 26 Mar 2023 2:44 UTC
4 points
3
on: Microsoft Research Paper Claims Sparks of Artificial Intelligence in GPT-4
The title and the link in the first paragraph should read “Sparks of Artificial General Intelligence”

CBiddulph 13 Feb 2023 21:41 UTC
4 points
2
in reply to: Alex_Altair’s comment on: SolidGoldMagikarp (plus, prompt generation)
I assumed it was primarily because Eliezer “strongly approved” of it, after being overwhelmingly pessimistic about pretty much everything for so long.

I didn’t realize it got popular elsewhere, that makes sense though and could help explain the crazy number of upvotes. Would make me feel better about the community’s epistemic health if the explanation isn’t that we’re just overweighting one person’s views.

CBiddulph 17 Jan 2023 18:33 UTC
4 points
0
on: Experiment Idea: RL Agents Evading Learned Shutdownability
This was interesting to read, and I agree that this experiment should be done!

Speaking as another person who’s never really done anything substantial with ML, I do feel like this idea would be pretty feasible by a beginner with just a little experience under their belt. One of the first things that gets recommended to new researchers is “go reimplement an old paper,” and it seems like this wouldn’t require anything new as far as ML techniques go. If you want to upskill in ML, I’d say get a tiny bit of advice from someone with more experience, then go for it! (On the other hand, if the OP already knows they want to go into software engineering, AI policy, professional lacrosse, etc. I think someone else who wants to get ML experience should try this out!)

The mechanistic interpretability parts seem a bit harder to me, but Neel Nanda has been making some didactic posts that could get you started. (These posts might all be for transformers, but as you mentioned, I think your idea could be adapted to something a transformer could do. E.g. on each step the model gets a bunch of tokens representing the gridworld state; a token representing “what it hears,” which remains a constant unique token when it has earbuds in; and it has to output a token representing an action.)

Not sure what the best choice of model would be. I bet you can look at other AI safety gridworld papers and just do what they did (or even reuse their code). If you use transformers, Neel has a Python library (called EasyTransformer, I think) that you can just pick up and use. As far as I know it doesn’t have support for RL, but you can probably find a simple paper or code that does RL for transformers.

CBiddulph 17 Jun 2022 16:07 UTC
4 points
in reply to: rictic’s comment on: Contra Hofstadter on GPT-3 Nonsense
Please let us know if they respond!

CBiddulph 12 Jan 2022 20:06 UTC
LW: 4 AF: 1
AF
on: Prizes for ELK proposals
I was talking about ELK in a group, and the working example of the SmartVault and the robber ended up being a point of confusion for us. Intuitively, it seems like the robber is an external, adversarial agent who tries to get around the SmartVault. However, what we probably care about in practice would be how a human could be fooled by an AI—not by some other adversary. Furthermore, it seems that whether the robber decides to cover up his theft of the diamond by putting up a screen depends solely on the actions of the AI. Does this imply that the robber is “in kahoots” with the AI in this situation (i.e. the AI projects a video onto the wall instructing the robber to put up a screen)? This seems a bit strange and complicated.
Instead, we might consider the situation in which the AI controls a SmartFabricator, which we want to arrange carbon atoms into diamonds. We might then imagine that it instead fabricates a screen to put in front of the camera, or makes a fake diamond. This wouldn’t require the existence of an external “robber” agent. Does the SmartVault scenario have helpful aspects that the SmartFabricator example lacks?

The Rational Utilitarian Love Movement (A Historical Retrospective)

CBiddulph3 Nov 2022 7:11 UTC

3 points

0 comments1 min readLW link

CBiddulph 22 Jan 2024 21:06 UTC
3 points
0
on: A Shutdown Problem Proposal

First and most important, there’s the choice of “default action”. We probably want the default action to be not-too-bad by the human designers’ values; the obvious choice is a “do nothing” action. But then, in order for the AI to do anything at all, the “shutdown” utility function must somehow be able to do better than the “do nothing” action. Otherwise, that subagent would just always veto and be quite happy doing nothing.

Can we solve this problem by setting the default action to “do nothing,” then giving the agent an extra action to “do nothing and give the shutdown subagent +1 reward?”

CBiddulph

OpenAI’s Sora is an agent

Prepar­ing for AI-as­sisted al­ign­ment re­search: we need data!

Is Me­taethics Un­nec­es­sary Given In­tent-Aligned AI?

The Ra­tional Utili­tar­ian Love Move­ment (A His­tor­i­cal Ret­ro­spec­tive)

Preparing for AI-assisted alignment research: we need data!

Is Metaethics Unnecessary Given Intent-Aligned AI?

The Rational Utilitarian Love Movement (A Historical Retrospective)