GPT-4 Right Out of the Box
In some ways, the best and worst thing about GPT-4 was its cutoff date of September 2021. After that date it had no access to new information, and it had no ability to interact with the world beyond its chat window.
As a practical matter, that meant that a wide range of use cases didn’t work. GPT-4 would lack the proper context. In terms of much of mundane utility, this would often destroy most or all of the value proposition. For many practical purposes, at least for now, I instead use Perplexity or Bard.
That changes with plug-ins. GPT-4 is going to be able to browse the internet directly, and also use a variety of websites, with the first plug-ins coming from Expedia, FiscalNote, Instacart, KAYAK, Klarna, Milo, OpenTable, Shopify, Slack, Speak, Zapier and Wolfram. Wolfram means Can Do Math. There’s also a Python code interpreter.
Already that’s some exciting stuff. Zapier gives it access to your data to put your stuff into context.
Also there’s over 80 secret plug-ins already, that can be revealed by removing a specific parameter from an API call. And you can use them, there are only client-side checks stopping you. Sounds secure.
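To make concrete why ‘only client-side checks’ is a problem, here is a minimal sketch. I do not know what the actual API looks like, so every name below (the plug-in names, the `public_only` parameter, both functions) is a hypothetical stand-in. The first function hides unreleased plug-ins only when the client asks it to, which is the pattern where dropping a parameter reveals everything. The second ignores client hints and decides on the server.

```python
# Hypothetical plug-in catalogue; every name here is made up for illustration.
PUBLIC = {"wolfram", "zapier"}
UNRELEASED = {"secret_plugin_a", "secret_plugin_b"}


def list_plugins_client_trusting(params: dict) -> list:
    """Hides unreleased plug-ins only if the client politely asks for that."""
    if params.get("public_only") == "true":
        return sorted(PUBLIC)
    # Drop the parameter from the request and everything is returned.
    return sorted(PUBLIC | UNRELEASED)


def list_plugins_server_enforced(params: dict, entitlements: set) -> list:
    """Ignores client hints; unreleased plug-ins need a server-side grant."""
    return sorted(PUBLIC | (UNRELEASED & entitlements))
```

In the second version nothing the client puts in `params` can widen the result. Only the server-held `entitlements` set can.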
We continue to build everything related to AI in Python, almost as if we want to die, get our data stolen and generally not notice that the code is bugged and full of errors. Also there’s that other little issue that happened recently. Might want to proceed with caution.
(I happen to think they’re also downplaying the other risks even more, but hey.)
Perhaps this wasn’t thought out as carefully as we might like. Which raises a question.
So, About That Commitment To Safety and Moving Slowly And All That
That’s all obviously super cool and super useful. Very exciting. What about safety?
Well, we start off by having everyone write their instructions in plain English and let the AIs figure it out, because in addition to being super fast, that’s the way to write secure code that does what everyone wants that is properly unit tested and fails gracefully and doesn’t lead to any doom loops whatsoever.
Also one of the apps in the initial batch is Zapier, which essentially hooks GPT up to all your private information and accounts and lets it do whatever it wants. Sounds safe.
No, no, that’s not fair, they are concerned about safety, look, concern, right here.
At the same time, there’s a risk that plugins could increase safety challenges by taking harmful or unintended actions, increasing the capabilities of bad actors who would defraud, mislead, or abuse others.
I mean, yes, you don’t say, having the ability to access the internet directly and interface with APIs does seem like the least safe possible option. I mean, I get why you’d do that, there’s tons of value here, but let’s not kid ourselves. So, what’s the deal?
We’ve performed red-teaming exercises, both internally and with external collaborators, that have revealed a number of possible concerning scenarios. For example, our red teamers discovered ways for plugins—if released without safeguards—to perform sophisticated prompt injection, send fraudulent and spam emails, bypass safety restrictions, or misuse information sent to the plugin.
We’re using these findings to inform safety-by-design mitigations that restrict risky plugin behaviors and improve transparency of how and when they’re operating as part of the user experience.
This does not sound like ‘the red teams reported no problems,’ it sounds like ‘the red teams found tons of problems, while checking for the wrong kind of problem, and we’re trying to mitigate as best we can.’
Better than nothing. Not super comforting. What has OpenAI gone with?
The plugin’s text-based web browser is limited to making GET requests, which reduces (but does not eliminate) certain classes of safety risks.
I wonder how long that restriction will last. For now, you’ll have to use a different plug-in to otherwise interact with the web.
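A minimal sketch of why GET-only ‘reduces (but does not eliminate)’ risk, using only the Python standard library. The function names and the example URL are my own assumptions, not anything from OpenAI’s implementation. A method allowlist blocks POST, yet a GET can still carry arbitrary text out in its query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ALLOWED_METHODS = {"GET"}  # assumed allowlist, mirroring the stated restriction


def is_allowed(method: str) -> bool:
    """Return True only for methods on the allowlist."""
    return method.upper() in ALLOWED_METHODS


def build_get_url(base: str, params: dict) -> str:
    """Encode params into the query string of a GET request URL."""
    return f"{base}?{urlencode(params)}"


# A GET is allowed, and it can still smuggle data to whoever runs the server.
url = build_get_url("https://example.com/search", {"note": "private text"})
leaked = parse_qs(urlsplit(url).query)["note"]
```

So the restriction stops the browser from submitting forms or calling write endpoints, but whatever ends up in the URL is still visible to the receiving server.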
Browsing retrieves content from the web using the Bing search API. As a result, we inherit substantial work from Microsoft on (1) source reliability and truthfulness of information and (2) “safe-mode” to prevent the retrieval of problematic content.
I would have assumed the RLHF would mostly censor any problematic content anyway. Now it’s doubly censored, I suppose. I wonder if GPT will censor your own documents if you ask them to be read back to you. I am getting increasingly worried in practical terms about universal censorship applied across the best ways to interact with the world, that acts as if we are all 12 years old and can never think unkind thoughts or allow for creativity. And in turn, I am worried that this will continue to drive open source alternatives.
What about the AI executing arbitrary code that it writes? Don’t worry. Sandbox.
The primary consideration for connecting our models to a programming language interpreter is properly sandboxing the execution so that AI-generated code does not have unintended side-effects in the real world. We execute code in a secured environment and use strict network controls to prevent external internet access from executed code.
Disabling internet access limits the functionality of our code sandbox, but we believe it’s the right initial tradeoff.
So yes, real restrictions that actually matter for functionality, so long as you don’t use a different plug-in to get around those restrictions.
Four days later:
(Jan Leike is on the alignment team at OpenAI, here for you if you ever want to talk.)
Take it away, everyone.
Yeah, sure, access to all my personal documents, what could go wrong.
Ships That Have Sailed
That was fun. The problem is that if GPT-4 was actually dangerous when hooked up to everything, we already had that problem, because API access already hooks GPT up to everything, even if it requires slightly more effort. There is nothing OpenAI is doing here that can’t be done by the user. You can already build GPT calls into arbitrary Python programs and insert them into arbitrary feedback loops.
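For a sense of how little effort that takes, here is a minimal sketch of such a feedback loop. The model is a stub (`fake_model`) so the example runs offline. In practice you would swap in a real API call. The `CALL`/`RESULT:`/`FINAL:` conventions and the single `add` tool are assumptions I made up for illustration.

```python
from typing import Callable, Dict


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical, runs offline)."""
    if "RESULT:" in prompt:
        return "FINAL: " + prompt.rsplit("RESULT:", 1)[1].strip()
    return "CALL add 2 3"


# Tools the loop is willing to execute on the model's behalf.
TOOLS: Dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(int(a) + int(b)),
}


def run_loop(task: str, model: Callable[[str], str] = fake_model, max_steps: int = 5) -> str:
    """Feed tool results back into the prompt until the model gives a final answer."""
    prompt = task
    for _ in range(max_steps):
        reply = model(prompt)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        _, name, *args = reply.split()
        result = TOOLS[name](*args)
        prompt = f"{prompt}\nRESULT: {result}"
    raise RuntimeError("step limit reached")
```

Replace the `add` tool with anything that touches the outside world and you have the same exposure plug-ins provide, with a step limit as the only brake.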
Given that, is there that much new risk in the room here, that wasn’t already accounted for by giving developers API access?
Four ways that can happen.
1. Lowering the threshold for end users to use the results. You make it easier on them logistically, make them more willing to trust it, make it the default, let them not know how to code at all and so on.
2. Lowering the threshold and reward for plug-in creation. If you make it vastly easier to set up such systems, as well as easier to get people to use them and trust them, then you’re going to do a lot more of this.
3. We could all get into very bad habits this way.
4. OpenAI could be training GPT on how to efficiently use the plug-ins, making their use more efficient than having to constantly instruct GPT via prompt engineering.
It also means that we have learned some important things about how much safety is actually being valued, and how much everyone involved values developing good habits.
That last one is something I initially missed. I had presumed that they were using some form of prompt engineering to enable plug-ins. I was wrong. The Wolfram-Alpha blog post (that also has a lot of cool other stuff) on its plug-in says this explicitly.
What’s happening “under the hood” with ChatGPT and the Wolfram plugin? Remember that the core of ChatGPT is a “large language model” (LLM) that’s trained from the web, etc. to generate a “reasonable continuation” from any text it’s given. But as a final part of its training ChatGPT is also taught how to “hold conversations”, and when to “ask something to someone else”—where that “someone” might be a human, or, for that matter, a plugin. And in particular, it’s been taught when to reach out to the Wolfram plugin.
This is not something that you can easily do on your own, and not something you can do at all for other users. OpenAI has trained GPT to know when and how to reach out and attempt to use plug-ins when they are available, and certain plug-ins like Wolfram in particular. That training could be a big game changer.
So this change really is substantially stronger than improvised alternatives. You save part of the context window, you save complexity, you save potential confusions.
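For contrast, here is roughly what the improvised, prompt-engineered alternative looks like. This is a sketch under my own assumptions: the tool descriptions, the `TOOL <name> <argument>` reply convention and the function names are all invented for illustration. Every request has to carry the tool instructions, which eats context window, and every reply has to be parsed and might not follow the convention.

```python
import re
from typing import Optional, Tuple

# Hypothetical tool descriptions that must be pasted into every prompt.
TOOL_DOCS = {
    "wolfram": "Evaluate a math expression. Usage: TOOL wolfram <expression>",
    "browser": "Fetch a web page. Usage: TOOL browser <url>",
}

TOOL_LINE = re.compile(r"^TOOL (\w+) (.+)$")


def build_prompt(user_message: str) -> str:
    """Prepend the tool instructions to the user's message."""
    docs = "\n".join(f"- {name}: {doc}" for name, doc in TOOL_DOCS.items())
    return f"You may use these tools:\n{docs}\n\nUser: {user_message}"


def overhead_chars(user_message: str) -> int:
    """Characters of context spent on tool instructions rather than the message."""
    return len(build_prompt(user_message)) - len(user_message)


def parse_tool_call(reply: str) -> Optional[Tuple[str, str]]:
    """Return (tool, argument) if the reply follows the convention, else None."""
    match = TOOL_LINE.match(reply.strip())
    if match is None or match.group(1) not in TOOL_DOCS:
        return None
    return match.group(1), match.group(2)
```

The overhead is paid on every call and grows with each tool you describe. A model trained to use plug-ins skips much of that preamble and is less likely to wander off the reply format.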
There’s a screenshot of him running the ‘AgentExecutor’ chain. Oh good. Safe.
That’s also even more confirmation that underlying capabilities are not changing, this is simply making it easier to do both a lot of useful things and some deeply unwise things, and people are going to do a lot of both.
Without going too far off track, quite a lot of AI plug-ins and offerings lately are following the Bard and Copilot idea of ‘share all your info with the AI so I have the necessary context’ and often also ‘share all your permissions with the AI so I can execute on my own.’
I have no idea how we can be in a position to trust that. We are clearly not going to be thinking all of this through.