That was fun. The problem is that if GPT-4 were actually dangerous when hooked up to everything, we already had that problem: API access already hooks GPT up to everything, even if it requires slightly more effort. There is nothing OpenAI is doing here that can’t be done by the user. You can already build GPT calls into arbitrary Python programs and embed them in arbitrary feedback loops.
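For example, any developer could already wire up something like this (a minimal sketch, assuming the pre-1.0 `openai` Python library with an API key set; `run_tool` is a hypothetical stand-in for whatever external system you want GPT in the loop with):

```python
import openai  # pre-1.0 openai library; assumes OPENAI_API_KEY is set

def run_tool(command: str) -> str:
    # Hypothetical: run `command` against any external system, return its output.
    return f"(pretend output of: {command})"

# A bounded feedback loop: GPT proposes an action, the program executes it
# and feeds the result back in as the next user message.
history = [{"role": "user", "content": "Propose a command; I will run it and paste the output back."}]
for _ in range(5):
    resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
    reply = resp["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": run_tool(reply)})
```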
Given that, is there that much new risk here that wasn’t already accounted for by giving developers API access?
Zvi, this is straight misinformation and you should update your post. This is highly misleading.
Yes, before, you could describe to GPT-4, within its context window (the description costs you context tokens), what a tool interface would do. However, if the model misunderstands the interface, or is too “hyped” about the capability of a tool that doesn’t work very well, it will keep making the same mistakes, since it cannot learn.
We can infer, both from base knowledge and from Stephen Wolfram’s blog entry on this (https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/), that OpenAI is auto-refining the model’s ability to use plugins via RL (or there are ways to do supervised learning on this).
This is not something the user can do. And it will cause a large increase in the model’s ability to use tools, probably a general ability if it gets enough training examples and a large variety of tools to practice with.
Thank you. I will update the post once I read Wolfram’s post and decide what I think about this new info.
In the future, please simply say ‘this is wrong’ rather than calling something like this misinformation, saying highly misleading with the bold highly, etc.
EDIT: Mods this is ready for re-import, the Wordpress and Substack versions are updated.
See the OpenAI blog post. They say in the same post that they have made a custom model, as in they weight-updated GPT-4, so it can use plugins. It’s near the bottom of the entry on plugins.
Probably they also weight-updated it so it knows to use the browser and local interpreter plugins well, without needing to read the descriptions.
Since it disagrees with the authoritative source and is obviously technically wrong, I called it misinformation.
I apologize, but you have so far failed to respond to most criticism, and this was a pretty glaring error.
Does OpenAI say this, or are you inferring it entirely from the Wolfram blog post? Isn’t that an odd place to learn such a thing?
And where does the Wolfram blog post say this? It sounds to me like he’s doing something like this from the outside: making one call to Wolfram, then using the LLM to evaluate the result, determine whether it produced an error, and retry.
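Concretely, I read him as running something like this loop outside the model (a sketch; `ask_llm` and `query_wolfram` are hypothetical helpers, not Wolfram’s actual code):

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical: one GPT-4 API call, returning the completion text.
    raise NotImplementedError

def query_wolfram(query: str) -> str:
    # Hypothetical: one Wolfram|Alpha API call, returning the raw result.
    raise NotImplementedError

def answer_with_wolfram(question: str, max_tries: int = 3) -> str:
    # One Wolfram call, then an LLM check of the result, retrying on error.
    query = ask_llm(f"Write a Wolfram|Alpha query answering: {question}")
    result = ""
    for _ in range(max_tries):
        result = query_wolfram(query)
        verdict = ask_llm(
            f"Question: {question}\nWolfram result: {result}\n"
            "Reply OK if this answers the question, or ERROR otherwise."
        )
        if verdict.strip().upper().startswith("OK"):
            return result
        query = ask_llm(f"The query {query!r} returned {result!r}. Rewrite the query.")
    return result
```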
I am inferring this because plugins would simply not work otherwise.
Please think about what it would mean if, for each “plugin-supported query”, the AI had to read all of the tokens of all of the plugins. Remember, every piece of information OAI doesn’t put into the model weights costs you tokens from your finite-length context window. Remember, you can go look at the actual descriptions of many plugins, and they eat 1000+ tokens alone: against an 8,192-token window, a 1,024-token description is a full 1/8 of your window spent just remembering what one plugin does.
Or what it would cost OAI to keep generating GPT-4 tokens again and again and again while the machine fails to make a request over and over and over. Or for a particular plugin to essentially lie in its description and be useless. Or the finer points of when to search Bing vs. Wolfram Alpha: for Pokémon evolutions and math, Wolfram; but for current news, Bing...
I’m pretty confident that I have been using the “Plugins” model with a very long context window. I was copy-pasting entire 500-line source files and asking questions about them. I assume that I’m getting the 32k context window.
How many characters is your 500-line source file? It probably fits in 8k tokens. You can find out here.
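(For instance, with OpenAI’s `tiktoken` library; the filename is just an example:)

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
text = open("my_source_file.py").read()  # hypothetical 500-line file
print(f"{len(text)} characters, {len(enc.encode(text))} tokens")
```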
The entire conversation is over 60,000 characters according to wc. OpenAI’s tool won’t even let me compute the tokens if I paste more than 50k (?) characters, but when I deleted some of it, it gave me a value of >18,000 tokens.
I’m not sure if/when ChatGPT starts to forget part of the chat history (it drops out of the context window), but it still seemed to remember the first file after a long, winding discussion.
Since you have to manually activate plugins, they don’t take any context until you do so. In particular, multiple plugins don’t compete for context and the machine doesn’t decide which one to use.
Please read the documentation and the blog post you cited.
“An experimental model that knows when and how to use plugins”
Sounds like they updated the model.
And it says you have to activate third-party plugins. The browser and Python interpreter will probably always be active.
That’s rather useless then.
And this makes GPT-4 via API access a general-purpose tool-user generator, which it wouldn’t be as reliably if it hadn’t been RLed into this capability. Turns out the system message is not about enacting user-specified personalities, but about fluent use of user-specified tools.
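Something like this, as a sketch (pre-1.0 `openai` library; the `CALL search(...)` convention is made up for illustration, not an official schema):

```python
import openai  # pre-1.0 openai library; assumes OPENAI_API_KEY is set

# The system message as a tool spec rather than a persona.
system = (
    'You have one tool: search(query) -> results. '
    'To use it, reply with exactly: CALL search("<query>"). '
    'Otherwise, reply with a final answer.'
)
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Who won the 2022 World Cup?"},
    ],
)
print(resp["choices"][0]["message"]["content"])
```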
So the big news is not ChatGPT plugins; those are just the demo of GPT-4 being a bureaucracy engine. Its impact is not automating the things that humans were doing, but creating a new programming paradigm, where applications have intelligent rule-followers sitting inside half of their procedures, who can invoke other procedures. Nobody seriously tried to do this with real humans, not at the scale of software, because it takes the kind of nuanced rule-following you’d need lawyers with domain-specific expertise for: multiple orders of magnitude more expensive than LLM API access, and too slow for most purposes.
Maybe. Depends on how good it gets. It is possible that GPT-4 with plugins it has learned to use well (so on each query it doesn’t read the description of the plugin; it just “knows” to use Wolfram Alpha, and its first query is properly formatted) will be functionally an AGI.
Not an AGI without its helpers, but in terms of user utility, an AGI in that it has approximately the breadth and depth of skills of the average human being.
Plugins would exist where it can check its answers, look up all unique nouns for existence, check that all its URL references resolve, and so on.
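The URL check, for instance, is only a few lines outside the model (a sketch using the `requests` library):

```python
import re
import requests

def urls_resolve(answer: str) -> dict:
    # Check that every URL cited in a model's answer actually resolves.
    status = {}
    for url in re.findall(r"https?://\S+", answer):
        try:
            status[url] = requests.head(url, allow_redirects=True, timeout=5).ok
        except requests.RequestException:
            status[url] = False
    return status
```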