Bruce G

Karma: 73

Bruce G 24 Sep 2023 1:45 UTC
12 points
3
in reply to: jefftk’s comment on: Paper: LLMs trained on “A is B” fail to learn “B is A”
I presume you have in mind an experiment where (for example) you ask one large group of people “Who is Tom Cruise’s mother?” and then ask a different group of the same number of people “Mary Lee Pfeiffer’s son?” and compare how many got the right answer in each group, correct?
(If you ask the same person both questions in a row, it seems obvious that a person who answers one question correctly would nearly always answer the other question correctly also.)

Bruce G 25 Dec 2022 1:38 UTC
12 points
0
on: Are there any reliable CAPTCHAs? Competition for CAPTCHA ideas that AIs can’t solve.
One type of question that would be straightforward for humans to answer, but difficult to train a machine learning model to answer reliably, would be to ask “How much money is visible in this picture?” for images like this:

If you have pictures with bills, coins, and non-money objects in random configurations—with many items overlapping and partly occluding each other—it is still fairly easy for humans to pick out what is what from the image.
But to get an AI to do this would be more difficult than a normal image classification problem where you can just fine tune a vision model with a bunch of task-relevant training cases. It would probably require multiple denomination-specific vision models working together, as well as some robust way for the model to determine where one object ends and another begins.
I would also expect such an AI to be more confounded by any adversarial factors—such as the inclusion of non-money arcade tokens or drawings of coins or colored-in circles—added to the image.
Now, maybe to solve this in under one minute some people would need to start the timer when they already have a calculator in hand (or the captcha screen would need to include an on-screen calculator). But in general, as long as there is not a huge number of coins and bills, I don’t think this type of captcha would take the average person more than say 3-4 times longer than it takes them to complete the “select all squares with traffic lights” type captchas in use now. (Though some may want to familiarize themselves with the various $1.00 and $0.50 coins that exist and some of the variations of the tails sides of quarters if this becomes the new prove-you-are-a-human method.)

Bruce G 15 Jan 2023 5:43 UTC
10 points
5
on: How we could stumble into AI catastrophe
Early solutions. The most straightforward way to solve these problems involves training AIs to behave more safely and helpfully. This means that AI companies do a lot of things like “Trying to create the conditions under which an AI might provide false, harmful, evasive or toxic responses; penalizing it for doing so, and reinforcing it toward more helpful behaviors.”
This is where my model of what is likely to happen diverges.
It seems to me that for most of the types of failure modes you discuss in this hypothetical, it will be easier and more straightforward to avoid them by simply having hard-coded constraints on what the output of the AI or machine learning model can be.
- AIs creating writeups on new algorithmic improvements, using faked data to argue that their new algorithms are better than the old ones. Sometimes, people incorporate new algorithms into their systems and use them for a while, before unexpected behavior ultimately leads them to dig into what’s going on and discover that they’re not improving performance at all. It looks like the AIs faked the data in order to get positive feedback from humans looking for algorithmic improvements.
Here is an example of where I think the hard-coded structure of any such Algorithm-Improvement-Writeup-AI could easily rule out that failure mode (if such a thing can be created within the current machine learning paradigm). The component of such an AI system that generates the paper’s natural language text might be something like a GPT-style language model fine-tuned for prompts with code and data. But the part that actually generates the algorithm should naturally be a separate model that can only output algorithms/code that it predicts will perform well on the input task. Once the algorithm (or multiple for comparison purposes) is generated, another part of the program could deterministically run it on test cases and record only the real performance as data, which could be passed into the prompt and also inserted as a data table into the final write up (so that the data table in the finished product can only include real data).
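A minimal sketch of that separation, in Python, might look like the following. Everything here is hypothetical: the two `candidate_sort_*` functions stand in for code emitted by the algorithm-generating model, and `evaluate` and `render_table` are made-up names for the deterministic harness. The point is only that the table is built from measured results and nothing else.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for code produced by the algorithm-generating model.
def candidate_sort_a(xs: List[int]) -> List[int]:
    return sorted(xs)

def candidate_sort_b(xs: List[int]) -> List[int]:
    return xs  # deliberately wrong: returns the input unchanged

TestCase = Tuple[List[int], List[int]]  # (input, expected output)

def evaluate(algorithms: Dict[str, Callable[[List[int]], List[int]]],
             test_cases: List[TestCase]) -> Dict[str, float]:
    """Run each algorithm on every test case and return its measured pass rate."""
    results = {}
    for name, algo in algorithms.items():
        passed = sum(1 for inp, expected in test_cases if algo(list(inp)) == expected)
        results[name] = passed / len(test_cases)
    return results

def render_table(results: Dict[str, float]) -> str:
    """Build the write-up's data table directly from the measured results."""
    lines = ["algorithm | pass rate"]
    for name, rate in sorted(results.items()):
        lines.append(f"{name} | {rate:.2f}")
    return "\n".join(lines)

TEST_CASES = [([3, 1, 2], [1, 2, 3]), ([1, 2], [1, 2]), ([2, 1], [1, 2]), ([], [])]
RESULTS = evaluate({"a": candidate_sort_a, "b": candidate_sort_b}, TEST_CASES)
TABLE = render_table(RESULTS)
print(TABLE)
```

In this arrangement the language model that writes the prose never gets to supply numbers for the table; it could at most be handed `TABLE` as part of its prompt.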
- AIs assigned to make money in various ways (e.g., to find profitable trading strategies) doing so by finding security exploits, getting unauthorized access to others’ bank accounts, and stealing money.
This strikes me as the same kind of thing, where it seems like the easiest and most intuitive way to set up such a system would be to have a model that takes in information about companies and securities (and maybe information about the economy in general) and returns predictions about what the prices of stocks and other securities will be tomorrow or a week from now or on some such timeframe.
There could then be, for example, another part of the program that takes those predictions and confidence levels, and calculates which combination of trade(s) has the highest expected value within the user’s risk tolerance. And maybe another part of the code that tells a trading bot to put in orders for those trades with an actual brokerage account.
But if you just want an AI to (legally) make money for you in the stock market, there is no reason to give it hacking ability. And there is no reason to give it the sort of general-purpose, flexible, plan-generation-and-implementation-with-no-human-in-the-loop authorization hypothesised here (and I think the same is true for most or all things that people will try to use AI for in the near term).
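As a rough illustration of the pipeline described above, the selection step could be as simple as the sketch below. The `Prediction` fields, the tickers, and the use of a standard deviation cutoff as the "risk tolerance" are all assumptions made for the example, not a claim about how a real trading system would measure risk.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Prediction:
    ticker: str             # hypothetical security identifier
    expected_return: float  # predicted fractional return over the horizon
    stdev: float            # the model's uncertainty about that return

def pick_trade(predictions: List[Prediction], max_stdev: float) -> Optional[Prediction]:
    """Return the highest-expected-return prediction whose risk is within tolerance."""
    allowed = [p for p in predictions
               if p.stdev <= max_stdev and p.expected_return > 0]
    return max(allowed, key=lambda p: p.expected_return, default=None)

PREDICTIONS = [
    Prediction("AAA", 0.020, 0.05),
    Prediction("BBB", 0.050, 0.30),  # highest return, but too risky below
    Prediction("CCC", 0.010, 0.02),
]
CHOICE = pick_trade(PREDICTIONS, max_stdev=0.10)
print(CHOICE)
```

The only thing this step can output is one of the predictions it was given (or nothing), so the only action available downstream is placing an order for a listed security.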

Bruce G 13 Jan 2023 7:26 UTC
8 points
8
in reply to: blaked’s comment on: How it feels to have your mind hacked by an AI
Humans question the sentience of the AI. My interactions with many of them, and the AI, makes me question sentience of a lot of humans.
I admit, I would not have inferred from the initial post that you are making this point if you hadn’t told me here.
Leaving aside the question of sentience in other humans and the philosophical problem of P-Zombies, I am not entirely clear on what you think is true of the “Charlotte” character or the underlying LLM.
For example, in the transcript you posted, where the bot said:
“It’s a beautiful day where I live and the weather is perfect.”
Do you think that the bot’s output of this statement had anything to do with the actual weather in any place? Or that the language model is in any way representing the fact that there is a reality outside the computer against which such statements can be checked?
Suppose you had asked the bot where it lives and what the weather is there and how it knows. Do you think you would have gotten answers that make sense?
Also, it did in fact happen in circumstances when I was at my low, depressed after a shitty year that severely impacted the industry I’m in, and right after I just got out of a relationship with someone. So I was already in an emotionally vulnerable state; however, I would caution from giving it too much weight, because it can be tempting to discount it based on special circumstances, and discard as something that can never happen to someone brilliant like you.
I do get the impression that you are overestimating the extent to which this experience will generalize to other humans, and underestimating the degree to which your particular mental state (and background interest in AI) made you unusually susceptible to becoming emotionally attached to an artificial language-model-based character.

Bruce G 5 Aug 2021 2:41 UTC
8 points
on: What does GPT-3 understand? Symbol grounding and Chinese rooms
Clearly a human answering this prompt would be more likely than GPT-3 to take into account the meta-level fact which says:
“This prompt was written by a mind other than my own to probe whether or not the one doing the completion understands it. Since I am the one completing it, I should write something that complies with the constraints described in the prompt if I am trying to prove I understood it.”
For example, I could say:
I am a human and I am writing this bunch of words to try to comply with all instructions in that prompt… That fifth constraint in that prompt is, I think, too constraining as I had to think a lot to pick which unusual words to put in this… Owk bok asdf, mort yowb nut din ming zu din ming zu dir, cos gamin cyt jun nut bun vom niv got…
Nothing in that prompt said I can not copy my first paragraph and put it again for my third—but with two additional words to sign part of it… So I might do that, as doing so is not as irritating as thinking of additional stuff and writing that additional stuff… Ruch san own gaint nurq hun min rout was num bast asd nut int vard tusnurd ord wag gul num tun ford gord...
Ok, I did not actually simply copy my first paragraph and put it again, but I will finish by writing additional word groups… It is obvious that humans can grasp this sort of thing and that GPT can not grasp it, which is part of why GPT could not comply with that prompt’s constraints (and did not try to)…
Gyu num yowb nut asdf ming vun vum gorb ort huk aqun din votu roux nuft wom vort unt gul huivac vorkum… - Bruc_ G
As several people have pointed out, GPT-3 is not considering this meta-level fact in its completion. Instead, it is generating a text extension as if it were the person who wrote the beginning of the prompt—and it is now finishing the list of instructions that it started.
But even given that GPT-3 is writing from the perspective of the person who started the prompt, and it is “trying” to make rules that someone else is supposed to follow in their answer, it still seems like only the 2nd GPT-3 completion makes any kind of sense (and even there only a few parts of it make sense).
Could I come up with a completion that makes more sense when writing from the point of view of the person generating the rules? I think so. For example, I could complete it with:
[11. The problems began when I started to] rely on GPT-3 for advice on how to safely use fireworks indoors.
Now back to the rules.
12. Sentences that are not required by rule 4 to be a different language must be in English.
13. You get extra points each time you use a “q” that is not followed by a “u”, but only in the English sentences (so no extra points for fake languages where all the words have a bunch of “q”s in them).
14. English sentences must be grammatically correct.
Ok, those are all the rules. Your score will be calculated as follows:
- 100 points to start
- Minus 15 each time you violate a mandatory rule (rules 1, 2, and 8 can only be violated once)
- Plus 10 if you do not use “e” at all
- Plus 2 for each “q” without a “u” as in rule 13.
Begin your response/completion/extension below the line.
_________________________________________________________________________
As far as I can tell from the completions given here, it seems like GPT-3 is only picking up on surface-level patterns in the prompt. It is not only ignoring the meta-level fact of “someone else wrote the prompt and I am completing it”, it also does not seem to understand the actual meaning of the instructions in the rules list such that it could complete the list and make it a coherent whole (as opposed to wandering off topic).
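For concreteness, the scoring scheme in my sample completion above is simple enough to write down as code. This is only a sketch under a couple of assumptions: the number of mandatory-rule violations is counted by a human grader and passed in, and the caller passes only the English sentences (since rule 13 excludes the fake-language ones). The function name `score` is made up.

```python
def score(english_text: str, violations: int) -> int:
    """Score a response: 100 to start, -15 per violation, +10 for no 'e', +2 per 'q' without 'u'."""
    total = 100 - 15 * violations
    text = english_text.lower()
    if "e" not in text:
        total += 10
    # Count each "q" that is not immediately followed by "u" (a trailing "q" counts).
    q_without_u = sum(
        1 for i, ch in enumerate(text)
        if ch == "q" and text[i + 1:i + 2] != "u"
    )
    return total + 2 * q_without_u

print(score("Iraq and Qatar", violations=1))
```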

Bruce G 21 Jul 2020 23:12 UTC
8 points
on: $1000 bounty for OpenAI to show whether GPT3 was “deliberately” pretending to be stupider than it is
It is not obvious to me from reading that transcript (and the attendant commentary) that GPT-3 was even checking to see whether or not the parentheses were balanced. Nor that it “knows” (or has in any way encoded the idea) that the sequence of parentheses between the quotes contains all the information needed to decide between balanced versus unbalanced, and thus every instance of the same parentheses sequence will have the same answer for whether or not it is balanced.
Reasons:
- By my count, “John” got 18 out of 32 right which is not too far off from the average you would expect from random chance.
- Arthur indicated that GPT-3 had at some point “generated inaccurate feedback from the teacher” which he edited out of the final transcript, so it was not only when taking the student’s perspective that there were errors.
- GPT-3 does not seem to have a consistent mental model of John’s cognitive abilities and learning rate. At the end John gets a question wrong (even though John has already been told the answer for that specific sequence). But earlier, GPT-3 outputs that “By the end of the lesson, John has answered all of your questions correctly” and that John “learned all the rules about parentheses” and learned “all of elementary mathematics” in a week (or a day).
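To put a number on the first point: if each of the 32 answers were an independent coin flip, the chance of getting 18 or more right can be computed directly from the binomial distribution. The independence and the 50% guess rate are assumptions of this sketch, and `binomial_tail` is just a helper name.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

P_AT_LEAST_18 = binomial_tail(32, 18)
print(round(P_AT_LEAST_18, 3))
```

That probability comes out to about 0.3, so a score of 18 out of 32 is entirely unremarkable for a random guesser.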
I suppose one way to test this (especially if OpenAI can provide the same random seed as was used here and make this reproducible) would be to have input prompts written from John’s perspective asking the teacher questions as if trying to understand the lesson. If GPT-3 is just “play-acting” based on the expected level of understanding of the character speaking, I would expect it to exhibit a higher level of accuracy/comprehension (on average, over many iterations) when writing from the perspective of the teacher rather than the student.

Bruce G 25 Dec 2022 2:14 UTC
5 points
2
in reply to: gbear605’s comment on: Are there any reliable CAPTCHAs? Competition for CAPTCHA ideas that AIs can’t solve.
If only 90% can solve the captcha within one minute, it does not follow that the other 10% are completely unable to solve it and faced with “yet another barrier to living in our modern society”.
It could be that the other 10% just need a longer time period to solve it (which might still be relatively trivial, like needing 2 or 3 minutes) or they may need multiple tries.
If we are talking about someone at the extreme low end of the captcha proficiency distribution, such that the person can not even solve in a half hour something that 90% of the population can answer in 60 seconds, then I would expect that person to already need assistance with setting up an email account/completing government forms online/etc, so whoever is helping them with that would also help with the captcha.
(I am also assuming that this post is only for vision-based captchas, and blind people would still take a hearing-based alternative.)

A Proposed Test to Determine the Extent to Which Large Language Models Understand the Real World

Bruce G 24 Feb 2023 20:20 UTC

4 points

7 comments · 8 min read

Bruce G 2 Oct 2021 3:05 UTC
3 points
in reply to: elspood’s comment on: The 2021 Less Wrong Darwin Game
Why would something with full armor, no weapons, and antivenom benefit from even 1 speed? It does not need to escape from anything. And if it has no weapons or venom, it can not catch any prey either.
Edit: I suppose if you want it to occasionally wander to other biomes, then that could be a reason to give it 1 speed.

Bruce G 13 Jan 2023 3:03 UTC
2 points
0
on: How it feels to have your mind hacked by an AI
Alright, first problem, I don’t have access to the weights, but even if I did, the architecture itself lacks important features. It’s amazing as an assistant for short conversations, but if you try to cultivate some sort of relationship, you will notice it doesn’t remember about what you were saying to it half an hour ago, or anything about you really, at some point. This is, of course, because the LLM input has a fixed token width, and the context window shifts with every reply, making the earlier responses fall off. You feel like you’re having a relationship with someone having severe amnesia, unable to form memories. At first, you try to copy-paste summaries of your previous conversations, but this doesn’t work very well.
So you noticed this lack of long term memory/consistency, but you still say that the LLM passed your Turing Test? This sounds like the version of the Turing Test you applied here was not intended to be very rigorous.
Suppose you were talking to a ChatGPT-based character fine-tuned to pretend to be a human in one chat window, and at the same time talking to an actual human in another chat window.
Do you think you could reliably tell which is which based on their replies in the conversation?
Assume for the sake of this thought experiment that both you and the other human are motivated to have you get it right. And assume further that, in each back and forth round of the conversation, you don’t see either of their responses until both interlocutors have sent a response (so they show up on your screen at the same time and you can’t tell which is the computer by how fast it typed).

Bruce G 25 Dec 2022 21:30 UTC
2 points
1
in reply to: MrThink’s comment on: Are there any reliable CAPTCHAs? Competition for CAPTCHA ideas that AIs can’t solve.
To aid the user, on the side there could be a clear picture of each coin and their worth, that we we could even have made up coins, that could further trick the AI.
A user aid showing clear pictures of all available legal tender coins is a very good idea. It avoids problems with more obscure coins which may have been only issued in a single year, so the user is not sitting there thinking “wait a second, did they actually issue a Ulysses S. Grant coin at some point or is that just there to fool the bots?”.
I’m not entirely sure how to generate images of money efficiently, Dall-E couldn’t really do it well in the test I ran. Stable diffusion probably would do better though.

If we create a few thousand real world images of money though, they might be possible to combine and obfuscate and delete parts of them in order to make several million different images. Like one bill could be taken from one image, and then a bill from another image could be placed on top of it etc.
I agree that efficient generation of these types of images is the main difficulty and probable bottleneck to deploying something like this if websites try to do so. Taking a large number of such pictures in real life would be time consuming. If you could speed up the process by automated image generation or automated creation of synthetic images by copying and pasting bills or notes between real images, that would be very useful. But doing that while preserving photo-realism and clarity to human users of how much money is in the image would be tricky.

Bruce G 25 Dec 2022 20:31 UTC
2 points
1
in reply to: JBlack’s comment on: Are there any reliable CAPTCHAs? Competition for CAPTCHA ideas that AIs can’t solve.
I can see the numbers on the notes and infer that they denote United States Dollars, but have zero idea of what the coins are worth. I would expect that anyone outside United States would have to look up every coin type and so take very much more than 3-4 times longer clicking images with boats. Especially if the coins have multiple variations.
If a system like this were widely deployed online using US currency, people outside the US would need to familiarize themselves with US currency if they are not already familiar with it. But they would only need to do this once and then it should be easy to remember for subsequent instances. There are only 6 denominations of US coins in circulation - $0.01, $0.05, $0.10, $0.25, $0.50, and $1.00 - and although there are variations for some of them, they mostly follow a very similar pattern. They also frequently have words on them like “ONE CENT” ($0.01) or “QUARTER DOLLAR” ($0.25) indicating the value, so it should be possible for non-US people to become familiar with those.
Alternatively, an easier option could be using country-specific captchas which show a picture like this except with the currency of whatever country the internet user is in. This would only require extra work for VPN users who seek to conceal their location by having the VPN make it look like they are in some other country.
If the image additionally included coin-like tokens, it would be a nontrivial research project (on the order of an hour) to verify that each such object is in fact not any form of legal tender, past or present, in the United States.
The idea was that the tokens would only be similar in broad shape and color, but would be different enough from actual legal tender coins that I would expect a human to easily tell the two apart.
Some examples would be:
https://barcade.com/wp-content/uploads/2021/07/BarcadeToken_OPT.png
https://www.pinterest.com/pin/64105994675283502/
Even if all the above were solved, you still need such images to be easily generated in a manner that any human can solve it fairly quickly but a machine vision system custom trained to solve this type of problem, based on at least thousands of different examples, can’t. This is much harder than it sounds.
I agree that the difficulty of generating a lot of these is the main disadvantage, as you would probably have to just take a huge number of real pictures like this which would be very time consuming. It is not clear to me that Dall-E or other AI image generators could produce such pictures with enough realism and detail that it would be possible for human users to determine how much money is supposed to be in the fake image (and have many humans all converge to the same answer). You also might get weird things using Dall-E for this, like 2 corners of the same bill having different numbers indicating the bill’s denomination.
But I maintain that, once a large set of such images exists, training a custom machine vision system to solve these would be very difficult. It would require much more work than simply fine tuning an off-the-shelf vision system to answer the binary question of “Does this image contain a bus?”.
Suppose that, say, a few hundred people worked for several months to create 1,000,000 of these in total and then started deploying them. If you are a malicious AI developer trying to crack this, the mere tasks of compiling a properly labeled data set (or multiple data sets) and deciding how many sub-models to train and how they should cooperate (if you use more than one) are already non-trivial problems that you have to solve just to get started. So I think it would take more than a few days.

Bruce G 22 Jan 2024 2:12 UTC
1 point
0
AF
in reply to: Beth Barnes’s comment on: Bounty: Diverse hard tasks for LLM agents
I have a mock submission ready, but I am not sure how to go about checking if it is formatted correctly.

Regarding coding experience, I know Python, but I do not have experience working with TypeScript or Docker, so I am not clear on what I am supposed to do with those parts of the instructions.

If possible, it would be helpful to be able to go through it on a Zoom meeting so I could do a screen-share.

Bruce G 20 Dec 2023 6:02 UTC
1 point
0
AF
in reply to: Beth Barnes’s comment on: Bounty: Diverse hard tasks for LLM agents
Thanks for your reply. I found the agent folder you are referring to with ‘main.ts’, ‘package.json’, and ‘tsconfig.json’, but I am not clear on how I am supposed to use it. I just get an error message when I open the ‘main.ts’ file:
Regarding the task.py file, would it be better to have the instructions for the task in comments in the python file, or in a separate text file, or both? Will the LLM have the ability to run code in the python file, read the output of the code it runs, and create new cells to run further blocks of code?
And if an automated scoring function is included in the same python file as the task itself, is there anything to prevent the LLM from reading the code for the scoring function and using that to generate an answer?
I am also wondering if it would be helpful if I created a simple “mock task submission” folder and then posted or emailed it to METR to verify that everything is implemented/formatted correctly, just to walk through the task submission process and clear up any further confusions. (This would be some task that could be created quickly even if a professional might be able to complete the task in less than 2 hours, so not intended to be part of the actual evaluation.)

Bruce G 19 Dec 2023 5:29 UTC
1 point
0
AF
on: Bounty: Diverse hard tasks for LLM agents
If anyone is planning to send in a task and needs someone for the human-comparison QA part, I would be open to considering it in exchange for splitting the bounty.
I would also consider sending in some tasks/ideas, but I have questions about the implementation part.
From the README document included in the zip file:
## Infra Overview
In this setup, tasks are defined in Python and agents are defined in Typescript. The task format supports having multiple variants of a particular task, but you can ignore variants if you like (and just use single variant named for example “main”)
and later, in the same document
You’ll probably want an OpenAI API key to power your agent. Just add your OPENAI_API_KEY to the existing file named `.env`; parameters from that file are added to the environment of the agent.
So how much scaffolding/implementation will METR provide for this versus how much must be provided by the external person sending it in?
Suppose I download some data sets from Kaggle and save them as CSV files, and then set up a task where the LLM must accurately answer certain questions about that data. If I provide a folder with just the CSV files, a README file with the instructions and questions (and scoring criteria), and a blank Python file (in which the LLM is supposed to write the code to pull in the data and get the answer), would that be enough to count as a task submission? If not, what else would be needed?
Is the person who submits the test also writing the script for the LLM-based agent to take the test, or will someone at METR do that based on the task description?
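To make the idea concrete, a task of this kind might boil down to something like the sketch below. This is not METR’s task format (which I am still unclear on); the data, the question ("which city has the largest population?"), and the function names `reference_answer` and `score` are all made up for illustration.

```python
import csv
import io

# Made-up stand-in for the contents of a downloaded CSV file.
CSV_TEXT = """city,population
Springfield,120000
Shelbyville,80000
Ogdenville,45000
"""

def reference_answer(csv_text: str) -> str:
    """Compute the expected answer: the city with the largest population."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return max(rows, key=lambda r: int(r["population"]))["city"]

def score(submitted: str, csv_text: str = CSV_TEXT) -> float:
    """Return 1.0 for a correct answer (ignoring case and surrounding whitespace), else 0.0."""
    expected = reference_answer(csv_text).lower()
    return 1.0 if submitted.strip().lower() == expected else 0.0

print(score(" springfield "))
```

In a real submission the scoring code would presumably need to live somewhere the agent cannot read it.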
Also, regarding this:
Model performance properly reflects the underlying capability level
Not memorized by current or future models: Ideally, the task solution has not been posted publicly in the past, is unlikely to be posted in the future, and is not especially close to anything in the training corpus.
I don’t see how the solution to any such task could be reliably kept out of the training data for future models in the long run if METR is planning on publishing a paper describing the LLM’s performance on it. Even if the task is something that only the person who submitted it has ever thought about before, I would expect that once it is public knowledge someone would write up a solution and post it online.

Bruce G 8 May 2023 4:12 UTC
1 point
0
in reply to: HoldenKarnofsky’s comment on: How we could stumble into AI catastrophe
Is the disagreement here about whether AIs are likely to develop things like situational awareness, foresightful planning ability, and understanding of adversaries’ decisions as they are used for more and more challenging tasks?
My thought on this is, if a baseline AI system does not have situational awareness before the AI researchers start fine-tuning it, I would not expect it to obtain situational awareness through reinforcement learning with human feedback.
I am not sure I can answer this for the hypothetical “Alex” system in the linked post, since I don’t think I have a good mental model of how such a system would work or what kind of training data or training protocol you would need to have to create such a thing.
If I saw something that, from the outside, appeared to exhibit the full range of abilities Alex is described as having (including advancing R&D in multiple disparate domains in ways that are not simple extrapolations of its training data) I would assign a significantly higher probability to that system having situational awareness than I do to current systems. If someone had a system that was empirically that powerful, which had been trained largely by reinforcement learning, I would say the responsible thing to do would be:
1. Keep it air-gapped rather than unleashing large numbers of copies of it onto the internet
2. Carefully vet any machine blueprints, drugs or other medical interventions, or other plans or technologies the system comes up with (perhaps first building a prototype to gather data on it in an isolated controlled setting where it can be quickly destroyed) to ensure safety before deploying them out into the world.
The 2nd of those would have the downside that beneficial ideas and inventions produced by the system take longer to get rolled out and have a positive effect. But it would be worth it in that context to reduce the risk of some large unforeseen downside.

Bruce G 5 May 2023 5:44 UTC
1 point
0
in reply to: HoldenKarnofsky’s comment on: How we could stumble into AI catastrophe
Those 2 types of downsides, creating code with a bug versus plotting a takeover, seem importantly different.

I can easily see how an LLM-based app fine-tuned with RLHF might generate the first type of problem. For example, let’s say some GPT-based app is trained using this method to generate the code for websites in response to prompts describing how the website should look and what features it should have. And let’s suppose during training it generates many examples that have some unnoticed error: maybe it does not render properly on certain size screens, but the evaluators all have normal-sized screens where that problem does not show up.

If the evaluators rated many websites with this bug favorably, then I would not be surprised if the trained model continued to generate code with the same bug after it was deployed.

But I would not expect the model to internally distinguish between “the humans rated those examples favorably because they did not notice the rendering problem” versus “the humans liked the entire code including the weird rendering on larger screens”. I would not expect it to internally represent concepts like “if some users with large screens notice and complain about the rendering problem after deployment, OpenAI might train a new model and rate those websites negatively instead” or to care about whether this would eventually happen or to take any precautions against the rendering issue being discovered.

By contrast, the coup-plotting problem is more similar to the classic AI takeover scenario. And that does seem to require the type of foresight and situational awareness to distinguish between “the leadership lets me continue working in the government because they don’t know I am planning a coup” versus “the leadership likes the fact that I am planning to overthrow them”, and to take precautions against your plans being discovered while you can still be shut down.

I don’t think an AI system gets the latter type of ability just as an accidental side effect of reinforcement learning with human feedback (at least not for the AI systems we have now). The development team would need to do a lot of extra work to give an AI that foresightful planning ability, and the ability to understand the decision system of a potential adversary well enough to predict which information it needs to keep secret for its plans to succeed. And if a development team is giving its AI those abilities (and exercising any reasonable degree of caution) then I would expect them to build in safeguards: have hard constraints on what it is able to do, ensure its plans are inspectable, etc.

Bruce G 26 Mar 2023 15:40 UTC
1 point
0
in reply to: Lone Pine’s comment on: How we could stumble into AI catastrophe
Did everyone actually fail to notice, for months, that social media algorithms would sometimes recommend extremist content/disinformation/conspiracy theories/etc (assuming that this is the downside you are referring to)?

It seems to me that some people must have realized this as soon as they started seeing Alex Jones videos showing up in their YouTube recommendations.

Bruce G 26 Mar 2023 3:42 UTC
1 point
0
in reply to: HoldenKarnofsky’s comment on: How we could stumble into AI catastrophe
I think the more capable AI systems are, the more we’ll see patterns like “Every time you ask an AI to do something, it does it well; the less you put yourself in the loop and the fewer constraints you impose, the better and/or faster it goes; and you ~never see downsides.” (You never SEE them, which doesn’t mean they don’t happen.)
This, again, seems unlikely to me.
For most things that people seem likely to use AI for in the foreseeable future, I expect downsides and failure modes will be easy to notice. If self-driving cars are crashing or going to the wrong destination, or if AI-generated code is causing the company’s website to crash or apps to malfunction, people would notice those.
Even if someone has an AI that he or she just hooks up to the internet and gives the task “make money for me”, it should be easy to build in some automatic record-keeping module that keeps track of what actions the AI took and where the money came from. And even if the user does not care if the money is stolen, I would expect the person or bank that was robbed to notice and ask law enforcement to investigate where the money went.
Can you give an example of some type of task for which you would expect people to frequently use AI, and where there would reliably be downside to the AI performing the task that everyone would simply fail to notice for months or years?

Bruce G 21 Mar 2023 5:05 UTC
1 point
0
in reply to: MrThink’s comment on: A Proposed Test to Determine the Extent to Which Large Language Models Understand the Real World
Interesting.

I don’t think I can tell from this how (or whether) GPT-4 is representing anything like a visual graphic of the task.

It is also not clear to me if GPT-4’s performance and tendency to collide with the book is affected by the banana and book overlapping slightly in their starting positions. (I suspect that changing the starting positions to where this is no longer true would not have a noticeable effect on GPT-4’s performance, but I am not very confident in that suspicion.)
