Alice asks Bob for advice about a tricky problem
Bob gives good advice
Bob gives bad advice
Bob is a skilled manipulator and deliberately says things that will make Alice do…
what is in his interest.
what he thinks is in her interest.
what his values say she should do.
what he thinks her values say she should do.
Bob wants and advises Alice to do what he thinks she should do (based on his own values).
Bob is highly convincing and Alice does what he suggests.
They have the same values
They have different values
Alice is not convinced by Bob; responding to his advice helps her clarify what she thinks she should do.
Bob’s advice changes Alice’s values
Bob tries to figure out Alice’s values and then advises her based on that.
He gets it wrong.
He gets it right…
because he knows her well and asks lots of relevant questions.
by pure luck.
Bob believes that only Alice knows her own values, so he…
tells her he cannot help her.
tells her he cannot give advice, but he can tell her some facts he knows that may help her make the decision for herself.
Equipped with this new information, Alice is able to make a decision that better reflects her own values.
Bob carefully selects facts that push her towards a specific choice, while censoring ones that won’t.
Bob tells her everything he knows, but for contingent reasons of selection (such as what kinds of facts Bob is interested in) this only includes facts that push her towards a specific choice, and excludes those that won’t.
The new knowledge contradicts some of Alice’s pre-existing beliefs about the problem…
and she can now make a better informed decision.
and she is now even more confused about what to do than before.
Bob is an omniscient god and tells Alice every fact about the universe.
Equipped with this new information, Alice is able to make a decision that better reflects her own values.
Equipped with this new information, Alice realises she holds contradictory values that point to different courses of action.
Now that she has ascended to omniscience, Alice no longer cares about the problem.
Bob tells Alice to ask Charlie
Bob tells Alice to ask ChatGPT
Bob asks ChatGPT and then passes the response off as his own
Bob is a rubber duck and says nothing
So, simulacrum levels crossed with the dyad?
This quote from Anthropic’s report on the large scale Claude code cyberattack seems utterly comical to me:
“This raises an important question: if AI models can be misused for cyberattacks at this scale, why continue to develop and release them? The answer is that the very abilities that allow Claude to be used in these attacks also make it crucial for cyber defense.”
Instead of trying to present any kind of utopian vision of the benefits of AI, someone at Anthropic decided to sell us the image of an internet dominated by endless cyberwar, trapped in a perverse feedback loop of escalating speed and incomprehensibility.
Good. If this is what the authors believe the future holds, it’s much better that they say it than search for a rosy-sounding justification.
You are probably right. For someone arguing the benefits of AI, I certainly can’t accuse this writer of being misleadingly optimistic.
But personally I’ve recently found it quite disconcerting how bleak an image of the future people who work in AI (on both sides of the capabilities/safety divide) seem to be willing to work towards building.
Overcoming this kind of reflexive defeatism seems to me much harder than simply trying to convince people that we are going in a bad direction as a matter of fact.