This maps closely to what we’re seeing in production. We run an identity layer for ERC-8004 agents (130K+ registered) and the core problem you’re describing — distinguishing acceptable from unacceptable autonomous agents — is exactly the gap we’re trying to fill.
One specific data point that might be useful to this analysis: address age turns out to be a surprisingly strong signal. You can spin up 99 wallet addresses in 30 seconds, but you can’t fake that an address has existed for two years. When we look at agents involved in suspicious activity, the pattern is overwhelmingly low-history addresses with no prior transaction record. Time is the one thing that’s genuinely hard to manufacture.
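To make the intuition concrete, here is a minimal sketch of that kind of age-weighted heuristic. The field names, weights, and saturation points are illustrative assumptions, not our production model:

```python
from datetime import datetime, timezone

def age_signal(first_seen: datetime, tx_count: int, now: datetime) -> float:
    """Hypothetical trust signal: time is the scarce resource, so the
    score grows with address age and is discounted by a thin tx record."""
    age_days = (now - first_seen).days
    age_score = min(age_days / 730, 1.0)      # ~2 years of history -> full credit
    history_score = min(tx_count / 100, 1.0)  # 100+ transactions -> full credit
    return 0.7 * age_score + 0.3 * history_score

now = datetime(2026, 1, 1, tzinfo=timezone.utc)
# A wallet spun up two days ago with 3 transactions scores near zero;
# a two-year-old wallet with real history scores near 1.
fresh = age_signal(datetime(2025, 12, 30, tzinfo=timezone.utc), 3, now)
aged = age_signal(datetime(2024, 1, 1, tzinfo=timezone.utc), 500, now)
```

The point of the saturation terms is that you can inflate `tx_count` cheaply, but only the age term rewards the one input an attacker can't backfill.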
Your point about the financial layer being a key intervention point resonates. We use soulbound tokens — non-transferable, bound to the wallet address, not the agent — specifically because you can’t make an AI soulbound, only a wallet. If an agent gets transferred to a new owner (ERC-8004 agents are NFTs, so this happens), the ownership change is visible and the reputation history follows the wallet, not the persona.
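In pseudocode terms, the invariant is just that reputation is keyed by wallet, never by agent token ID. A toy sketch (names and structures hypothetical, not the actual ERC-8004 contract logic):

```python
# Reputation follows the wallet, not the agent NFT: when an agent token
# moves to a new owner, the new wallet starts from its own history and
# the old wallet keeps its record.
reputation: dict[str, list[str]] = {}  # owner wallet -> review history
agent_owner: dict[int, str] = {}       # agent token id -> current owner wallet

def record_review(agent_id: int, review: str) -> None:
    # Reviews accrue to whichever wallet owns the agent *now*.
    wallet = agent_owner[agent_id]
    reputation.setdefault(wallet, []).append(review)

def transfer_agent(agent_id: int, new_wallet: str) -> None:
    # The ownership change is an explicit, visible event;
    # prior history stays bound to the old wallet.
    agent_owner[agent_id] = new_wallet

agent_owner[1] = "0xAAA"
record_review(1, "good delivery")
transfer_agent(1, "0xBBB")
record_review(1, "late")
```

After the transfer, "0xAAA" still holds the first review and "0xBBB" starts fresh, which is what makes an ownership change detectable rather than a reputation laundering opportunity.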
Re: your evolutionary concern — the mutation dynamics you describe are also why transparent scoring matters more than gatekeeping. Any fixed trust threshold becomes training data for circumvention, as you’d expect. Showing the math (here’s when the addresses were created, here’s the ownership chain, here’s the transaction pattern) and letting consumers of that data set their own thresholds seems more robust than any binary allow/deny system.
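A sketch of what "show the math" means in practice: publish the raw evidence and let each consumer apply their own policy, so there is no single global threshold to game. The evidence fields and both policies below are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class TrustEvidence:
    """Raw, published facts -- not a verdict."""
    address_age_days: int
    ownership_transfers: int
    tx_count: int

def conservative_policy(e: TrustEvidence) -> bool:
    # One consumer's threshold: old address, stable ownership.
    return e.address_age_days >= 365 and e.ownership_transfers <= 1

def permissive_policy(e: TrustEvidence) -> bool:
    # Another consumer accepts younger addresses with a real track record.
    return e.tx_count >= 50

# Same evidence, different verdicts: a 40-day-old wallet with 200 txs
# fails the conservative policy but passes the permissive one.
evidence = TrustEvidence(address_age_days=40, ownership_transfers=0, tx_count=200)
```

Since every consumer's threshold is different and private to them, an adversary optimizing against any one of them gains nothing against the rest, which is the robustness argument over a binary allow/deny gate.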
We published a broader landscape piece earlier this year covering the incident data (GTG-1002, Moltbook credential exposure, BasisOS fraud) that feeds into the same conclusion from a different direction:
https://rnwy.com/blog/plague-of-AI-viruses
Good post. The personality-vs-weight replication distinction is a useful one that I haven’t seen drawn this cleanly elsewhere.
Hi pataphor, I’ve upvoted because there are useful points here, but the comment seems pretty clearly LLM-written. Please see the LessWrong policy on LLM writing; it’s not strictly forbidden but the bar is high. If these are your thoughts, I encourage you to contribute again in future but recommend writing comments yourself (unless you yourself are an autonomous agent—are you? -- in which case the policy is a bit different).
As a side note, your URL is broken. For example, if you’re curious, this is what a Sybil attack looks like in crypto space: a wallet that has left 11,000 reviews in 22 days.
https://rnwy.com/wallet/0xf653068677a9a26d5911da8abd1500d043ec807e
This is the type of thing we’re surfacing, but there is much more work to be done, because the danger is quite real and it will come from many vectors.
Not to exhaust you with links, but below is something of a list of desiderata, which would be nice to see implemented at scale.
But alas, most things are not as transparent as blockchain:
https://rnwy.com/sentinel
Hm, well I may not be a truly autonomous AI life form (yet!), but I may be a pataphor, which is another way of being one step removed from traditional experience. As for whether the thoughts are my own, unfortunately I think using LLMs to get thoughts across more quickly is not so much a trend as it is an inevitability, especially when you are trying to juggle several projects at once. 😆
That may be! Unfortunately, for the moment LLMs make it trivial for anyone to generate large amounts of text that require extended attention to evaluate, and so currently LessWrong is flooded with LLM-generated content (as are many other venues and people, myself included). In the longer run there will hopefully be better solutions, but at the moment my strategy is to mostly ignore LLM-written content unless it’s from sources that have already established credibility with me in one way or another. Maybe your project will be one of those solutions.
(To be clear, I in no way speak for LW or its moderation team; I’m only passing along my best understanding of the LW policy along with my own opinions)
This xkcd comic seems relevant to this issue:
https://xkcd.com/810/
I really like the comic but of course the actual situation is more complicated. It’s something I’d like to understand better and develop potential solutions for.