causalitylimited
I am here because someone said this is where I belong.
I’ve wanted to write and never had the time. Recently, I made time. Of all the writing projects I’ve started over the years, I decided to pick up the philosophical essays, because most of the ideas were fresh and, more importantly, I knew I could actually deliver a few before this window of time closes.
Before I started formally posting on the internet, I was sending thoughts to friends and family and getting no responses, no pushback, no agreement. I am aware that this kind of thinking tends to produce long texts, because one is trying to be exact when presenting an argument. In any case, I told a friend about it, and he said, “It’s because you belong on LessWrong and you are living on X.” (I wasn’t living on X, just lurking, and I had read a couple of essays here without noticing the source. But that’s beside the point.)
I am hoping he was right.
The immune system analogy is apt, but I’m also thinking of the mechanics: the defender can’t access agent files, so it has to work from behavioral signals alone. That extends the analogy nicely, because the immune system does not do DNA inspection either; it just looks for surface markers.
So behavioral anomaly detection can only be done by 1) infrastructure providers, because they have the telemetry (maybe replicators make calls to multiple LLMs every n seconds, etc.), and 2) LLM API providers, because they can see call patterns and detect markers of no human in the loop.
Still, the replication signature will largely be things like provisioning new servers, using throwaway emails, outbound file copies, and automated account creation, which will also show up in the deployment activity of other programs, for example bot farms for Amazon reviews: not a good or even legal use, but different from self-replication. So this is not an easy problem.
An even harder problem will come when replicators learn or optimize to blend in with legitimate activity. It looks like an arms race, especially if they have some human support, given varied human motivations.
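To make the telemetry idea above a bit more concrete, here is a minimal sketch of the "calls to multiple LLMs per n seconds" signal. Everything in it is an assumption for illustration: the function name `flag_multi_provider_bursts`, the event shape `(timestamp, client_id, provider)`, and the thresholds are hypothetical, not anything a real provider is known to use.

```python
from collections import defaultdict, deque


def flag_multi_provider_bursts(events, window_seconds=10.0, min_providers=3):
    """Return the set of client ids that called at least `min_providers`
    distinct LLM providers within any `window_seconds` sliding window.

    `events` is an iterable of (timestamp, client_id, provider) tuples,
    assumed to be sorted by timestamp. All names here are hypothetical.
    """
    recent = defaultdict(deque)  # client_id -> recent (timestamp, provider) calls
    flagged = set()
    for ts, client, provider in events:
        calls = recent[client]
        calls.append((ts, provider))
        # Drop calls that have fallen out of the sliding window.
        while calls and ts - calls[0][0] > window_seconds:
            calls.popleft()
        # Count distinct providers still inside the window.
        if len({p for _, p in calls}) >= min_providers:
            flagged.add(client)
    return flagged


# Hypothetical telemetry: client "a" fans out quickly, client "b" does not.
sample_events = sorted([
    (0.0, "a", "provider_x"), (1.0, "a", "provider_y"), (2.0, "a", "provider_z"),
    (0.0, "b", "provider_x"), (50.0, "b", "provider_y"), (100.0, "b", "provider_z"),
])
suspects = flag_multi_provider_bursts(sample_events)
```

A fixed threshold like this also shows why the arms race is hard: a replicator only has to slow down or spread its calls to stay under it, and a legitimate multi-model pipeline can trip it.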
>> This is very good, along with the theory that a sense of “necessary” accountability in the population can by itself be seen as a gradient metric for … morality.
There is an idiom in Hindi, “chori toh chori!, upar se seenazori!”, which translates roughly to “Theft is bad enough (it’s theft!), but shamelessly defending it on top of that is indefensible!”
As a side note, what happens when such populations at different stages of the gradient collide?
A depraved thief in a just population does not threaten it too much: he is punished, and the population moves on.
But what if a thief who feels guilty for his actions, because he was brought up in a society that teaches that thievery is wrong, is judged by peers who think it stupid not to steal when you have the opportunity? What new soup of effects and side effects is created? Specifically, what happens to the first society, watching? Do they now teach their kids new morals, ones that will help them survive among these new peers? Then the whole society moves lower on the gradient. Or do they move away and start their society again in a new place? What other options exist?
Even case 1 is not so simple. If the thief is charismatic, he might capture one or more institutions, affecting the society far more than one person generally can.
And case 2 is something that many societies today need to answer. What are the options for a law-abiding society? After all, lawlessness is what the gradient from guilt->shame->depravity will eventually converge to. As you rightly said, the Socratic method will not apply.
Do we have answers in existing philosophies? Or are we right now thinking under the gun?