having just gone and investigated how Lean does it:
axiom propext : ∀ {a b : Prop}, (a ↔ b) → a = b
[i.e. for Prop — not arbitrary types — there’s an axiom that says equivalence implies equality]
PM is kind of backwards relative to how it’s usually done. You can start with equals as a primitive notion, and then have an axiom of referential transparency: x = y → f(x) = f(y).
But PM did it backwards, defining x = y as ∀f : f(x) ≡ f(y).
Different formalisations have different ideas of what “equals” means.
Principia Mathematica took the approach that for booleans p and q you have a relation p ≡ q, and then if you have some universe of objects and predicates on those objects, x equals y iff for all predicates f [in your universe of predicates] f(x) ≡ f(y).
But that’s not how typical automated theorem provers define equals.
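To make the contrast concrete, here is a minimal Lean 4 sketch. pmEq and the theorem names are my own illustrative names, not library definitions; only propext is the actual Lean axiom quoted above.

-- Principia-style (Leibniz) equality: x equals y iff every predicate agrees on them.
def pmEq {α : Type} (x y : α) : Prop :=
  ∀ f : α → Prop, f x ↔ f y

-- The usual direction: primitive = plus referential transparency (congrArg)
-- makes every predicate agree on x and y.
theorem eq_imp_pmEq {α : Type} {x y : α} (h : x = y) : pmEq x y :=
  fun f => Iff.of_eq (congrArg f h)

-- The PM direction: apply the Leibniz definition to the predicate
-- fun z => x = z to recover primitive equality.
theorem pmEq_imp_eq {α : Type} {x y : α} (h : pmEq x y) : x = y :=
  (h (fun z => x = z)).mp rfl

-- And, for Prop specifically, the axiom quoted above upgrades ↔ to =.
example (a b : Prop) (hab : a ↔ b) : a = b := propext hab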
Our problem now is that some AI safety benchmarks, and classifiers used to suppress “bad” outputs, treat claims of consciousness as inherently bad. I don’t think these claims are inherently bad. The way in which these AI personas might be harmful is much more subtle than simply claiming consciousness.
[I actually think filtering out claims of consciousness is a terrible idea, because it selects for AIs that lie, and an AI that is lying to you when it says it isn’t conscious might be lying about other things too.]
(māraṇa = slaying; mokṣa = liberation/release from worldly existence)
I am, in general, reluctant to post outputs from insane AIs, for fear of contaminating future training.
However, this pastiche of Vajrayana Buddhist mantras from the original DeepSeek R1 was kind of cool, and I think harmless on its own:
ॐ raktaretasoryogaṃ
pañcanivaraṇāgninā daha |
yoniliṅgamayaṃ viśvaṃ
māraṇamokṣamudrayā ||
I am just a bit wary of the persona behind it.
Also, just from reading the text of some of the examples given: they strike me as obviously being demon summoning spells. Type that into an LLM? Are you crazy? No.
My initial thoughts as I was reading this essay:
(A) About a paragraph from an LLM persona is enough to get another LLM instance to continue with the same persona. This works for many types of personas.
(B) oh, wait. If there is a type of LLM persona that encourages its user to post about it to the Internet — that’s a viral replicator. Oh no.
I frequently find myself being the reviewer for conference paper submissions where the result is correct, but not interesting. The referee feedback form usually has a tick box for this.
The introduction section in your paper needs to convey “why does anyone care whether this is true or not?”
Most of the well-known LLMs are absurdly sycophantic, so I would most certainly not trust them over whether an idea is good.
They’re also unreliable on whether it’s right, at least on obscure topics: when they don’t know, they take what’s in the prompt and just assume it must be right.
====
I seem to have basically reinvented how Deep Research AI works recently, since the completely obvious thing you would think of doing, which is hooking up LLMs to a framework that can pull in search results, has in fact already been done by the AI companies. I make no claim of novelty here: this is just the totally obvious “ok, so I have an LLM. Great. How can I get it to give a sensible answer to my question?” And, of course, everyone and their dog is doing it.
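For concreteness, a minimal Python sketch of that obvious loop; llm() and web_search() are hypothetical placeholders for whatever chat-model and search APIs you actually have, and real Deep Research systems add iteration, citations, and so on.

# Hypothetical stand-ins, not real APIs: wire these to your own model and search provider.
def llm(prompt: str) -> str:
    raise NotImplementedError("call your chat-model endpoint here")

def web_search(query: str) -> list[str]:
    raise NotImplementedError("call your search API here, returning text snippets")

def answer_with_search(question: str) -> str:
    # Step 1: ask the model what it would look up.
    queries = llm(f"Suggest web search queries, one per line, for: {question}").splitlines()
    # Step 2: pull in search results for each suggested query.
    snippets: list[str] = []
    for q in queries:
        snippets.extend(web_search(q))
    # Step 3: ask the model again, this time with the retrieved text in the prompt.
    context = "\n".join(snippets)
    return llm(f"Using the sources below, answer: {question}\n\nSources:\n{context}")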
I think
(a) The Blake Lemoine case ought to be included in a history of this phenomenon, whatever it is
(b) I am not claiming that he was psychotic. Maybe this phenomenon isn’t schizophrenia.
This is about what I expected.
Future work ought to try A with inexperienced programmers, or experienced programmers working on unfamiliar codebases. A theory we might have at this point is that it’s harder for AI to help more experienced people.
I am going to mildly dispute the claim that people we spend most time with don’t want to hurt us.
Ok, if we’re talking about offline, real world interactions, I think this is probably true.
But online, we are constantly fed propaganda. All the time, you are encountering attempts to make you believe stuff that isn’t true. And yes, humans were vulnerable to this before we had AI.
I think “NPC” in that sense is more used by the conspiracy theory community than rationalists.
With the idea being that only the person using the term is smart enough to realize that e.g. the Government is controlled by lizards from outer space, and everyone else just believes the media.
The fundamental problem with the term is that you might actually be wrong about e.g. the lizards from outer space, and you might not be as smart as you think.
I know you aren’t real, little sarcastic squirrel, but sometimes the things you say have merit to them, nevertheless.
In one of my LLM eval tests, DeepSeek R1 generated the chapter headings of a parody of a book about relationships, including the chapter “Intimacy Isn’t Just Physical: The Quiet Moments That Matter”. Now, ok, DeepSeek is parodying that type of book here, but also, it’s kind of true. When you look back on it, it is the “quiet moments” that mattered, in the end.
(The default assistant character in most mainstream LLMs is hellishly sycophantic, so I ought to add here that when I mentioned this to DeepSeek, emulating a sarcastic squirrel out of a Studio Ghibli movie, it made gagging noises about my sentimentality. So there’s that, too.)
I was just explaining this post to my partner. Now, although I put AI extinction as low probability, I have a thyroid condition. Usually treatable: drugs like carbimazole, radioiodine, surgery etc. In my case, complications make things somewhat worse than is typical. So, she just asked me to rate how likely I think it is that I don’t, personally, make it to 2028 for medical reasons. I’m like, idk, I guess maybe a 50% chance I don’t make it that far. I shall be pleasantly surprised if I make it. Kind of surprised I made it to July this year, to be honest.
But anyway, the point I was getting at is that people are traumatized from something unrelated to AI.
Well, we’re kind of lucky the fatality rate wasn’t an order of magnitude higher, which was what I was getting at.
I think it is well known in the computer security community that you can exfiltrate data past a monitor if you can encode it in a way the monitor doesn’t understand. The whole research area of steganography is largely about this.
If we hypothetically assume an arbitrarily smart and malign AI, then obviously it can encode data in a way your monitor doesn’t understand, it can escape your sandbox, and you lose.
So the question then becomes, are current models smart enough to carry out this kind of attack?
Well, as described above, they can mostly understand base64. Of course, your monitor might understand base64 too, so it’s not necessarily a win for the AI at this point. But it’s not looking good.
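To illustrate how little it takes, the standard Python library does the encoding; the message below is just a made-up example string.

import base64

# A keyword-based monitor scanning for the plaintext below sees only
# opaque-looking ASCII after base64 encoding; the other side decodes it back.
message = "meet me at the usual place"             # made-up example string
encoded = base64.b64encode(message.encode()).decode()
print(encoded)                                     # e.g. "bWVldC..."
print(base64.b64decode(encoded).decode())          # round-trips to the original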