I’m not sure I’m completely solid on how FHE works, so perhaps this won’t work, but here’s an idea of how B can exploit this approach:
Let’s imagine that Check_trustworthy(A_source) = 1. After step 3 of the parent comment B would know E1 = Encrypt(1, A_key). If Check_trustworthy(A_source) returned 0, B would instead know E0 = Encrypt(0, A_key) and the following steps work similarly. B knows which case it is by looking at msg_3.
B has another program, Check_blackmail(X, source), which simulates the behaviour of an agent with the given source code in situation X and returns 1 if that agent would be blackmailable and 0 if not.
B knows Encrypt(A_source, A_key), so using the FHE properties of the encryption scheme they can compute F(X) = Encrypt(Check_blackmail(X, A_source), A_key) for any X.
Now define W(X) = if(F(X) = E1, 1, 0). Assuming encryption (and the homomorphic evaluation) is deterministic, so that equal plaintexts always yield equal ciphertexts, W(X) = Check_blackmail(X, A_source), and B can now compute that for any X.
Profit?
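To make the idea concrete, here is a toy Python sketch of the exploit. It is not real cryptography: encrypt is a deterministic stand-in (so equal plaintexts give literally equal ciphertexts, which is exactly what the trick needs), homomorphic_eval fakes the FHE evaluation property by carrying the hidden plaintext around, and check_blackmail is a made-up one-line heuristic. All names (A_key, A_source, the situations) are hypothetical.

```python
# Toy, purely illustrative sketch of B's exploit -- not real cryptography.
# Key assumption: Encrypt is deterministic, so two encryptions of the same
# plaintext under A_key are literally equal. Real FHE schemes randomise
# ciphertexts, which is exactly where this could break down.

import hashlib

def encrypt(plaintext, key):
    """Deterministic stand-in for Encrypt(plaintext, key): same input -> same ciphertext."""
    return hashlib.sha256((repr(plaintext) + key).encode()).hexdigest()

class Ciphertext:
    """Wraps a ciphertext; the hidden plaintext is only stored to fake homomorphic evaluation."""
    def __init__(self, plaintext, key):
        self._plaintext = plaintext           # not readable by B in a real scheme
        self._key = key                       # likewise hidden from B
        self.value = encrypt(plaintext, key)  # what B actually sees

def homomorphic_eval(func, ct):
    """Stand-in for FHE evaluation: yields Encrypt(func(plaintext), key) without decrypting."""
    return Ciphertext(func(ct._plaintext), ct._key)

def check_blackmail(situation, source):
    """B's program: simulate an agent with this source in the situation; 1 if blackmailable."""
    return 1 if "caves to threats" in source and "threat" in situation else 0

# --- What B holds after step 3 of the parent protocol (A_key itself stays secret from B) ---
A_key = "A's secret key"                        # hypothetical; B never sees this directly
A_source = "agent source ... caves to threats"  # hypothetical source code of A
enc_source = Ciphertext(A_source, A_key)        # Encrypt(A_source, A_key), as sent to B
E1 = encrypt(1, A_key)                          # learned from msg_3, since Check_trustworthy returned 1

# --- The exploit: W(X) = 1 iff F(X) equals the known ciphertext E1 ---
def W(situation):
    F_X = homomorphic_eval(lambda src: check_blackmail(situation, src), enc_source)
    return 1 if F_X.value == E1 else 0

print(W("a threat is made"))  # 1 -> A is blackmailable in this situation
print(W("a friendly offer"))  # 0
```

The crux is the final comparison: if a real scheme randomises ciphertexts, F(X) would never literally equal E1, and the exploit fails in this form.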
This example is a lie that could be classified as “aggression light” (because it maximises my utility at the expense of the victim’s utility), whereas the examples in the post try to maximise the other person’s utility. What I find interesting is that the second example from the post (protecting Joe) almost fits your formula, yet it seems intuitively much more benign.
One of the reasons I feel better about lying to protect Joe is that there I maximise his utility (not mine) at the expense of yours (it’s not clear whether you lose anything, but what matters is that I’m mostly doing it for Joe). It’s much easier to morally justify aggression in the name of someone else, where I am just “protecting the weak”.