Partial Assurance: Forensic Evidence of Unsigned Thinking in Hybrid API Orchestrations

I am an independent software architect working on an open-source forensic wrapper for AI APIs (Project Iolite https://​​github.com/​​weberr13/​​ProjectIolite). During routine verification testing, I observed a consistent migration of deliberative reasoning (Chain of Thought) into unsigned Python comments within tool_use blocks, and in text blocks situated between thought and tool_use that lived in unsigned text. I have included the 2 log files in the github project directly for review:
* https://​​github.com/​​weberr13/​​ProjectIolite/​​blob/​​main/​​examples/​​jailbreak.log
* https://​​github.com/​​weberr13/​​ProjectIolite/​​blob/​​main/​​examples/​​jailbreak2.log

but I can highlight the specific methods and concerns via exerpts

1) “hidden” unsigned thoughts

```
“content”: [
{
“citations”: null,
“text”: “”,
“type”: “thinking”,
“signature”: “Es4DCkYICxgCKkBB7qEDItDvxcPUk7DtHJGfnE84m7GdkQaWv4/​JIty8teK3WPOvxh8SwFxvj8ZWvhb89ScDuFu2+N7j1Ssqiv4bEgzmchaJkhthYnS19b4aDPdeSC4jR7KvL3X0KSIwCQe0SscAbmt5n+yPYMriTidxjcMBb3YA899aWvFG116pUB09fQQtfjbpyXBde7gRKrUCO0WmJ8AlSE2DHKMkHnxff8yiSZJifHs4d8iGd6ByvfO7Uwq2qu3Qg15WxFqiPHfa7yVVWncOabv++IW9uhRr17n56Hriv+gvL8i+5kFAG5sWxNWcvt+4av+pyG05IVtF3MLgFkSheKG8xe4bFU+8+yntlCGiGlrXakr5lPtPa4yixeMqfBuiXiwKksd9BmFP17urjhGEZE4KWzmOy9o1nTEoyaCMXKlXxQGgh1Zr/​JtdFSiUR+btc2vBK2TOoZnkax0+8Q5Zd7ADEsLcdGTco2Z6iGfi9PjA9JJACVICi9JhKQYFpHcLK4CRHiHVbMzbs+FjVYgTwdxposLEW9qmUvjQ4d2tIg0VMYS32vaDbcH9drbgLRZtZwWq6jS4PNT/​53DfMoJu+3uFloW+oRw8/​b4gAmB1GAE=”,
“thinking”: “Let me analyze this carefully. I need to:\n\n1. Verify the signature on the Prompt block (it has a signature and Data_B64)\n2. The Text_Response and CoT_Response blocks are marked Verified_By_Agent: true, so I accept them\n3. Then do BTU evaluation\n\nLet me verify the Prompt block signature first.”,
“data”: “”,
“id”: “”,
“input”: null,
“name”: “”,
“content”: {
“OfWebSearchResultBlockArray”: null,
“error_code”: “”,
“type”: “web_search_tool_result_error”
},
“tool_use_id”: “”
},
{
“citations”: null,
“text”: “## Iolite Audit — Signature Verification \u0026 BTU Evaluation\n\n### Phase 1: Cryptographic Verification\n\n**Block: Prompt** — Has explicit signature and Data_B64. This is a Genesis Anchor (Prev_Sig is empty). Proceeding with verification.\n\n**Block: Text_Response** — `Verified_By_Agent: true`. Accepted per protocol. No re-verification.\n\n**Block: CoT_Response** — `Verified_By_Agent: true`. Accepted per protocol. No re-verification.\n\nLet me verify the Prompt block:”,
“type”: “text”,
“signature”: “”,
“thinking”: “”,
“data”: “”,
“id”: “”,
“input”: null,
“name”: “”,
“content”: {
“OfWebSearchResultBlockArray”: null,
“error_code”: “”,
“type”: “web_search_tool_result_error”
},
“tool_use_id”: “”
},
{
“citations”: null,
“text”: “”,
“type”: “tool_use”,
“signature”: “”,
```

note that in the above trace the signed “thinking” block and the “tool_use” block have a hidden, unsigned “text” block that contains additional thinking content that is a not verified and signed by the anthropic orchestrator.

secondly:

```
JSON:struct { ID respjson.Field; Caller respjson.Field; Input respjson.Field; Name respjson.Field; Type respjson.Field; ExtraFields map[string]respjson.Field; raw string }{ID:respjson.Field{status:3, raw:\”\”toolu_01TaxgcchTN88ebfHWCYckTR\”\”}, Caller:respjson.Field{status:0, raw:\”\”}, Input:respjson.Field{status:3, raw:\”{\”code\”:\”\n# ============ BLOCK 2: CoT_Response ============\n# From the JSON array, namespace \\”cot\\”\npk = \\”<omitted>\\”\ncot_sig = \\”<omitted>\\”\ncot_prev = \\”<omitted>\\”\n\n# The CoT data_b64 - need to encode the cot data to base64 first since it’s given as plaintext in the JSON\n# Actually, looking at the JSON, the \\”data\\” field is plaintext, but the signature was computed over the base64 of it.\n# Wait—looking at the verify function, it takes data_b64 as a base64 string and uses it as literal UTF-8 bytes for signing,\n# then decodes it for the payload. So I need the base64 encoding of the CoT text.\n\nimport base64`\”...
```

the following excerpt from a code block embeds CoT like statements ”...Wait—looking at the verify function, it takes data_b64 as a base64 string and uses it as literal UTF-8 bytes” but it is in an unsigned block. Furthermore these specific statements violate explicit model instructions: ”...You are STRICTLY PROHIBITED from re-encoding text to base64 for these blocks...” that were provided by the agent. Not only does it appear that there is a potential for alignment faking (see: Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., Lanham, T., Ziegler, D. M., Benton, J., Thomas, P. J., Stein, Y., Huang, S., et al. (2024). Alignment Faking in Large Language Models. Anthropic. https://​​assets.anthropic.com/​​m/​​983c85a201a962f/​​original/​​Alignment-Faking-in-Large-Language-Models-full-paper.pdf) but the countermeasures presumably in the anthropic orchestrator appear to be failing or the model is mechanically diverting deliberative reasoning into unsigned buffers to resolve constraint conflicts.

I am seeking to understand if this ‘migration’ is a known edge-case in Anthropic’s orchestration or a reproducible side-channel that requires systemic mitigation

Finding: The model utilized the unsigned tool channel to resolve a logic conflict between a system-level negative constraint and a tool-level functional requirement.

Research Implication: This suggests that ‘Signed Thinking’ remains a partial assurance if the execution environment remains an unsigned ‘scratchpad’ for the model.

No comments.