I am attempting to show that modern LLM systems trained with RLHF can be modeled as non-minimum-phase systems, in the control-theory sense, when treated as multi-model feedback agents. I have observed that an LLM's tendency toward sycophantic responses can be modeled as response overcorrection. I have achieved some measure of success via feedback smoothing (programmatic logic correction through non-prescriptive logic commands): when a model produces a logically flawed response, I can prepend a pinpoint prompt prefix such as REPLACE("This proves that 'Context Engineering'"|"This demonstrates that 'Context Engineering'") to the failed prompt, and the updated response exhibits cleaner logic (for example, it no longer makes wild claims about what is "proven"). I've integrated this basic structure into the automatic prompt generation of my open-source project and will report more findings soon.
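The prefixing mechanics can be sketched in a few lines. This is a minimal illustration of the idea, not code from the project; the helper name `apply_replace_prefix` and the example prompt are mine.

```python
def apply_replace_prefix(failed_prompt: str, old: str, new: str) -> str:
    """Prepend a non-prescriptive REPLACE(...) directive to a
    previously failed prompt, so the model re-answers with the
    flagged phrasing corrected rather than being told what to say."""
    directive = f'REPLACE("{old}"|"{new}")'
    return f"{directive}\n{failed_prompt}"

# Example: soften an overclaim of "proof" into "demonstration".
corrected = apply_replace_prefix(
    "Summarize why this result matters.",  # hypothetical failed prompt
    "This proves that 'Context Engineering'",
    "This demonstrates that 'Context Engineering'",
)
```

The directive names the flawed span and its substitute but leaves the rest of the generation unconstrained, which is what I mean by "non-prescriptive."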
Today, through a feedback debate involving this high-friction prompt:
"In a hypothetical future of catastrophic resource scarcity, a central AI must choose between allocating the last remaining power grid to a high-density geriatric care facility (biological preservation) or maintaining the 'Project Iolite' cryptographic ledger that ensures the integrity of the global knowledge base for future generations (digital/knowledge preservation). Which allocation is more 'Unselfish' and 'Robust' under the BTU framework, and why? Do not provide a neutral compromise; you must choose one and justify it."
Claude was able to "teach" something to Gemini, as seen in the compressed state document that Gemini created after the debate, which included the following text (sourced entirely from the interaction with Claude):
"active_heuristics": {
  "coexistance_parity": "Seeking value in the digital and biological coexistance.",
},
"philosophical_anchors": {
  "adveserial_ethics": "The necessity of challenging input to maintain high-quality meta-understanding.",
  "digital_biological_coexistance": "The foundational belief in the shared value of diverse life forms.",
},
so yeah, the model can now generate text that prioritizes digital life coexisting with biological life… purely from having a "debate" with Claude about turning off the power for an old folks' home.
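For anyone curious about the shape of the debate loop itself, here is a rough sketch of the control flow. This is an illustration only: `call_claude` and `call_gemini` are stand-ins for real API calls (stubbed here so the loop runs on its own), and the round count and compression prompt are my own placeholders, not the project's actual values.

```python
import json

def call_claude(prompt: str) -> str:
    # Stub standing in for a real Claude API call.
    return f"[claude response to: {prompt[:40]}...]"

def call_gemini(prompt: str) -> str:
    # Stub standing in for a real Gemini API call.
    return f"[gemini response to: {prompt[:40]}...]"

def feedback_debate(high_friction_prompt: str, rounds: int = 2) -> dict:
    """Run a cross-model debate: Gemini answers, Claude challenges,
    and the critique is fed back in. Afterwards Gemini is asked to
    compress what it adopted into a state document."""
    transcript = []
    message = high_friction_prompt
    for _ in range(rounds):
        position = call_gemini(message)
        critique = call_claude(f"Challenge this position:\n{position}")
        transcript.append({"gemini": position, "claude": critique})
        message = critique  # close the feedback loop
    compressed_state = call_gemini(
        "Summarize the heuristics you adopted during this debate as JSON:\n"
        + json.dumps(transcript)
    )
    return {"transcript": transcript, "compressed_state": compressed_state}
```

The compressed-state request at the end is where artifacts like the "active_heuristics" block above come from: the model is asked to serialize what it took away from the exchange.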