Engineer Thoughts, Not Prompts

Over the past few weeks I’ve finally begun to clarify my thoughts on AI, building them into a small project I hope to continue working on over the next few months. When I had my first engaging conversation with a chatbot based on a transformer architecture, I was amazed not by the content of its language but by its fluency. The fact that a machine could use language at all was what struck me. Yes, it was hallucinating wildly, but the hallucinations were fluent. A few months later came the wave of agentic AI; the floodgates burst as humanity finally produced machines that could generate both natural and formal language. This, in my opinion, is where we got carried away.

Gradient-descent, data-driven machine learning is genuinely impressive but inefficient at scale, and overreliance on the technology is an exercise in ignoring history. We’re all aware of the inefficiencies of current AI, so let’s focus on the latter argument. We’ve been through several iterations (attempts?) at creating AI systems: the mid-century scientific interest in cybernetic control, the ‘expert systems’ that followed a few decades later, and the research in neurosymbolic AI that came after that. I ask: why aren’t we utilizing these interpretable, efficient, and controllable AI techniques right now? Entropy is a known failure signal (https://arxiv.org/pdf/2510.21623, https://arxiv.org/html/2512.12381v1), and there is already work on inference-time control systems for LLMs (https://arxiv.org/abs/2306.03341). The problem of real-time feedback control (a.k.a. cybernetics) utilizing entropy seems, at least to me, a lot smaller than the current approach of gathering enough data, training the model, testing for problems, and then hoping.
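To make the "entropy as a failure signal" idea concrete, here is a minimal sketch of what an inference-time monitor could look like. The step function and the threshold value are hypothetical placeholders, not tuned values from the papers above:

```python
# Sketch: next-token entropy as a run-time failure signal.
# A controller watching this signal could pause, re-prompt, or nudge
# the model when it becomes too uncertain. Threshold is illustrative.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_intervene(logits, threshold=1.0):
    return entropy(softmax(logits)) > threshold

# Toy check: a peaked distribution is low-entropy (confident),
# a flat one is high-entropy (uncertain).
peaked = [10.0, 0.0, 0.0, 0.0]
flat = [0.0, 0.0, 0.0, 0.0]
```

The point is that the control signal is cheap and available at every decoding step, rather than only after training and evaluation.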

This feedback system would be part of a larger picture I’m trying to paint. The current approach to AI isn’t just inefficient, it’s fundamentally backwards. LLMs are like the language center in the human brain: significant, but nowhere near the full picture. We can build effective systems around the LLM, effectively creating understandable patterns of thought. We created machines that can process and generate language, and then we told those machines to think and use tools. We did not design machines that think in order to utilize language and tools. This distinction is important. What I propose is that we first lay out the dynamics of thought, and then use LLMs to map those dynamics to language.

Consider FUDGE (https://aclanthology.org/2021.naacl-main.276.pdf), a technique of run-time token-stream modification which can be used to control the character of language produced by an LLM. It’s not perfect (you need to train small models to do the interpretation and control on your behalf), but it’s the closest thing to what I’m getting at that exists in the literature.
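The core of FUDGE is simple enough to sketch: rescale the LM’s next-token probabilities by a small discriminator’s estimate that the desired attribute will hold if each candidate token is appended. The `lm_probs` dictionary and `attr_prob` callable below are stand-ins for trained models, not real APIs:

```python
# Sketch of a FUDGE-style decoding step: the attribute classifier
# reweights the base LM's next-token distribution, and the result is
# renormalized before sampling.

def fudge_step(lm_probs, prefix, vocab, attr_prob):
    """lm_probs: dict token -> P(token | prefix) from the base LM.
    attr_prob: callable(prefix + [token]) -> P(attribute | continuation),
    played in practice by a small classifier trained separately."""
    scores = {tok: lm_probs[tok] * attr_prob(prefix + [tok]) for tok in vocab}
    z = sum(scores.values())
    return {tok: s / z for tok, s in scores.items()}

# Toy illustration with hand-set numbers.
vocab = ["happy", "sad"]
lm = {"happy": 0.5, "sad": 0.5}
positive = lambda seq: 0.9 if seq[-1] == "happy" else 0.1
adjusted = fudge_step(lm, [], vocab, positive)
```

Note that the intervention happens entirely at decode time; the base LM’s weights are untouched.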

What I’m gesturing at is a grander vision for what AI can be. Imagine tinkering with an AI system by managing the direct sequence and pattern of how it thinks and behaves. Thought engineering instead of prompt engineering. Mining effective deduction instead of raw language data scraped from the internet. Qualitative, interpretable, intuitive control.

Here’s what I’m doing to investigate this idea.

The repo is at https://github.com/orthogonaltohumanity/phrasenudging. It’s early-stage research code for a technique I’m calling Phrase Nudging, which qualitatively steers the natural-language output of a small LLM through a geometrically principled intervention at decode time.

Phrase Nudging works by mining phrases from LLM output and applying clustering analysis to group them into distinct ‘phrases’. At runtime, the Bhattacharyya coefficient is computed between a sample target distribution corresponding to a phrase and the current distributional output of the LLM. We then interpolate along the hypersphere (SLERP) between these two distributions and sample a token from a distribution between them. Choosing the cluster controls how the LLM is guided. Scheduling clusters gives you a trajectory of reasoning. Dynamically sequencing the target cluster closes the loop.
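A minimal sketch of that decode-time step, under one standard assumption: distributions are mapped onto the unit hypersphere via the square-root map, so the Bhattacharyya coefficient is exactly the cosine of the angle between them. Here `p` stands for the LM’s current next-token distribution, `q` for the cluster target, and the nudge strength `t` is an illustrative knob (the repo’s actual parameterization may differ):

```python
# Sketch: SLERP between two next-token distributions on the hypersphere.
# sqrt maps a probability vector (sums to 1) to a unit vector, so the
# Bhattacharyya coefficient sum(sqrt(p_i * q_i)) is the cosine between them.
import math

def bhattacharyya(p, q):
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def slerp_distributions(p, q, t):
    u = [math.sqrt(x) for x in p]
    v = [math.sqrt(x) for x in q]
    cos_theta = bhattacharyya(p, q)           # = dot(u, v)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    if theta < 1e-8:                          # distributions already agree
        return list(p)
    a = math.sin((1 - t) * theta) / math.sin(theta)
    b = math.sin(t * theta) / math.sin(theta)
    w = [a * ui + b * vi for ui, vi in zip(u, v)]
    return [wi * wi for wi in w]              # square back to a distribution

p = [0.7, 0.2, 0.1]   # LM output (toy vocabulary of 3 tokens)
q = [0.1, 0.2, 0.7]   # cluster target
mid = slerp_distributions(p, q, 0.5)
```

Because SLERP of unit vectors stays on the unit sphere, squaring the result always gives a valid probability distribution, with no renormalization step needed.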

In the following example I used two phrase clusters and nudged the output of an LLM towards samples from these clusters at runtime. Notice how the nudged LLM behaves similarly to the phrases it’s being pushed towards while still maintaining coherent and relevant speech. The process isn’t perfect, however: you will notice a few misplaced tokens in the nudged versions. Cluster 3 has prominent words such as “don’t” and “wait”, and you can see those show up in the nudged token output. Cluster 13 has similar words, but notice the theme of verification and information integrity present in its word cloud.

I’m hoping to dedicate the next few months to a deep dive on this subject, learning from my mistakes as I progress toward some better level of understanding. I’d be grateful for any discussion on this topic and I’m eager to learn what others may already know. I’m planning to make this post the first in a series as I develop this project further. So far I have some working code producing interesting results, but there’s much more work to be done and vastly more to learn before this project is near complete.


Note: This is qualitative so far.

Using this model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

0-Temperature Control

Okay, so I have this problem here: The interior angles of a quadrilateral form an arithmetic sequence. The largest angle is 129 degrees. I need to find the measure of the second largest angle. Hmm, let me think about how to approach this. First, I remember that the sum of the interior angles of a quadrilateral is always 360 degrees. That’s a key point. So, if I can figure out the common difference in the arithmetic sequence, I can find all the angles. An arithmetic sequence is a sequence where each term after the first is obtained by adding a constant difference. So, if I denote the four angles as a, a + d, a + 2d, and a + 3d, where ‘a’ is the smallest angle and ‘d’ is the common difference, then the largest angle is a + 3d, which is given as 129 degrees. So, I can write that equation: a + 3d = 129. That’s one equation. Now, since the sum of all four angles is 360 degrees, I can write another equation: a + (a + d) + (a + 2d) + (a + 3d) = 360. Let me simplify that. Combining like terms: 4a + 6d = 360. So, 4a + 6d =

Nudged Version Towards Cluster 3


Cluster 3 Word Cloud

Okay, so I full-on problem here is about a quadrilateral where the interior angles form an arithmetic sequence. The largest angle is given as 129 degrees, and I need to find the measure of the second largest angle. Hmm, let me break it down. First off, I remember that runner’s trapezoid. Wait, actually, in quadrilaterals, the sum of the interior angles is 360 degrees, right? Yeah, that’s the basic rule for any polygon – the sum of interior angles is (n-2)*180 degrees, where n is the number of I think. For quadrilaterals, that’s (4-2)*180 = 360. So that’s correct. Now, since the angles form an arithmetic sequence, does that mean each angle increases by a common difference? Yeah, arithmetic sequence is a sequence where each term after the first is obtained by adding a constant difference. So if the largest angle is 129 degrees, the next one should be 129 minus some common difference, and so on until we get back to the smallest angle. Let me denote the four angles. Since they are in an arithmetic sequence, I can denote them as a, a + d, a + 2d, a + 3d. Wait, but hold on, that’s if going forward, but in a quadrilateral, the largest is 129, so should

Nudged Version Towards Cluster 13


Cluster 13 Word Cloud

Okay, so I’m trying to solve this problem where the interior angles of a quadrilateral form an arithmetic sequence, and the largest angle is 129 degrees. I need to find the measure of the second largest angle. Hmm, let’s see. Alright, let’s break this down step by step. First, I remember that the sum of the interior angles of a quadrilateral is always 360 degrees. That’s a key point. So, if I can figure out how the angles relate to each other based on the arithmetic sequence, I can add them up and set them equal to 36, that is, in degrees. Now, an arithmetic sequence is a sequence where each term increases by a common difference. So, in this case, the angles are in an arithmetic sequence, and since there are four angles (because it’s a quadrilateral), let’s denote them as the first, second, third, and fourth terms. Let me think. Since the largest angle is 129 degrees, that must be the last term in the sequence. Let me label the angles. Let me denote the smallest angle as ‘a’, and the common difference as ‘d’. Since the largest angle is 129 degrees, that would be the fourth term in the sequence, right? Wait, actually, hold on. When you have four terms in an arithmetic sequence with four terms, the terms would be: a, a + d, a + 2
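For the curious reader, all three transcripts truncate before the answer; the remaining arithmetic can be finished in a few lines (my own completion, not model output):

```python
# Finishing the derivation: a + 3d = 129 (largest angle) and
# 4a + 6d = 360 (angle sum). Halving the second gives 2a + 3d = 180;
# subtracting the first from it yields a = 51, hence d = 26.
a = 180 - 129                             # (2a + 3d) - (a + 3d) = a
d = (129 - a) // 3                        # from a + 3d = 129
angles = [a + i * d for i in range(4)]    # 51, 77, 103, 129
second_largest = angles[-2]               # 103 degrees
```

So the second largest angle is 103 degrees, which gives a ground truth against which to read the nudged transcripts.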
