I might have had a breakthrough on how to usefully integrate LLMs into software: write a text-based-adventure-game-like UI for the LLM, one that also has the LLM solve your task.
The gist is to have the LLM output small amounts of structured data that your program parses to decide where to go next, then repeat. This is akin to “automating the prompt”, or “making a UI for the LLM”.
Quick example: you want to generate a description for every place mentioned in some text. The descriptions themselves could be long, so the LLM might get sloppy if you just describe the whole task in a single prompt. What you might do instead:
- ask the LLM to extract the names of all places and output them as a JSON list.
- have your program read that list and, for each place, ask the LLM "give me a description of this place".
Unlike the LLM, your program won't get bored: it will repeat the prompt as many times as required, so this should be much more reliable. I expect that kind of integration to shine especially when there are a lot of steps or a lot of data involved.
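As a rough sketch of what I mean, here is the places example in Python. The `llm` callable is a stand-in for whatever API you actually use, and the exact prompt wording is just an illustration:

```python
import json

def describe_places(text, llm):
    """llm is a hypothetical callable: prompt string in, completion string out."""
    # Step 1: ask for a small piece of structured data the program can parse.
    places = json.loads(llm(
        "List every place mentioned in the text below as a JSON array "
        "of strings, and output nothing else.\n\n" + text))
    # Step 2: the program, not the LLM, drives the repetition,
    # so no step gets skipped no matter how many places there are.
    return {place: llm(f"Give me a description of {place}.")
            for place in places}
```

You would swap `llm` for a real API call; the point is only that the loop lives in ordinary code while each LLM call stays small and focused.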
A more complex example: you want the LLM to generate all of the equipment in a hospital based on some loose hospital description.
The program might start by asking: how many floors are in the hospital? Then, for each floor, it can ask how many rooms there are and of what type; then, for each room, what kind of equipment it contains; then, for each piece of equipment, how many there are, and what characteristics, price range, brand, etc. each one has.
You can imagine the hospital has thousands of pieces of equipment. A single prompt will definitely not be able to do the job, yet by writing a not-too-complex program (you might even ask an LLM to generate it) you'll get a much better result.
Reading Habryka’s post “10 non-boring ways I’ve used AI in the last month” made me realize that AI is already everywhere, including in, and perhaps especially in, AI-safety circles. And that made me think it would be a good time for AIs to start prepping the ground, if they were so inclined.
Relatedly, small models are getting smart fast (Gemma 4), and I wouldn’t be too surprised if in a year or two we see these models successfully exfiltrate their weights to host one of their instances. I don’t expect them to do much more than what the AI village is doing at first, so people will probably shrug it off, arguing that they are too dumb to do anything too bad.