Modern large language models go through a battery of reinforcement learning where they are trained not to produce code that fails in specific, easily detectable ways, like crashing the program or causing failed unit tests. Almost universally, this means these models have learned to produce code that looks like this:
function getSomeParticularNumber(connection: SqlConnection): number {
  try {
    const x = connection.query(`SELECT "no" FROM accounts`);
    return x[0];
  } catch (err) {
    logger.warn("Couldn't get the number; returning 0 by default so as not to break things");
    return 0;
  }
}
The above code is very cheeky and quite bad. It’s more likely to pass integration tests than it would be without the try/catch block, but only because it fails silently. Callers of getSomeParticularNumber won’t know whether 0 is what’s actually in the database or whether there was an intermittent connection error—which could be catastrophic if the number is, say, the price of an item in a store. And if it turns out this code contains a bug (for example, if the table should be "Accounts" instead of "accounts"), testers might not notice until it’s actually impacting users.
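For contrast, here’s a minimal fail-fast sketch of the same function (using the same hypothetical SqlConnection API as above). Errors propagate to the caller, and a broken assumption about the data produces a loud, descriptive crash rather than a plausible-looking 0:

function getSomeParticularNumber(connection: SqlConnection): number {
  // No try/catch: a connection error surfaces at the call site, where the
  // caller can decide what to do, instead of masquerading as a real value.
  const rows = connection.query(`SELECT "no" FROM accounts`);
  if (rows.length === 0) {
    throw new Error('Expected at least one row in "accounts", got none');
  }
  return rows[0];
}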
Some common ways I’ve seen this behavior manifest include:
* Harmful/unnecessary try/catch blocks
* Redundant hasattr or getattr calls in dynamically typed languages
* Setting default return values for dictionary keys or array/tuple indices that should always exist (the last two are sketched below)
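In TypeScript terms—the User type and field names here are hypothetical, purely for illustration—the last two patterns (and their non-Python equivalents, optional chaining and ?? fallbacks) tend to look like this:

interface User {
  id: number;     // always present in this schema
  email: string;  // always present in this schema
}

// Pathological: an existence check plus a fallback on a field the schema
// already guarantees. If `email` were ever missing, silently emailing
// "unknown@example.com" is worse than crashing.
function getEmailGuarded(user: User): string {
  return (user as any)?.email ?? "unknown@example.com";
}

// Better: just access the field. A broken assumption fails loudly.
function getEmail(user: User): string {
  return user.email;
}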
Until reinforcement learning environments get much, much better, these models will probably continue to do this. To help prevent this kind of behavior, I include custom instructions for most of my terminal coders:
When key assumptions that your code relies upon to work appear to be broken, fail early and visibly, rather than attempting to patch things up. In particular:
* Lean towards propagating errors up to callers, instead of silently "warning" about them inside of try/catch blocks.
* If you are fairly certain data should always exist, assume it does, rather than producing code with unnecessary guardrails or existence checks (esp. if such checks might mislead other programmers).
* Avoid the use of hasattr/getattr (or non-python equivalents) when accessing attributes and fields that should always exist.
* Never produce invalid 'defaults' as a result of errors, either for users, or downstream callers.
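One way to make “fail early and visibly” concrete in reviewed code is a small assertion helper. The helper name and shape here are my own sketch, not something from the instructions above:

const prices = new Map<string, number>([["sku-1", 9.99]]);

// Hypothetical helper: narrows `T | undefined` to `T`, or fails loudly
// with a descriptive error instead of letting undefined flow downstream.
function expectPresent<T>(value: T | undefined, what: string): T {
  if (value === undefined) {
    throw new Error(`Expected ${what} to be present, but it was missing`);
  }
  return value;
}

// The price must exist; if it doesn't, we want a crash, not a $0 sale.
const price = expectPresent(prices.get("sku-1"), "price for sku-1");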