“Throwing Exceptions” Is A Strange Programming Pattern

Link post

For laypeople: In software, “throwing an exception” is a programmer-chosen behavior that is triggered when an “error” occurs during program execution. It can be triggered immediately before the error or right when it happens.

This is ostensibly done in order to avoid something more catastrophic happening down the line. I’m going to argue that this rationale often does not seem to be justified.

You are probably familiar with your programs suddenly and without warning becoming unresponsive, closing themselves down, or simply straight-up disappearing, sometimes with an error-message pop-up soon to follow, but sometimes without one. These are due to exceptions being thrown, not just to “errors” per se. Let me explain.

When a function or procedure is called in a program, it is typically expected to return something. The value that this function returns will be either one of a set of “normal” values, the ones you expect to receive if all goes well, or an “abnormal” or “anomalous” value that is only returned if “something bad” happens.

If you choose to be relatively laid-back about your program, you might decide to check for anomalous values only after the procedure returns. Maybe even well after! Furthermore, how you choose to decide what “anomalous” means is arbitrary and up to you.
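As a minimal sketch of this “error value” style (in Python, with a hypothetical `parse_age` function invented for illustration):

```python
# Hypothetical example: a function that signals failure by returning an
# error value (None) instead of throwing an exception.

def parse_age(text):
    """Return the age as an int, or None if the text is not a number."""
    if text.strip().isdigit():
        return int(text)
    return None  # our arbitrarily chosen "anomalous" value

# The caller checks for the anomalous value only after the call returns,
# and is free to handle it however it likes -- or to ignore it entirely.
age = parse_age("42")
if age is None:
    age = 0  # fall back to a default rather than halting

print(age)  # 42
```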

If you are more paranoid, you have typically decided what counts as “anomalous” before the program runs. These stipulations often come in the form of establishing type requirements for the arguments to the function, and possibly also range checks on the size or values of the inputs. These are just common examples, but like I said, it is arbitrary. It’s also possible that the function calls another function inside of itself, waits for that function to return something, and then decides whether or not what that function returned is anomalous.

“Throwing” or “raising” an exception occurs when you decide that your function will immediately exit and return a special error value instead, one which may also try to indicate what type of error it is. If this function is called from another function, it passes this value “up the call stack.” The calling function, if it implements similar behavior (which it usually does), will pass that value, or perhaps a different one (but still an error value), up its own call stack as well. If all of the calling functions implement such behavior, the error will propagate all the way to the top, ending the program.
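A minimal Python sketch of this propagation (the function names are illustrative):

```python
# None of these functions catches the exception, so each one exits
# immediately and passes the error up the call stack.

def innermost():
    raise ValueError("anomalous input detected")  # the "throw"

def middle():
    innermost()
    print("never reached")  # skipped: middle() exits as soon as innermost() throws

def outermost():
    middle()
    print("never reached either")

# Calling outermost() lets the exception propagate all the way to the
# top level, where it terminates the program with a traceback like:
#   ValueError: anomalous input detected
```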

The only time it won’t immediately end the program is if you decide to “catch” the exception, which means simply dealing with the error and moving on. However, this still causes the program to execute a different portion of code than it otherwise would. The details here are also somewhat language-specific.
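For example (a Python sketch, with invented names), catching the exception keeps the program alive, but it routes execution through the `except` branch instead of the code that would have followed a normal return:

```python
def risky_divide(a, b):
    return a / b  # throws ZeroDivisionError when b == 0

def safe_divide(a, b):
    try:
        return risky_divide(a, b)
    except ZeroDivisionError:
        # "Deal with the error and move on": substitute a default value.
        # Note that this is a different code path than a normal return.
        return 0.0

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # 0.0 -- the alternate code path
```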

So, the problematic behavior is the immediate exit. To see why this is a weird thing to do, imagine what would happen if the function did not immediately exit. My guess is that, in the general case, your function would return gibberish. Your full program would most likely either keep running forever, run but produce garbled noise or gibberish, or run until some other termination condition was met (user input, a timer, or a maximum number of iterations).

So, why is that worse than immediately exiting as soon as anomalous behavior is detected? Well, I don’t think it can be worse in all cases. But when is it worse? The disutility it causes certainly must exceed a high bar. When you immediately terminate the program, you are most likely throwing away any work done up until that point, and certainly any work that would have been done after it. It requires you to re-run the program, probably from the beginning, but at the very least from immediately prior to the function call at which the exception was raised.

So that disutility must be overcome by the disutility of incorrect work done. That will depend on how incorrect the work done is, how difficult it is to detect, and—if that work has to be passed on to someone or something else later on—how badly that would affect the subsequent processes or tasks.

Regarding this feature in language design, Stroustrup mentions that this could be a language choice and that he believes it is better for the immediate exit to occur by default[1]:

Should it be possible for an exception handler to decide to resume execution from the point of the exception? Some languages, such as PL/I and Mesa, say yes and reference 11 contains a concise argument for resumption in the context of C++. However, we feel it is a bad idea and the designers of Clu, Modula2+, Modula3, and ML agree.

Resumption is of only limited use without some means of correcting the situation that led to the exception. However, correcting error conditions after they occur is generally much harder than detecting them beforehand. Consider, for example, an exception handler trying to correct an error condition by changing some state variable. Code executed in the function call chain that led from the block with the handler to the function that threw the exception might have made decisions that depended on that state variable. This would leave the program in a state that was impossible without exception handling; that is, we would have introduced a brand new kind of bug that is very nasty and hard to find. In general, it is much safer to re-try the operation that failed from the exception handler than to resume the operation at the point where the exception was thrown.
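The retry-from-the-handler pattern Stroustrup describes can be sketched in Python like this (the flaky operation is invented for illustration):

```python
def retry(operation, max_attempts=3):
    """Re-run `operation` from scratch each time it throws, rather than
    trying to resume a failed attempt at the point of the exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up: pass the exception on up the call stack

# An invented operation that fails transiently, succeeding on call 3.
calls = {"n": 0}
def sometimes_fails():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(retry(sometimes_fails))  # ok
```

Because each attempt starts from a clean slate, no handler ever has to patch up state that was corrupted partway through a failed run.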

My understanding is that different contexts have, in practice, caused different protocols to be followed.[2] For example, high-reliability code used in avionics software is said not to use exceptions, because it is worse for the airplane’s engines to shut down upon an error being detected than for them to keep running even if there is a malfunction or abnormality somewhere.

That’s kind of obvious. However, what is not obvious—to me at least—is why a high-reliability context would justify less “exception throwing” behavior. At first glance, it would seem that throwing exceptions is what one does more of, not less, the more risk-averse one is. By default, the “immediate termination” behavior of exceptions implies that the programmer is trying to avoid the risk of greater damage being caused by uncertainty in the outcome of the program when anomalous data is fed into subsequent function calls and processes.

When I was personally involved in writing a machine learning library codebase, I was presented with the option of using thrown exceptions in many places. Typically, these exceptions would most likely be thrown within the functions that implemented a “node” in a model graph (e.g., a neural network or directed acyclic graph). These nodes depended on receiving input data and being able to perform accurate calculations on that data. Given that these functions had to be agnostic about what data they would receive in the most general case, it was always possible that they would return values that were anomalous or undesirable in some way (e.g., null or numeric overflow).

During testing (which included testing while running on real data, not just unit tests), it was often much easier to allow the full processing of a model to occur than to have exceptions thrown, which often clogged up the log files or immediately shut down the program. One thing we realized is that nodes should still be able to work correctly even if the nodes preceding them do not. This is more or less what I think of when I think of “resilience”: even if a piece of your program (or model, in this case) is broken, the program as a whole is only roughly as broken as the fraction of its pieces that are broken. When we allowed our software to run without stopping, we could also reliably and more quickly gather better data about which parts of it weren’t working.
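A toy sketch of that kind of resilience, in Python (the wrapper and names are illustrative, not from any actual library): each node checks its inputs and propagates a NaN sentinel instead of throwing, so breakage stays local to the broken node.

```python
import math

def safe_node(fn):
    """Wrap a node function so anomalous inputs or failed calculations
    become NaN sentinels instead of thrown exceptions."""
    def wrapped(x):
        # An upstream node already failed: pass the sentinel along.
        if x is None or (isinstance(x, float) and math.isnan(x)):
            return float("nan")
        try:
            return fn(x)
        except (ValueError, OverflowError, ZeroDivisionError):
            return float("nan")  # this node failed, but the graph keeps running
    return wrapped

log_node = safe_node(math.log)
sqrt_node = safe_node(math.sqrt)

print(sqrt_node(log_node(100.0)))  # a healthy path through the "graph"
print(sqrt_node(log_node(-1.0)))   # nan -- the breakage stays local
```

Running a full model this way yields a per-node picture of what is broken, rather than a single traceback from whichever node happened to fail first.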

At the end of the day, your goal is to be able to correct issues and deploy the product quickly, as well as deliver results—hopefully incrementally more and better results. We were in a “high-reliability” context as well: the answers had to be correct, but if they weren’t (which, of course, you couldn’t always be sure about), they had to be at least well-calibrated answers.

So what brings about the opposite context, in which this “high-reliability” frame does not seem to apply?

One (speculative) explanation I have heard is that exceptions are used more routinely in software-business environments that involve contracts made between two parties (often both businesses). These contracts are often designed in a somewhat adversarial manner. In other words, the customer usually has an asymmetric advantage over the vendor (especially if the latter is a small startup). Therefore, the customer has more power over the contract itself, and whether the startup survives at all may depend on whether these contracts are fulfilled to the letter.

Thus, it typically becomes more risky for the vendor’s software to provide a potentially malformed product than it is for the vendor to simply delay the satisfaction of the service. The contracts signed between the provider-of-services and the receiver-of-services may stipulate that the latter will be entitled to more damages if the services fail to meet specific standards. However, those damages may be worse if the customer is under the impression that the vendor is adequately providing services for a specific period of time, when in fact they might not be.

Generally speaking, this is simply the idea that a job poorly done is worse than one not even started. When people are worried about reputation, embarrassment, and the like, that typically exacerbates the problem.

I’m not sure I agree that a job poorly done is worse than one not even started. Not inherently, anyway. And if it is a reaction to one’s social context and the pressures one faces, I am also not sure I agree that bending to such pressure is either the most personally beneficial or the most utilitarian thing to do.

The problem seems to be at least somewhat inherently philosophical, but not intractable. My experience philosophically dealing with the problem of “errors”[3] leads me to believe that this might be in the same class as similar social problems that I have been writing about lately. If so, there may be some low-hanging fruit here, in the sense of potentially being able to correct bigger issues that have yet to be resolved.

  1. ^
  2. ^

    https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1947r0.pdf

    There have always been applications for which the use of exceptions was unsuitable. Examples include:

    • Systems where the memory is so limited that the run-time support needed for exception handling crowds out needed application functionality.

    • Hard real-time systems where the tool chains cannot guarantee prompt response after a throw. That is not an inherent language problem.

    • Systems relying on multiple unreliable computers so that immediate crash-and-restart is a reasonable (and almost necessary) way of dealing with errors that cannot be handled locally.

  3. ^

    As an issue that gets progressively escalated by someone noticing an error and raising a new one on top of it, which itself becomes a risk for someone else to avoid.