Programming: Cascading Failure chains

Donald HobsonMar 28, 2020, 7:22 PM

8 points

The concept of a cascading failure chain can be demonstrated by this block of python. Don’t worry if you don’t know python, I’ll try to explain it.

$xCor=((12+7)//2,(5-2)*((8-3)//2),(((3+4)*((7+4)*2))%(8*(5-3)), (5*(3+2)//(2*(2+3)))),(2*(8-1))) yCor=(23,25,26,28,30) print(xCor,yCor) Lst={"A":0,"B":1,"C":2,"D":3,"E":4} inp=input("Pick from A to E") if inp not in Lst: print("Error") inp="A" val=Lst[inp] sze=xCor[val]*yCor[val] print(sze) import numpy as np np.random.normal(size=sze)$

The error sneaks in on the first line.

$xCor=(9, 6, (10, 2), 14)$

This is hidden by the brackets being used for tuples and precedence.

The next part of the error occurs at

$sze=xCor[val]*yCor[val]$

In python, the product of an integer and a tuple is a tuple. $>>> 3*4 12 >>> 3*(1,2) (1, 2, 1, 2, 1, 2)$

This is far from the only case where python is slack with typing. You can multiply a Boolean by a string or add two lists or take a string modulo a tuple.

The final error comes with the numpy.random.normal function. It takes in a size. if that size is an integer, it makes a 1D array of that length, if it is a tuple, it creates a multidimensional array, with dimensions from the tuple. So if you enter “C”, this piece of code produces a 52D array containing $20^{26} =$ 6710886400000000000000000000000000 random numbers. This causes your system to run out of memory and crash.

This is the structure of a cascading failure chain. Something goes wrong somewhere, but it isn’t caught by the error correction mechanism and dealt with, instead it triggers something else to go wrong, and then another thing. Here going wrong means going outside the context that the programmer was thinking of when they wrote the code.

The basic problem was that the programming language was dynamically typed, so if you get passed a variable foo from another part of the program, it could be an Int or String or List or anything. The next problem comes from giving several different functions the same name. Multiplying integers and repeating lists are entirely different functions. If a programer relies on the fact that the same symbol * does both, they are being “clever” and probably producing confusing code.

The numpy.random.normal is a less egregious version of this, making a random array from a list and from a tuple a subtly different.

Hacking is the art of finding code execution paths not envisioned by the programmer.

All this means that most long and complex python program is likely to be hackable.

Haskel takes a different approach. In Haskel all the variables have types deduced at compile time, if you try any operation that doesn’t make sense, the compiler stops it.

The python methodology is “Run any piece of code that is arguably meaningful”.

The Haskel methodology is “Don’t run any piece of code that’s arguably meaningless”.

The paranoid error detection of Haskel makes it harder to write code that seams to work, but easier to make a complex piece of code work reliably.

Donald HobsonMar 28, 2020, 7:22 PM

8 points

2 comments2 min readLW link

__nobody Mar 29, 2020, 2:15 PM
3 points

tl;dr: This text would be much better if it was purely an example of a failure cascade & its explanation and if you left out the last paragraphs where you try to assign blame to specific parts of the system & how this might have been prevented. (I believe you don’t really know what you (are trying to) talk about there.)

Let’s work backwards, from the end back to the start.

First: Haskell, not Haskel.

The “this wouldn’t have happened with Haskell” bit seems like a regretrospective coming from a narrow tunnel. I strongly suspect you’re not actually using Haskell intensively, or you’d be aware of other problems, like the compiler / type system being too stupid to understand your logically correct construction and then having to use weird workarounds that just plain suck and that you subsequently drown in heaps of syntactic sugar to make them palatable. (I may be wrong on this; things might have improved a fair bit, I ragequit to Coq a long time ago…)

Also, Haskell still allows nonsense like let f x = f x in f "foo" (an endless loop) – you’ll want to use Coq, Agda, or Idris with %default total – then you’ll have to prove that your program will actually terminate. (But that still doesn’t mean that it’ll terminate before the heat death of the universe… only that it can’t run forever. The one thing this does mean is that your functions are now safe to use in proofs, because you can no longer “prove” stuff by infinite regress.) With the extended capabilities of dependent types, you’ll be able to construct almost anything that you can come up with… only it’ll take years of practice to get fluent, and then for serious constructions it may still take days to explain what you’re doing in ways that the compiler can understand. (Simple example: If you define addition on (unary) natural numbers (O, S O, S (S O), …) as O + m = m; (S n) + m = S (n + m), then the compiler will be able to figure out that a list of length 3 + n is S (S (S n)) long, but it won’t have a clue about n + 3. This matters if you need a non-empty list (e.g. to get the first element) – in the 3 + n case, it can see that the list has length S <something> so it’s non-empty, in the other case you’ll first have to prove that addition is commutative and then show that n+3 = 3+n (by commutativity) = S <something> (by reduction)…[1] and that’s for something as simple as getting an element from a list.) While this is very worthwhile for getting rock-solid software (and things are getting more and more usable / closer to being viable), it’s far too slow if you just want to get something up and running that does stuff that’s hopefully approximately correct. Nonetheless, I think it’s worth spending some time trying to understand & work with this.[2]

Aside: I’m still occasionally using Coq, but >95% of what I’m writing I do in Lua… In years of Coq, I learned safe “shapes” of execution and now I’m mostly writing in Lua what I’d write in Coq, just without the proofs, and also with duck typing galore! If one function describes a certain “movement” of data, when the ways in which it interacts with your data are abstract enough, you can use it independently of types or particular shape of the inputs. While you should theoretically be able to explain that to e.g. Coq, that’d be really complicated and so, for all intents and purposes, it’s practically un-typable… So I don’t agree at all with your claim that “[t]he basic problem was that the programming language was dynamically typed[…]”. You can use that reasonably safely, you just need a few years of training with some hardcore dependently typed language first… ;-)

Your brief interjection on hacking seems rather unmotivated. Your program doesn’t provide any interesting computational capacity, there’s not much to hack. Same thing goes for your claim that “All this means that most long and complex python program[s are] likely to be hackable.” – the program can weigh megabytes or even gigabytes, as long as it doesn’t consume any external inputs / doesn’t interact with the outside world, there’s no way for you to influence its behavior and so it’ll be 100% hack proof. (In the other direction, a nc -l -p 2323|bash (don’t run this!) or its equivalent in your language of choice is very short and provides full programmability to the outside world.) It’s not about length at all (although, sure, in practice there’s correlation), it’s about the computational capacity provided to the input (and, by extension, the attacker.) You may be thinking that you are writing the program, therefore you are in control; but actually you are constructing an abstract machine on which the input (files, packets, …) will later run – the input is programming the abstract machine that is provided by your program, the branches taken are determined by the bytes read, and the attacker is in control – the only question is how much computational power do you provide? The LANGSEC people are on to something very good, but as some people show it’s clearly not enough yet.[3]

The initial example was entertaining and I learned (or rather: became aware of) something new from that. (I technically knew that parentheses in Python do both grouping / precedence and tuple construction, but I wasn’t actively aware of the resulting problem. So thank you for this!)

[1]: Actually, in this particular case it’s enough to do a case analysis on n and see that even when n=0, you’ll have the 3 items and so the list can’t be empty. But in general, you’ll rarely be that lucky and often have to do really tricky stuff…

[2]: If you want an intro to this stuff, look at the Software Foundations and then also Certified Programming with Dependent Types on how to not go crazy while writing all those proofs… or look at CompCert for some real-world software and be made aware of Mirage as (what I think is currently) the most likely path for this stuff to become relevant. (Also, if you’re serious about working with dependent types, the stuff that Conor McBride does is awesome – understand that shift in perspective and you’ll cut out years of pain.)

[3]: Now loop that back into the stuff about dependent types / functional programming, where stuff is often represented as trees with lots of pointers… you’ll see that while that may be provably safe, it’s also extremely brittle, which is another reason why a program written in Lua or C may be much more robust (and, counter-intuitively, ultimately safer) than one written in Coq…
TAG Mar 29, 2020, 11:00 PM
1 point

That ends hanging in the air ,rather.

How does Haskel compare to better dynamic languages? Should we use Haskell for everything?