In Part II, I wrote that Brooks's excuse for crappy software is actually a curse because it is much more detrimental to society than is commonly assumed. In this post, I defend the thesis that there is a way to lift the curse and solve the software reliability problem once and for all. Timing is the key.
In his famous No Silver Bullet paper, computer scientist Frederick Brooks wrote the following:
The complexity of software is an essential property, not an accidental one.

Brooks's thesis is that the accidental complexity of software programming (e.g., syntax errors, misspellings, infinite loops, null pointers) can be effectively managed, but that its essential complexity is so vast that errors cannot be prevented. Brooks stakes his entire thesis on the following assertion:
From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability.

In other words, Brooks argues that, in order for a program to be reliable, one must be able to enumerate and understand all of its possible states. This may seem obvious at first glance, but is it really true? Well, I hate to disappoint those of you who have believed in this fairy tale for the last twenty years or so, but Brooks's thesis can be easily falsified as follows:
Let's take a simple temperature control program that reads a register T representing room temperature. The program turns the heater on when T falls below a threshold t1 and turns it off when T rises above a threshold t2. Is it necessary to enumerate all the possible states of this program in order to know that it works correctly? Of course not. The program will work correctly if the two conditions that control its behavior (i.e., T < t1 and T > t2) are tested and shown to elicit the right responses. There is no need to test every possible value of T in order to know that the program is working properly as designed.
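To make the example concrete, here is a minimal sketch of that temperature control program. The function name and parameters are illustrative (the post gives no code), and Python is used purely to show the logic; a COSA program would of course not be written this way:

```python
def control_heater(T, t1, t2, heater_on):
    """Return the new heater state given temperature T and thresholds t1 < t2."""
    if T < t1:
        return True    # too cold: turn the heater on
    if T > t2:
        return False   # warm enough: turn the heater off
    return heater_on   # between thresholds: keep the current state
```

Testing the two controlling conditions (T < t1 and T > t2) exercises every behavior the program has, regardless of how many distinct values T can take.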
Note: Many years ago, writing about Brooks's 'No Silver Bullet' paper, I said that "no other paper in the annals of software engineering has had a more detrimental effect on humanity's efforts to find a solution to the software reliability crisis." I still stand by that statement.
Thinking of Everything: It's All in the Timing
It is common knowledge that the human mind is error-prone and often overlooks important aspects of a complex design. My claim is that there is a way to solve this problem. In my thesis on the correct approach to programming, timing is an essential and fundamental property of software, not an afterthought. This means that a program should be synchronous and reactive and should consist entirely of sensors (comparison operators or event detectors) and effectors (normal data operators). It also means that the relative timing of every elementary operation can be easily discovered through the use of simultaneity and sequence detectors.

During testing, a special tool called the temporal discovery tool (TDT) can automatically discover the normal relative temporal order of all the sensors and effectors used in the program. This knowledge can then be used to automatically generate alert detectors for every possible temporal anomaly. What would constitute an anomaly? Suppose the TDT discovers that signals A, B, C, and D always occur sequentially, in that order. It will then create anomaly detectors that fire whenever there is a break somewhere in the sequence.
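The sequence-anomaly idea can be sketched in a few lines. This is a hypothetical illustration, not the TDT itself: it assumes the normal order A → B → C → D has already been discovered, and it flags any signal that arrives out of turn:

```python
class SequenceAnomalyDetector:
    """Fires when a signal breaks a previously discovered sequential order."""

    def __init__(self, normal_order):
        self.order = {sig: i for i, sig in enumerate(normal_order)}
        self.expected = 0  # index of the next signal we expect to see

    def observe(self, signal):
        """Return True if this signal is an anomaly (a break in the sequence)."""
        if self.order[signal] != self.expected:
            self.expected = 0  # resynchronize at the start of the cycle
            return True
        self.expected = (self.expected + 1) % len(self.order)
        return False
```

Fed the signals A, B, C, D in order, the detector stays silent; fed C when it expects A, it fires.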
A simple program like the temperature control program described above would suffice in most situations, but what if some anomaly occurred, such as an external hardware malfunction? Given enough temperature sensors, the TDT can discover many things about the way temperature changes in the room. For example, it can easily discover that the temperature cannot rise above some value X twice in a row without first falling below X in between.
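That discovered invariant doubles as a hardware sanity check: a continuous physical quantity cannot cross a threshold upward twice without crossing it downward in between, so two consecutive "rise above X" sensor events would indicate a faulty sensor or register. A hedged sketch, with illustrative event names:

```python
def find_anomalies(events):
    """Return indices of events that violate rise/fall alternation for X."""
    anomalies = []
    last = None
    for i, event in enumerate(events):
        if event == last:          # same crossing fired twice in a row
            anomalies.append(i)    # physically impossible: flag it
        else:
            last = event
    return anomalies
```

A trace like rise, fall, rise, rise would be flagged at the fourth event, since the second consecutive rise cannot come from a real temperature.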
Dear Toyota Software Managers
Please take note. If Toyota had a tool like the TDT, it would not be in the predicament it now finds itself in. The TDT would have automatically discovered that pressing the brake pedal at the same time as the gas pedal is an anomaly. Of course, Toyota's engineers knew that, but with the TDT they would also know that everything is guaranteed to be in working order and, as a result, they would not be shy about adding as much complexity to the control program as necessary, or as desired.
The TDT described above can only work in a deterministic software environment, meaning that the system must make it possible to determine the temporal relationship (simultaneous or sequential) of any two events in a program. The beauty of the COSA software model is that it enables temporally deterministic programs. Non-deterministic and/or abnormal behavior can be fully accounted for during development. COSA does not construct your code for you; it simply ensures that whatever you create is rock-solid and complete, with no stone left unturned. If only the not-so-wise programming managers at Toyota, or wherever safety-critical code is developed, would take notice.
Next: How to Construct 100% Bug-Free Software
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Why Software Is Bad and What We can Do to Fix It
The COSA Operating System