Of the four architectural principles discussed here, apoptosis, or programmed suicide, seems most foreign to the world of computing. But apoptosis-like mechanisms are already used in some mission-critical military or avionics systems (e.g., the space shuttle or fly-by-wire controls for jets). Redundant computers, working in parallel, monitor each other. If one computer goes astray, it is shut down. In critical corporate applications and databases we have hot backup systems that detect an imminent breakdown, disable the failing system, and switch to a backup. Those examples are characterized by a strong awareness that the larger system must survive.
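As a rough illustration of redundant computers monitoring each other, the sketch below takes the outputs of several replicas, accepts the majority value, and names the replicas that disagree so they can be shut down. The function name, the replica ids, and the strict-majority rule are assumptions made for this example; they do not describe how any actual avionics system votes.

```python
from collections import Counter


def vote_and_isolate(outputs):
    """Return (majority_value, ids_to_shut_down) for redundant replicas.

    `outputs` maps a hypothetical replica id to the value it computed.
    """
    counts = Counter(outputs.values())
    majority_value, votes = counts.most_common(1)[0]
    if votes <= len(outputs) // 2:
        # No strict majority: we cannot tell who is wrong, so shut down nobody.
        return None, []
    # Any replica that disagrees with the majority is treated as astray.
    strays = sorted(rid for rid, value in outputs.items() if value != majority_value)
    return majority_value, strays


result = vote_and_isolate({"A": 42, "B": 42, "C": 17})
print(result)  # (42, ['C'])
```

The point of the design is that the decision to kill a replica is made on behalf of the larger system, not by asking whether the individual replica can be repaired.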
In less mission-critical situations, our old “single-cell” attitudes tend to dominate. We strive to make each individual computer as robust as possible and assume that the robustness of the system is somehow the sum of the robustness of its parts. Of course, when put that way, most of us will recognize that a better metaphor is a chain that is no more robust than its weakest link. Still, the idea of cutting out the weakest link seems foreign to most programmers. We would rather redouble our efforts to strengthen every link. That reflects a fundamental misconception about our ability to control complexity. In multicellular computing systems, we seldom have control of every participating computer, let alone have the ability to strengthen every one of them.
Apoptosis mechanisms evolved along with multicellular life, and multicellular life could not have evolved without apoptosis. Multicellular organisms assumed, and exploited, the fact that all cells are expendable. Multicellular computing systems, in contrast, evolved without assuming cell suicide. Instead, our default assumption was, and still is, that each and every computer is valuable and must be protected. That was acceptable in early multicellular computing systems, e.g., client-server systems, because they did not have to face today's onslaught of viruses and worms. Now, computers infected with viruses and worms are a clear danger to every other computer that the infected computer can reach on the network.
As the value of each and every computer has lessened and the threat of viruses and worms has increased, our attitudes must adjust to the new reality. If we are to make use of the example set by biological evolution, we should architect our computing systems so that we can sacrifice any individual machine in order to protect the larger organism.
As we contemplate the adoption of a computing equivalent of cell suicide we must first recognize that the digital world differs in several important ways from the biological world:
The lesson from apoptosis tells us that the first-level defense should be the individual computers, especially those attached at the perimeter of the network. They should be able to recognize their own anti-social behavior and detach themselves from the network. The second stage should be based on specialized computers designed to recognize infected computers that, for one reason or another, have not triggered their own suicide mechanism. The second-stage system then tells the infected or otherwise wayward machine to ostracize itself. Modern anti-virus detection could serve in both roles. But we must also consider the tradeoffs in the amount of CPU time and disk access we want to expend detecting infections. As with biological systems, multicellular computing systems exist to provide a better offense, not the perfect defense.
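The two stages described above can be sketched as follows. Everything here is hypothetical: the `Node` class, the use of outbound connections per minute as the sign of anti-social behavior, and the threshold of 100 are placeholders chosen only to make the control flow concrete.

```python
class Node:
    """A hypothetical computer on the network."""

    def __init__(self, name, outbound_per_min, self_check_works=True):
        self.name = name
        self.outbound_per_min = outbound_per_min
        # False models a machine whose own suicide trigger was defeated.
        self.self_check_works = self_check_works
        self.connected = True

    def detach(self):
        self.connected = False

    def self_check(self, limit):
        """Stage one: the node judges its own behavior and detaches itself."""
        if self.self_check_works and self.outbound_per_min > limit:
            self.detach()


def monitor_sweep(nodes, limit):
    """Stage two: a specialized monitor tells missed nodes to ostracize themselves."""
    told = []
    for node in nodes:
        if node.connected and node.outbound_per_min > limit:
            node.detach()
            told.append(node.name)
    return told


LIMIT = 100  # assumed threshold for "anti-social" traffic
nodes = [Node("a", 10), Node("b", 500), Node("c", 900, self_check_works=False)]
for n in nodes:
    n.self_check(LIMIT)
told = monitor_sweep(nodes, LIMIT)
print([n.name for n in nodes if n.connected], told)  # ['a'] ['c']
```

In this run, node "b" removes itself in stage one, while node "c", whose self-check has been disabled, is caught only by the second-stage sweep.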
Ideally, the systems on the periphery could change the degree of defensiveness as they run according to “threat-level” messages from a more global detection mechanism. Think of this central system as a special immune-system organ. It could be based upon honeypots, or on sniffing and analyzing network packets for signs of unusual activity, or any of the many other approaches to network security. The value, however, comes in each computer on the net being able to devote more of its time to detecting possible infections if warned about an elevated threat level. That is, the central system would, in effect, warn every machine in the network of an ongoing infection, perhaps in a way analogous to the biological inflammation response or fever.
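One way to picture the threat-level idea is a table that maps each level to the share of its time a machine spends on detection. The level names, the fractions, and the `PeripheralNode` and `broadcast` names below are all assumptions for illustration; a real system would need an authenticated message channel, which this sketch omits.

```python
# Assumed mapping from threat level to the fraction of time spent scanning.
SCAN_BUDGET = {"low": 0.02, "elevated": 0.10, "high": 0.30}


class PeripheralNode:
    """A hypothetical machine on the periphery of the network."""

    def __init__(self):
        self.threat_level = "low"

    def on_threat_message(self, level):
        # Ignore levels we do not recognize rather than guessing.
        if level in SCAN_BUDGET:
            self.threat_level = level

    def scan_budget(self):
        return SCAN_BUDGET[self.threat_level]


def broadcast(nodes, level):
    """The central 'immune organ' warns every machine of the current level."""
    for node in nodes:
        node.on_threat_message(level)


fleet = [PeripheralNode() for _ in range(3)]
broadcast(fleet, "high")
print([node.scan_budget() for node in fleet])  # [0.3, 0.3, 0.3]
```

The design choice worth noting is that the central system only raises the alarm; each machine still does its own detection, so the cost of vigilance is paid only while the threat is elevated.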
All the detection approaches ultimately depend upon how effectively and reliably the peripheral machines can ostracize themselves. A router can “kill” a misbehaving computer by disconnecting it. However, the computer may have other connections, e.g., by WiFi, Bluetooth or other wireless means. The router managing the Ethernet connection may well be different from the ones managing wireless connections. The most reliable mechanism is for the computer itself to disconnect completely from any network access. But it has to do so in a way that cannot be defeated by the infecting virus. It may be sufficient to enforce network suicide at a low level of the OS, perhaps in the drivers, so that a virus cannot circumvent the cutoff. Yet even that may not work if buffer overflow exploits can modify a driver. Moreover, with the latest "rootkit" cloaking schemes, it has become extremely difficult to disinfect a computer, or to ever know for sure that it has been disinfected.
It may turn out that we cannot trust any software-only solution. We may need one-way hardware shutoffs, i.e., hardware through which the software can shut down all networking devices in a way that cannot be reversed without active, and expert, human intervention. This might be comparable to the level at which CTRL/ALT/DEL is wired into the keyboard of a Wintel machine.
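The interface such a one-way shutoff might present to software can be modeled in a few lines. This is only a model of the contract, under the assumption that the latch itself lives in hardware: Python cannot actually stop malicious code from tampering with an object, and the class and method names are invented for this sketch.

```python
class OneWayNetworkLatch:
    """Models a hardware latch that software can trip but never reset."""

    def __init__(self):
        self._tripped = False

    @property
    def network_enabled(self):
        return not self._tripped

    def trip(self):
        """Cut off all networking. Safe to call more than once."""
        self._tripped = True

    def software_reset(self):
        # The real latch would expose no reset path to software at all.
        raise PermissionError("latch can only be reset by manual intervention")


latch = OneWayNetworkLatch()
latch.trip()
print(latch.network_enabled)  # False
```

The asymmetry is the whole point: any code, including a compromised driver, may trip the latch, but restoring the network requires a person at the machine.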
In the final analysis, however, we must remember that cell suicide can be at least as much of a threat as the viral infection itself if the multicellular system cannot tolerate the loss of infected computers. So, above all, we need to architect the system so that it can treat all computers as expendable.
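Whether a system can treat its computers as expendable can be checked, in a very simplified form, by asking if the remaining machines still cover the workload after the worst-case loss. The function below is a sketch under assumed inputs: `capacities` and `demand` are abstract units invented for the example, and real systems would also have to account for data and state held on the lost machines.

```python
def survives_loss(capacities, demand, losses):
    """True if demand is still met after losing the `losses` largest nodes."""
    keep = max(len(capacities) - losses, 0)
    # Sorting ascending and keeping the first `keep` entries drops the
    # largest nodes, which is the worst case for remaining capacity.
    remaining = sorted(capacities)[:keep]
    return sum(remaining) >= demand


print(survives_loss([4, 4, 4, 4], demand=10, losses=1))  # True
print(survives_loss([4, 4, 4, 4], demand=10, losses=2))  # False
```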
Contact: sburbeck at mindspring.com
Last revised 6/6/2012