Apoptosis in Computing

Implementing programmed suicide in multicellular computing requires a shift in our attitude toward the role of individual computers

Of the four architectural principles discussed on this website, apoptosis, or programmed suicide, seems the most foreign to the world of computing. Computer professionals tend to assume that their job is to make every effort to protect each system and seldom, if ever, to put one out of its misery. However, apoptosis-like mechanisms are already used in some mission-critical military and avionics systems (e.g., the space shuttle or fly-by-wire controls for jets). Redundant computers, working in parallel, monitor each other. If one computer goes astray, it is shut down. In critical corporate applications we have hot backup systems that detect an imminent breakdown, disable the failing system, and switch to a backup. What these examples share is a strong awareness that the larger system must survive.
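The mutual monitoring of redundant computers described above is commonly done by majority voting: each computer computes the same result, and any computer whose output disagrees with the majority is shut down. The following is a minimal sketch, assuming deterministic replicas that should produce identical outputs for the same input. The function names are hypothetical and do not describe the software of any real avionics system.

```python
from collections import Counter
from collections.abc import Hashable


def vote_and_isolate(outputs: dict[str, Hashable]) -> tuple[Hashable, set[str]]:
    """Return the majority output and the replicas that disagree with it.

    `outputs` maps each replica's name to the result it computed for the
    same input. Raises ValueError when no strict majority exists, because
    the group then cannot tell which replica has gone astray.
    """
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise ValueError("no strict majority; cannot identify the faulty replica")
    dissenters = {name for name, out in outputs.items() if out != value}
    return value, dissenters


def run_voting_round(active: set[str], outputs: dict[str, Hashable]) -> Hashable:
    """Vote among the still-active replicas and shut down any dissenter."""
    value, dissenters = vote_and_isolate({name: outputs[name] for name in active})
    active.difference_update(dissenters)  # outvoted replicas leave the group
    return value
```

For example, with four replicas where one returns 17 while the other three return 42, `run_voting_round` returns 42 and removes the dissenting replica from `active`, so it takes no part in later rounds.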

In less mission-critical situations, our old “single-cell” attitudes tend to dominate. We strive to make each individual computer as robust as possible and assume that the robustness of the system is somehow the sum of the robustness of its parts. Of course, when put that way, most of us will recognize that a better metaphor is a chain that is no more robust than its weakest link. Still, the idea of cutting out the weakest link seems foreign to most programmers. We would rather redouble our efforts to strengthen every link. That reflects a fundamental misconception about our ability to control complexity. In multicellular computing systems, we seldom have control of every participating computer, let alone have the ability to strengthen every one of them.

Apoptosis mechanisms evolved along with multicellular life, and multicellular life could not have evolved without apoptosis. Multicellular organisms exploited the fact that cells are expendable. Multicellular computing systems, in contrast, evolved under the default assumption that every computer is valuable and must be protected. That assumption was acceptable in early multicellular computing systems, e.g., client-server systems, because they did not have to face today's onslaught of malware. Now, a computer infected with a virus or worm is a clear danger to every other computer reachable from it.

As the value of each and every computer has lessened and the threat of viruses and worms has increased, our attitudes must adjust to the new reality. If we are to make use of the example set by biological evolution, we should architect our computing systems so that we can sacrifice any individual machine in order to protect the larger organism.

Digital analogs of cell suicide

As we contemplate adopting a computing equivalent of cell suicide, we must first recognize that the digital world differs from the biological world in several important ways:

Whereas cells interact physically, multicellular computing is about virtual interaction, so an errant computer need not be destroyed; it need only be isolated from communication with other computers. Thus, in computing, we should focus on quarantining an errant computer (cutting it off from the net). On average, each infected computer must infect fewer than one other computer before it is cut off; otherwise, the epidemic grows. Therefore the priority must be to detect infection quickly.
In biological systems, all cells share the same apoptosis mechanism; hence cell suicide is essentially the same process no matter what kind of cell it is. Computers, however, play very different roles in the IT structure, and those roles dictate different ways of handling the need to sacrifice a particular computer for the good of the system as a whole. Dealing with an infection in a key database server is far more delicate than dealing with an infected perimeter machine, especially a PC, PDA, or smartphone.
Cells are replaceable. Kill one and another soon takes its place. Employees, at least knowledge workers who depend almost totally on their computers, are far less replaceable, and they have their own opinions about being cut off from the corporate system. IT administrators would face a revolt if they routinely ostracized employees’ machines first and asked questions later. Moreover, the detection of wayward behavior may be fallible; it could be spoofed, perhaps by an unscrupulous competitor, a disgruntled customer, or an ex-employee. That opens the way to a new kind of denial-of-service attack: spoof the corporate system into cutting off all the employees. This would be a computing version of the Ebola virus. Note that the machines in the Internet backbone and in peer-to-peer networks are more like biological systems in that removing any machine, or even many machines, may slow the whole system a bit, but no machine is irreplaceable. That is because these systems were designed from the beginning to be multicellular.
Finally, sacrificing a cell to save the whole organism raises the question of just what we are trying to save in an organization's IT system. While we want to save the stigmergy structure, saving it at the expense of cutting off large numbers of individual users’ machines may kill the organization itself. The IT infrastructure is like the skeleton of the organism; it makes no sense to kill all the perimeter cells just to save the skeleton. We need a better understanding of what we can sacrifice and what we are protecting by doing so. Are we seeking to save the majority of machines, save the “most important” ones first, save the CEO’s PC at all costs, or perhaps save customer-facing machines?
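The requirement that each infected computer infect fewer than one other computer is the epidemiological threshold: if each infected machine infects, on average, r others before it is cut off, the infection dies out when r < 1 and grows when r > 1. A minimal sketch, assuming a simple branching-process model and, as a hypothetical simplification, that r is proportional to how long an infected machine stays connected, shows why detection speed matters so much. All function names here are illustrative.

```python
def reproduction_number(infections_per_hour: float, hours_to_quarantine: float) -> float:
    """Hypothetical model: r grows linearly with the time an infected
    machine stays on the network before it is quarantined."""
    return infections_per_hour * hours_to_quarantine


def expected_new_infections(r: float, generations: int, initial: int = 1) -> list[float]:
    """Expected number of newly infected machines in generations 0..generations."""
    return [initial * r**g for g in range(generations + 1)]


def expected_total_infections(r: float, initial: int = 1) -> float:
    """Expected total infected over all generations (a geometric series).

    initial * (1 + r + r**2 + ...) converges to initial / (1 - r) only
    when r < 1; otherwise the epidemic has no finite expected size.
    """
    return initial / (1 - r) if r < 1 else float("inf")
```

With 2 successful infections per hour, quarantining within 15 minutes gives r = 0.5 and an expected total of just 2 infected machines; waiting a full hour gives r = 2, and each generation of infection doubles the last.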

Approaches to programmed suicide in computing

The lesson from apoptosis is that the first line of defense should be the individual computer, especially computers attached at the perimeter of the network. They should be able to recognize their own anti-social behavior and detach themselves from the network. The second line should consist of specialized computers designed to recognize infected computers that, for one reason or another, have not triggered their own suicide mechanism. Such a second-line system then tells the infected or otherwise wayward machine to ostracize itself. Modern anti-virus detection could serve in both roles. But we must also weigh how much CPU time and disk access we are willing to spend on detecting infections. As with biological systems, multicellular computing systems exist to provide a better offense, not the perfect defense.
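The two lines of defense can be sketched as a small model. This is not a real anti-virus design: the `self_check_detects_infection` flag stands in for whatever self-inspection a host performs, and the detector's `suspicious` set stands in for what a separate monitoring machine learns from, e.g., packet analysis. All class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PeripheralHost:
    name: str
    connected: bool = True
    # Hypothetical stand-in for the result of the host's own self-inspection.
    self_check_detects_infection: bool = False

    def ostracize(self) -> None:
        """Detach from the network; enforcing this reliably is the hard part."""
        self.connected = False

    def self_check(self) -> None:
        """First line: the host detaches if it sees its own anti-social behavior."""
        if self.self_check_detects_infection:
            self.ostracize()


@dataclass
class Detector:
    """Second line: a specialized machine watching hosts from outside."""
    suspicious: set[str] = field(default_factory=set)  # hypothetical detection results

    def sweep(self, hosts: list[PeripheralHost]) -> list[str]:
        """Tell every still-connected suspicious host to ostracize itself."""
        ordered = []
        for host in hosts:
            if host.connected and host.name in self.suspicious:
                host.ostracize()  # the host itself performs the cut-off
                ordered.append(host.name)
        return ordered
```

In a run with hosts `a` (clean), `b` (its self-check catches the infection), and `c` (infected, but its self-check misses it), `b` detaches in the first stage and the detector's sweep then orders only `c` off the network, since `b` is already gone.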

Ideally, the systems on the periphery could adjust their degree of defensiveness at run time in response to “threat-level” messages from a more global detection mechanism. Think of this central system as a special immune-system organ. It could be based on honeypots, on sniffing and analyzing network packets for signs of unusual activity, or on any of the many other approaches to network security. The value, however, lies in each computer on the net being able to devote more of its time to detecting possible infections when warned of an elevated threat level. That is, the central system would, in effect, warn every machine in the network of an ongoing infection, perhaps in a way analogous to the biological inflammation response or fever.
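The threat-level idea amounts to a broadcast that changes how much effort each peripheral host spends on self-inspection. A minimal sketch, assuming three hypothetical levels and an arbitrary, purely illustrative share of CPU time devoted to scanning at each level:

```python
from enum import IntEnum


class ThreatLevel(IntEnum):
    NORMAL = 0
    ELEVATED = 1
    HIGH = 2


# Illustrative values only: the fraction of CPU time a host spends scanning.
SCAN_SHARE = {ThreatLevel.NORMAL: 0.02, ThreatLevel.ELEVATED: 0.10, ThreatLevel.HIGH: 0.30}


class PeripheralHost:
    def __init__(self, name: str) -> None:
        self.name = name
        self.level = ThreatLevel.NORMAL

    def on_threat_message(self, level: ThreatLevel) -> None:
        """Handle a threat-level message from the central 'immune-system organ'."""
        self.level = level

    def scan_share(self) -> float:
        """Fraction of CPU time to spend looking for infection at the current level."""
        return SCAN_SHARE[self.level]


def broadcast(hosts: list[PeripheralHost], level: ThreatLevel) -> None:
    """The central system's 'fever': raise (or lower) every host's vigilance at once."""
    for host in hosts:
        host.on_threat_message(level)
```

Broadcasting `ThreatLevel.NORMAL` again once the threat passes returns the extra CPU time to useful work, in keeping with the point that the system exists for a better offense rather than the perfect defense.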

All the detection approaches ultimately depend on how effectively and reliably the peripheral machines can ostracize themselves. A router can “kill” a misbehaving computer by disconnecting it. However, the computer may have other connections, e.g., via WiFi, Bluetooth, or other wireless means, and the router managing the Ethernet connection may well be different from the ones managing wireless connections. The most reliable mechanism is for the computer itself to disconnect completely from all network access. But it has to do so in a way that the infecting virus cannot defeat. It may be sufficient to enforce network suicide in the low-level OS, perhaps in the drivers, so that a virus cannot circumvent the cutoff. Yet even that may not work if buffer-overflow exploits can modify a driver. Moreover, with the latest "rootkit" cloaking schemes, it has become extremely difficult to disinfect a computer, or ever to know for sure that it has been disinfected.

It may turn out that we cannot trust any software-only solution. We may need one-way hardware shutoffs: hardware through which software can shut down all networking devices in a way that cannot be reversed without active, expert human intervention. This might be comparable to the level at which CTRL/ALT/DEL is wired into the keyboard of a Wintel machine.
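The contract such a one-way shutoff would enforce can be modeled in a few lines: tripping it disables every network interface at once (unlike a router, which controls only its own link), and no software call can turn an interface back on. This is a hypothetical model, not a driver or firmware API. Note that in Python nothing stops hostile code from simply resetting the private `_tripped` flag, which is exactly why the latch would have to live in hardware.

```python
class OneWayNetworkLatch:
    """Software model of a hypothetical one-way hardware network shutoff."""

    def __init__(self, interfaces: list[str]) -> None:
        self._interfaces = frozenset(interfaces)  # e.g. Ethernet, WiFi, Bluetooth
        self._tripped = False

    def trip(self) -> None:
        """Cut off every interface at once. Deliberately, there is no untrip():
        in real hardware, only expert human intervention could reset it."""
        self._tripped = True

    def can_transmit(self, interface: str) -> bool:
        """True only for a known interface while the latch is untripped."""
        return interface in self._interfaces and not self._tripped

    def send(self, interface: str, payload: bytes) -> bool:
        """Return whether `payload` would be allowed to leave the machine."""
        if not self.can_transmit(interface):
            return False
        # A real system would hand `payload` to the interface's driver here.
        return True
```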

In the final analysis, however, we must remember that cell suicide can be at least as much of a threat as the viral infection itself if the multicellular system cannot tolerate the loss of infected computers. So, above all, we need to architect the whole system so that it can treat all computers as expendable.
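One way to make individual computers expendable, in the spirit of the Internet backbone and peer-to-peer networks mentioned earlier, is to spread each function across interchangeable replicas so that quarantining any one of them only reduces capacity. A minimal sketch; the class name, the round-robin routing, and the `min_healthy` threshold are illustrative assumptions, not a prescribed design:

```python
class ReplicatedService:
    """Hypothetical service spread over interchangeable replicas, so that any
    one of them can be quarantined without taking the service down."""

    def __init__(self, replicas: list[str], min_healthy: int = 1) -> None:
        self.healthy = list(replicas)
        self.min_healthy = min_healthy  # fewest replicas the service can run on
        self._next = 0

    def quarantine(self, replica: str) -> None:
        """Sacrifice a replica: stop routing any work to it."""
        if replica in self.healthy:
            self.healthy.remove(replica)

    def available(self) -> bool:
        return len(self.healthy) >= self.min_healthy

    def route(self) -> str:
        """Pick the next healthy replica in round-robin order."""
        if not self.available():
            raise RuntimeError("too few healthy replicas to serve requests")
        replica = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return replica
```

The design choice is that quarantine never needs to ask whether a machine is "too important" to cut off; it only needs enough healthy replicas to remain, which is the question an administrator can plan capacity around.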

Contact: sburbeck at mindspring.com
Last revised 8/6/2013
