Multicellular Computing: Messaging in Multicellular Computing

Multicellular biological organisms treat DNA transfer between cells as taboo. Multicellular computing needs to treat exchanging code similarly.

Computers send messages as strings of bytes. The bytes may represent either executable code or data, and the dividing line between the two has become more and more difficult to draw. Explicitly executable files (e.g., .exe files on Wintel hardware/OS) are executed directly by the native platform. Alternatively, executable binary code may be hidden in seemingly innocuous formats such as .jpg images and SMS messages (see below). Various interpreters on the machine can also make text executable. That text may represent code for Java, JavaScript, ActiveX, PHP, Perl, Python, and many other languages and component technologies. For example, we continue to learn how dangerous JavaScript in web pages can be.
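To make the blurred line between code and data concrete, here is a minimal sketch in Python (chosen only for illustration; the function names are hypothetical). The same string is refused by a parser that accepts only data literals, yet it is run by an interpreter that treats it as code.

```python
import ast


def as_data(text: str):
    """Treat text strictly as a data literal (numbers, strings, lists, dicts...)."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None  # not a plain literal, so refuse it


def as_code(text: str):
    """Treat the very same text as executable code. Never do this with untrusted input."""
    return eval(text)


message = "sum([1, 2, 3])"
print(as_data("[1, 2, 3]"))  # [1, 2, 3]
print(as_data(message))      # None: a function call is not data
print(as_code(message))      # 6: the interpreter runs it
```

Nothing in the bytes themselves says which reading is correct. The receiver's choice of interpreter decides, which is why a taboo on code transfer ultimately has to be enforced by the receiving machine.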

The need for a Taboo against transferring code

Unlike messaging in the biological world, multicellular computing has yet to evolve a strong taboo against code transfer. Some especially sensitive application areas, such as government, military, and financial IT, do their best to prohibit code transfer. But many programmers find it difficult to imagine a world in which altering the code in a computer is prohibited. They hold this view even though the cost of preventing viral infections and cleaning infected systems often far exceeds the cost of the original computer. And the social, economic, political, and even military costs of cyber attacks, and of cyber blackmail by those who control botnets (huge networks of infected PC "zombies"), may be enormous.

Despite the scourge of viruses and worms in the Windows monoculture, architects of smart cell phones seem not yet to have absorbed that lesson. Predictably, viruses have already appeared that exploit mobile code between smart cell phones, including some that spread over Bluetooth connections. Clearly, loading code, if not prohibited entirely, should be a special, carefully managed process.

Enforcing such a taboo is complicated by the fact that code cannot always be separated from data, especially when end users can still be so easily tricked into executing an email attachment. Code may sneak in disguised as a supposedly innocuous file such as a .jpg image and get executed through a buffer-overrun exploit [1]. On smart phones, code may sneak in through an SMS messaging exploit [2]. Fortunately, the notion of a taboo against transferring code is growing, and the computing industry will incrementally improve its ability to enforce it.
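The out-of-bounds access behind the SMS exploit described in note [2] can be sketched in a few lines. This is a toy model, not the actual iPhone code: it assumes a flat byte array standing in for process memory, and every name in it is hypothetical. It shows how a single attacker-chosen byte, decoded as -1, steers an unchecked read to memory just before the intended buffer.

```python
# Toy model of the bug class in note [2]: an index taken from message data
# and used without a bounds check. All names here are hypothetical.

# One flat bytearray stands in for raw process memory.
memory = bytearray(16)
ARRAY_BASE = 8    # the message buffer occupies memory[8:16]
ARRAY_LEN = 8
memory[7] = 0x41  # a byte belonging to something else, just before the buffer


def index_from_message(data: bytes) -> int:
    """Decode the first byte of attacker-controlled data as a signed index."""
    # 0xFF decodes to -1 when the byte is treated as signed.
    return int.from_bytes(data[:1], "big", signed=True)


def read_unchecked(index: int) -> int:
    """C-style access: base + index, with no bounds check."""
    return memory[ARRAY_BASE + index]


def read_checked(index: int) -> int:
    """The fix: reject any index outside the buffer before touching memory."""
    if not 0 <= index < ARRAY_LEN:
        raise IndexError(f"index {index} is outside a buffer of length {ARRAY_LEN}")
    return memory[ARRAY_BASE + index]


idx = index_from_message(b"\xff")
print(idx)                  # -1
print(read_unchecked(idx))  # 65: the byte *before* the buffer leaks out
```

A real exploit chains many such accesses, and may write as well as read, to set up control of the device. The single comparison in `read_checked` is what closes this particular door.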

Increasing specialization in computing also works against code transfer. Compatibility within the Windows monoculture, with its common APIs, permits meaningful, if dangerous, code transfers. Outside of a monoculture, it simply makes no sense to base multicellular computing messages on transmitted code. Specialized computers in communities of collaborating machines function in very different ways and in different contexts. Code for a router is meaningless in a PDA or a parallel computation engine. It is neither practical nor useful for a machine requesting a service to send code that specifies how that service is to be carried out; only the receiving computer can know how best to provide the service. Moreover, computers have “owners” who set the agenda of the machine. An owner is generally unwilling to have the computer hijacked by whoever happens to sneak rogue code into it.

Communication by polymorphic non-executable messages

As code transfer loses popularity, a different and rapidly growing trend, Service Oriented Architectures (SOA) and Web Services, is gaining ground. It mimics living systems’ use of polymorphic message sending: the receiving computer, not the sender, determines the meaning of every SOA and Web Services message. So the multicellular computing world seems already to be evolving the same basic architecture that evolved in biology. Future orchestrated collaborations between computers on the Internet will likely be based on one variety or another of SOA. Heavyweight SOA, based on SOAP, WSDL, and a host of other standards, is gaining favor in corporate IT systems. Lighter-weight SOA mashups, e.g., those based on AJAX, are growing rapidly on the Internet at large. In either case, useful Web Services are emerging on the net, and multicellular combinations of Web Services are becoming much more common.
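A minimal sketch of polymorphic, non-executable messaging: the message carries only data, and each receiver decides what it means. The class names, the verb "publish", and the payload key are hypothetical stand-ins for real SOA services, not any particular Web Services API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """A non-executable message: just a verb and some data, no code."""
    verb: str
    payload: dict = field(default_factory=dict)


class PrintService:
    def receive(self, msg: Message) -> str:
        # This receiver decides that "publish" means putting ink on paper.
        if msg.verb == "publish":
            return f"printed {msg.payload['doc']}"
        return f"unsupported verb: {msg.verb}"


class ArchiveService:
    def receive(self, msg: Message) -> str:
        # A different receiver gives the same verb a different meaning.
        if msg.verb == "publish":
            return f"archived {msg.payload['doc']} as PDF"
        return f"unsupported verb: {msg.verb}"


msg = Message("publish", {"doc": "q3-report"})
for service in (PrintService(), ArchiveService()):
    print(service.receive(msg))
# printed q3-report
# archived q3-report as PDF
```

Neither service runs code supplied by the sender, so the sender cannot hijack the receiver. The price is that both sides must already agree on what "publish" and "doc" mean, which is the semantics problem taken up in the next section.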

Whichever new protocols emerge, we must coexist for many years with legacy systems that use proprietary communication protocols, old EDI protocols, HTML-based interfaces, and many less common formats. If biology is any guide, many of these will never fully disappear. They will become frozen historical accidents to puzzle future students of the history of computer science.

Semantics is key -- how will shared meaning evolve?

The form of future collaborations between computers seems to be settling, i.e., moving toward SOA. But the substance is not. Polymorphic messages encoded in XML are syntactically but not semantically self-describing [3]. If polymorphic messages are to be the basis of communication, there has to be some agreement on what the messages mean. SOA messaging semantics will be sorted out by the efforts of various standards bodies, by acceptance of conventions, and by much evolutionary trial and error. Yet none of these is completely satisfactory: standards are slow, conventions conflict, and evolution is messy. If biology is any guide, evolution will dominate.
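The thought experiment in note [3] is easy to run. The sketch below swaps each tag name for a truncated SHA-256 hash, used here as a simple stand-in for encryption (an assumption; a keyed cipher would additionally make the renaming reversible). The nesting and the text values survive intact; only the human reader's impression of meaning disappears.

```python
import hashlib
import xml.etree.ElementTree as ET


def obscure_tags(xml_text: str) -> str:
    """Replace every tag name with an opaque token, keeping structure and text."""
    root = ET.fromstring(xml_text)
    for elem in list(root.iter()):
        digest = hashlib.sha256(elem.tag.encode("utf-8")).hexdigest()[:8]
        # XML names may not begin with a digit, so prefix a letter.
        elem.tag = "t" + digest
    return ET.tostring(root, encoding="unicode")


original = "<order><item>widget</item><quantity>3</quantity></order>"
print(obscure_tags(original))
```

A generic parser walks the obscured document just as easily as the original. Only code written in advance to look for a tag named `quantity` notices the difference, and that code embodies an agreement made outside the XML.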

Evolution of messaging semantics in multicellular organisms operates upon all aspects of the message process at once. An individual organism begins with a fertilized egg that divides. Its daughter cells all share the same DNA. These daughter cells differentiate, staying physically co-located. Hence, the organism’s DNA codes for all aspects of the messaging behavior: the behavior of the “sending” cell, i.e., how and when it synthesizes and exports a given messenger molecule, the shape and lifetime of the messenger [4], and the behavior of the “receiving” cell(s), i.e., which ones have binding sites for a given molecule and which biochemical pathways are triggered by receipt of the molecule. If the semantics of the resulting message transfer are not beneficial or at least neutral to the health of the organism, the whole organism is at higher risk of death before it can pass the DNA on to the next generation. Thus survival of the fittest, which operates at the whole organism level, simultaneously punishes poor “syntax” and muddled “semantics” by culling mistakes at either end of the communication.

A single corporate network infrastructure may play an evolutionary role similar to that of a single multicellular organism. That is, it is a unitary, logically contiguous, dedicated collection of computers that supports the whole organization, for good or ill. Its routers, firewalls, VPN links, LDAP servers, DMZs, and specialized application servers must share an agreed upon and internally compatible architecture and implementation. And the semantics of pairwise collaborations must be sensible. The corporate IT staff work to ensure this. If the system is not up to the job, disaster may well ensue for the whole organization. A corporation with a seriously flawed infrastructure may well go out of business, thus failing to pass on its ineffective infrastructure architecture and semantics. Bank mergers are a classic case. A bank’s competitiveness often depends in large part upon its IT infrastructure, which is a carefully (or not so carefully) crafted multicellular computing organism. The weaker bank, when acquired by a stronger one, typically remakes its IT infrastructure to be compatible with the winning bank’s architecture. This sort of system evolves its messaging architecture in a manner similar to that of a multicellular organism, by a remorseless process of survival of the fittest.

The evolution of message semantics in the open Internet is more complicated. A person’s computer may play different roles in many different sets of multi-computer collaborations, some for private use and some in the owner’s role as an employee, customer, or supplier in business relationships. This is similar to the more fluid and ambiguous semantics in ecologies of organisms, where a single organism plays many different roles in different ecological collaborations. Predators recognize prey, and vice versa, by all sorts of chemical signals. So a chemical signal means one thing between individuals of the same species and another to their predators or prey. For example, the musky smells of animals travel on the wind, so predators attack from downwind so as not to warn the prey. The “come hither” scent of a desert mouse is intended to attract a mate. That it also attracts a hungry snake is not simply an unfortunate accident; the snake coevolves with the mouse. In a similar manner, especially valuable services (Google, eBay, Amazon) support an API with defined semantics that attracts third-party services. Alternatively, services with plentiful and demographically valuable users attract competing services that offer the same API with the same semantics to lure those users away. Successful poachers of either sort can then add to the API and the semantics in an attempt to freeze out competitors (this is the infamous Microsoft "Embrace, Extend and Extinguish" strategy). Such co-evolution can result in richer and more useful semantics.

Some efforts, such as UDDI, have attempted to provide a semantic “yellow pages” service as a rendezvous point for finding services with the “right” semantics, and to organize such semantic information into useful taxonomies. So far, these efforts have been premature, over-engineered, and overly complex. It has been like trying to start a telephone Yellow Pages before there are enough phones to matter. So the semantics problem in the Internet remains difficult.

Alan Kay [5] proposes that the semantics travel with the message by using objects, rather than just data, as the message carrier. An object carries both data and the code that provides some of the meaning. Security issues would be handled by assuming that the machines are specialized as object engines that provide separate address spaces for all objects, so that one cannot interfere with another. However, object collaboration still requires some shared understanding of the semantics of the polymorphic messages. Reflective systems, those in which metadata is available with which objects can “reason” about the message interfaces of other objects, might be agreed upon, and such reasoning might support the bootstrapping of enough semantics to dramatically speed the evolution of useful ways of collaborating. Or SOA brokers could match up “compatible” parties. This object approach might offer real advantages; however, it is a radical departure from the way our current machines work and would require substantial evangelism and investment to succeed. Nonetheless, one day perhaps some classes of computer applications might work very well in such general object engines.
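The broker idea can be sketched with run-time reflection. This minimal version assumes Python's built-in `inspect` module as the reflection facility; the classes and the `broker` function are hypothetical. The broker matches parties purely by the method names their interfaces expose.

```python
import inspect


class Thermometer:
    def read_celsius(self) -> float:
        return 21.5


class Clock:
    def now_iso(self) -> str:
        return "2009-08-11T12:00:00"


def interface_of(obj) -> set:
    """Use reflection to list an object's public method names."""
    return {
        name
        for name, _ in inspect.getmembers(obj, inspect.ismethod)
        if not name.startswith("_")
    }


def broker(required: set, candidates: list) -> list:
    """Return candidates whose interface includes every required method name."""
    return [c for c in candidates if required <= interface_of(c)]


matches = broker({"read_celsius"}, [Thermometer(), Clock()])
print([type(m).__name__ for m in matches])  # ['Thermometer']
```

Matching names is the easy part: a method called `read_celsius` could still return Fahrenheit. Reflection lets the broker find syntactically compatible parties, but shared semantics still has to come from somewhere else.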

[1] Note that new hardware capabilities from Intel and AMD allow memory to be marked “no execute,” which will eventually make buffer overruns harder to exploit for running injected code.

[2] In July 2009, a security researcher discovered a way to use SMS messages to take complete control of an iPhone without any interaction with the phone's owner (see the interview with Charlie Miller, who discovered the iPhone exploit). He also found exploits for Android and Windows Mobile smart phones. He explains the basics of the iPhone exploit thus: "You can send a long [SMS] message in a series of messages and the phone will reconstruct it into a one long string. It accesses an array based on a value from the data. In the case where it thinks it reads -1, it actually accesses the memory before the array, not in the array. By setting things up just right and being tricky, you can actually leverage this to gain complete control of the device. The entire attack takes just over 500 messages, although the victim doesn't know they are being sent because they don't show up on the phone. Most of these messages have to do with setting things up 'just right.' Sixteen of them actually access the array out of bounds."

[3] XML does not encode semantics. Only the syntax is self-describing. When people read XML, they subconsciously perceive the tags as carrying semantic information because the tags are usually human-readable words. The receiving computers cannot derive any meaning from those words. Semantics remains in the minds of the reader and writer of the text, not in the XML itself. Imagine, as a thought experiment, reading XML in which all the tags have been encrypted. The encryption removes none of the information in the XML but does remove the illusion of semantics.

[4] Intracellular mechanisms degrade almost all proteins, including messengers, some quite rapidly. A messenger's half-life determines its range and the duration of its effect; that is, messenger half-life is an explicit aspect of message management that evolves along with the functioning of the sending and receiving cells and of the organism as a whole.

[5] This paragraph is based on a personal communication with Alan Kay. Multicellular computing was the context for that discussion. He has long championed the view that “dumb” data should be avoided. In principle, I agree with him. The difficulty is in how to prevent the embedded "smarts" from providing an avenue of entry to viruses and other malware.


Contact: sburbeck at mindspring.com
Last revised 8/11/2009