Grid vs. SMP: The Empire Tries Again

Two weeks ago I looked at IBM’s forthcoming cell processor architecture [Paul Murphy, “Fast, Faster and IBM’s PlayStation 3 Processor,” LinuxInsider, June 17, 2004] and last week speculated about the impact it might have on the x86 desktop [Paul Murphy, “Linux on Intel: Think Dead Man Walking,” LinuxInsider, June 24, 2004]. This week, I want to go beyond that and look at the impact the cell architecture will have on the battle for server dominance over the next five years. IBM isn’t the only company coming out with a new CPU technology. Sun’s throughput computing is as revolutionary and as little understood, despite being closer to realization.

Look at both products from a distance and what you see is two companies using broadly similar technologies to implement radically different ideas about how server computing should be done. Both rely on Unix to make things work, and both are building multi-CPU assemblies on single pieces of silicon. But where Sun is pursuing the Unix ideal of large resources equally available to all via networked symmetric multiprocessing (SMP), IBM is doing the opposite by using a grid on a chip to provide easier and more secure partitioning, process isolation, resource management and user-activity tracking.

Both companies are selling server-to-desktop strategies to datacenter management. But IBM’s technology strategy fits perfectly with customer beliefs about how computing should be managed, while Sun’s salespeople have to fudge and shuffle because the right way to run Solaris is pretty much the opposite of what the traditional datacenter manager knows and understands.

Vision, Technology and Marketing

These differences in vision, technology, and marketing have deep historical roots. In 1964 and 1965, when MIT was developing its vision of future computing, two broad camps emerged. One group sought ways to evolve the academic traditions of openness, peer review and community into the promised digital era by treating the computer as a communications device extending the individual user’s reach across both time and space.

In contrast, the other group saw the computer mainly as a machine for replacing clerks, offering the ability to get increasingly complex work done quickly and accurately.

In the end, the academic side won the funding battle and the Multiplexed Information and Computing Service (Multics) development project was born with an open-source agenda as described by designers Corbató and Vyssotsky in 1965 when they wrote the following:

It is expected that the Multics system will be published when it is operating substantially…. Such publication is desirable for two reasons: First, the system should withstand public scrutiny and criticism volunteered by interested readers; second, in an age of increasing complexity, it is an obligation to present and future system designers to make the inner operating system as lucid as possible so as to reveal the basic system issues.

Unfortunately, the academics who had won the funding battle lost the war as the various teams contracted to deliver on their ideas veered toward a more traditional, production-oriented understanding of what systems do and how they work. As a result, Multics eventually became a nice interactive operating system that served several generations of Honeywell (and later Bull) users very well, but didn’t meet the original agenda as a means of extending the power of the individual through communications.

In 1979, Dennis Ritchie found a positive way, in The Evolution of the Unix Time-Sharing System, to express the fundamental differences between the Uniplexed Information and Computing Service (Unics) designers and Multics management as a statement of the Unix design motivation:

What we wanted to preserve was not just a good environment in which to do programming, but a system around which a fellowship could form. We knew from experience that the essence of communal computing, as supplied by remote-access, time-shared machines, is not just to type programs into a terminal instead of a keypunch, but to encourage close communication.

As a result, Unix became a bottom-up movement, a sort of guerrilla OS reluctantly sanctioned by management because users found it too useful to let go. From Bell Labs, it escaped into academia and ultimately became what we have now: three major research directions expressed as Solaris, Linux and BSD, all carrying forward the original commitment to openness and the use of the technology to weld together communities of users by improving communications across both time and distance.

On the Workstation Front

Thus Sun’s technology strategy reflects key Unix ideas and relies on its long lead in Solaris symmetric multiprocessing to deliver on them without breaking support for existing Sparc applications. That’s why throughput computing provides for parallel lightweight processes in a “system on a chip” SMP environment with tightly integrated onboard memory and input-output (I/O) controllers, while Solaris 10 already sports improved lock management and support for a brand new I/O infrastructure built around uniform access to very large file systems.

More speculatively, Sun is rumored to be working on a pair of software products that, if successful, could make a low-megahertz, throughput-oriented CPU like the Niagara a power on the workstation-computing front. The first of these seems to be a preprocessor whose operation is akin to having a very good engineer insert the OpenMP pragmas, allowing applications to take maximum advantage of available CPU parallelism. The second is thought to use markup inserted by the first to do the opposite: allow the executables produced to run effectively on more traditional Sparc gear.

At the sales level, furthermore, Sun has a desktop-to-server story no other company can match. The Java Enterprise System comprises everything needed from datacenter to desktop, with services deliverable on the secure, low-cost Sun Ray or, for the more traditionally minded, on new or recycled x86 desktops running either Linux or Solaris.

Grid Computing and the Virtual Machine

Go back to the mid-1960s to review IBM’s responses to the MIT bid opportunity and what you see is very different. IBM management backed the more traditional approach, but the people who lost went on to invent the VM operating system and heavily influence the “future systems” design later released as the technically phenomenal System/38, now the iSeries.

VM was as much of a technical success as Unix and shared its heritage as a technology relying on user support against management deprecation for its existence, but was completely opposed to it in philosophy. Where Unix united users, VM separated them; where Unix freed users from boundaries, VM imposed resource limits and management controls; where Unix migrated to SMP and greater resource availability, VM evolved to work with finer-grained hardware partitioning, systems virtualization and ever tighter controls over resource use.

Within the IBM operating systems community, however, VM offered major advantages to users simply because the CMS shell did offer interactive service and it didn’t take long for users to subvert part of the design by forcing ways to share disks and other communications channels with each other. As a result, it quickly became one of IBM’s most important offerings and its later evolution both influenced, and was influenced by, other development work within IBM.

Grid computing is much less resource efficient than SMP computing, but it is cheaper to build, ridiculously easy to partition, and highly scalable in terms of the number of tasks supported. That makes it the natural hardware expression for key VM ideas on process separation, resource management and usage tracking. The cell technology lets IBM move this from theory to practice by implementing VM control ideas across large-scale grids.

Such machines have not, in the past, achieved very good processor utilization: even the best of today’s grid-style supercomputers typically sustain less than 30 percent of their theoretical Linpack capacity when subjected to real workloads. IBM’s machine, however, is expected to produce much higher ratios for two reasons.

The most obvious of these is that the built-in high-speed communication combines with the grid-on-a-chip approach to allow very dense processor packaging in which propagation delay is minimized. Less obviously, however, performance losses usually increase nonlinearly with the number of CPUs in a grid, and partitioning a heavily loaded grid therefore improves performance by decreasing the number of “engines” allocated to each process.

That’s counterintuitive and the exact opposite of what happens in an SMP environment where partitioning wastes resources, but will have the effect of allowing IBM to scale servers to benchmarks and consequently turn in some awesome numbers while ratifying its community’s most deeply held perceptions about the one right way to manage servers.
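The partitioning argument is easier to see with numbers. What follows is only a toy sketch in Python, not a model of any real IBM hardware: the function names, the overhead formula, and the constants `alpha` and `beta` are all assumptions chosen for illustration. It assumes the grid is heavily loaded, so every partition has work to do, and that communication losses within a partition grow faster than linearly with the number of engines in it.

```python
def efficiency(engines: int, alpha: float = 0.01, beta: float = 1.5) -> float:
    """Fraction of peak capacity delivered by one job spread over `engines` engines.

    Hypothetical assumption: overhead grows as alpha * (engines - 1) ** beta,
    so losses rise faster than linearly whenever beta > 1.
    """
    if engines < 1:
        raise ValueError("engines must be at least 1")
    return 1.0 / (1.0 + alpha * (engines - 1) ** beta)


def grid_throughput(total_engines: int, partitions: int) -> float:
    """Useful work, in engine-equivalents, when the grid is split into equal partitions."""
    if partitions < 1 or total_engines % partitions != 0:
        raise ValueError("partitions must divide total_engines evenly")
    per_partition = total_engines // partitions
    # Each partition runs its own job and pays only its own internal overhead.
    return partitions * per_partition * efficiency(per_partition)


if __name__ == "__main__":
    # Compare one 64-engine job against progressively finer partitioning.
    for parts in (1, 2, 4, 8, 16):
        work = grid_throughput(64, parts)
        print(f"{parts:2d} partitions of {64 // parts:2d} engines: "
              f"{work:5.1f} of 64 ({work / 64:.0%})")
```

Under these assumptions, total useful work rises as the same 64 engines are cut into more, smaller partitions, which is the effect described above. The sketch says nothing about SMP, where splitting a shared pool strands capacity rather than recovering it.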

IBM’s Achilles’ Heel?

The Achilles’ heel in IBM’s strategy might turn out to be power use. This won’t help Wintel, which suffers from the same problem, but it may offer Sun a significant advantage because SMP on a chip naturally reduces power consumption by eliminating redundancies and reducing transmission distances.

Actual system-power usage is usually largely determined by memory and therefore workload but, all other things being equal, Sun’s processor advantage coupled with the use of higher-density, lower-power memory — such as that produced by Micron’s new 6F process — could mean that a typical 16-way Rock-based system running a mixed workload in 64 GB of memory will draw about the same power as a four-way IBM Cell system with 16 GB of Toshiba’s custom DDR product.

More speculatively, if work on the preprocessors succeeds, Sun could opt for a memory-embedded floating-point array processor along the lines pioneered by Micron’s “Yukon” technology without giving up either backwards software compatibility with previous Sparc products or its advantages in SMP throughput. That, of course, would give it the ability to match IBM on floating-point performance without incurring the power costs that go with high megahertz rates.

Right now, IBM seems to be several years behind Sun on both hardware and software, but grids are simpler than SMP. Cellular computing therefore offers IBM both a performance advantage and a way to catch up. As a result, the coming battle for datacenter supremacy seems to be shaping up as a clear competition between the hardware expressions of key ideas on both sides.

In the IBM corner, traditional mainframe ideas about control and management animate grid-on-a-chip computing with ultra-secure partitioning and per-process resource allocation backed by unrivalled floating-point performance, easy scalability and low cost. In the Unix corner, the movement toward information and community integration will continue as the network disappears into the computer, with both SMP and I/O technology moving into hardware to provide low-cost, high-bandwidth software services to entire communities of users.

That’s two unopposable forces heading for collision — and grinding up Wintel between them.

See “Fast, Faster and IBM’s PlayStation 3 Processor” and “Linux on Intel: Think Dead Man Walking” for additional coverage on this topic by Paul Murphy…

Paul Murphy, a LinuxInsider columnist, wrote and published The Unix Guide to Defenestration. Murphy is a 20-year veteran of the IT consulting industry, specializing in Unix and Unix-related management issues.
