Linux and the Power of Virtual Mega-Machines
Feb 19, 2010 5:00 AM PT
Cloud computing describes an Internet-based computing infrastructure that abstracts users and their applications from the underlying computing resources that support them. Conceptually, cloud computing differs from previous IT architectures in that users no longer need to own, have expertise in, or control the underlying technology; they are aware only of borrowing and consuming IT services, much as they would with telephone, electrical or plumbing infrastructure.
The cloud paradigm has matured alongside the virtualization and provisioning technologies that make it possible to deliver resources as a service over Internet protocols.
Wikipedia defines the cloud as "a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction." The same definition lists five essential characteristics of clouds: on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service.
Too Big for the Cloud?
The challenge for evolving cloud computing architectures is that there have always been, and always will be, a significant number of applications that can't easily run on existing compute nodes. These demanding workloads require either a large number of processing cores or a large memory (RAM) footprint, beyond the scale of typical cloud computing resources. End users, who have been increasingly abstracted from the compute nodes and trained to think of compute resources as on-demand, find this both frustrating and challenging: they have to distinguish workloads that can run in the cloud from those that still need dedicated, expensive, purpose-built hardware. They also have to maintain and manage that additional hardware, when their energy would likely be better spent running workloads and generating application results than managing IT.
Recently, Penguin Computing and SGI announced a new high-performance computing (HPC) solution targeting end users interested in deploying their applications in the cloud. Penguin Computing and SGI are only the latest of a growing number of companies that either supply the technology to deploy HPC in private clouds or offer HPC as a cloud service.
What makes this trend even more interesting is that traditional cloud provisioning services were aimed at enterprise workloads, largely parallel or small-data-set application services. These workloads have commonly been the focus of virtualization vendors such as VMware and Citrix, which give cloud managers the ability to share hardware resources among several workloads and multiple customers. Workloads requiring a large number of processing cores and/or very large memory footprints, sometimes hundreds of GB of RAM, have been left out of cloud computing deployments entirely.
So What Is Changing?
Currently, the vast majority of workloads requiring a large number of processing cores or large memory have already moved or are in the process of moving to Linux.
These applications, once tied to proprietary Unix, have been relatively easy to migrate or are increasingly being written for Linux or another open source operating system. This makes it inherently easier for these workloads to move to x86 infrastructure, giving them more flexibility in how they are deployed and letting customers take advantage of higher-performance, lower-cost commodity systems.
The Hypervisor in the Cloud
Virtualization is one infrastructure market not often considered relevant to demanding workloads. Partitioning virtualization, such as that offered by VMware or Citrix, is seen as a way to optimize individual server utilization when running workloads that need less than a full system's resources. Demanding workloads, however, are looking for ways to add processor cores and memory, not to partition them.
Technologies from companies like ScaleMP provide a different kind of virtualization: virtualization for aggregation. Using the same fundamental hypervisor technology, these vendors have found a way to combine multiple x86 boxes into very large virtual machines that scale across hundreds of processor cores and terabytes of RAM and run off-the-shelf Linux (for ScaleMP, Linux kernel 2.6.11 or later). These large virtual machines (VMs) are well suited to demanding workloads.
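To make the aggregation arithmetic concrete, here is a minimal Python sketch. The `Node` and `AggregatedVM` types, the `aggregate` function and the 16-node cluster of 8-core, 64 GB servers are hypothetical illustrations, not ScaleMP's interface. The sketch only sums resources; it does not model how an aggregation hypervisor actually presents several servers to Linux as one coherent system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One physical x86 server (hypothetical capacity figures)."""
    cores: int
    ram_gb: int


@dataclass(frozen=True)
class AggregatedVM:
    """Resource totals a single large VM would expose after aggregation."""
    nodes: int
    cores: int
    ram_gb: int


def aggregate(nodes: list[Node]) -> AggregatedVM:
    """Sum the cores and RAM of several servers into one VM-sized view.

    Assumption: this models only the resource totals. A real aggregation
    hypervisor must also present one coherent memory space across nodes,
    which is not modeled here.
    """
    if not nodes:
        raise ValueError("need at least one node")
    return AggregatedVM(
        nodes=len(nodes),
        cores=sum(n.cores for n in nodes),
        ram_gb=sum(n.ram_gb for n in nodes),
    )


if __name__ == "__main__":
    # Hypothetical cluster: 16 identical servers, 8 cores and 64 GB each.
    cluster = [Node(cores=8, ram_gb=64) for _ in range(16)]
    print(aggregate(cluster))  # AggregatedVM(nodes=16, cores=128, ram_gb=1024)
```

In this toy example, sixteen modest servers add up to one 128-core VM with 1 TB of RAM, the scale the article describes as out of reach for a single commodity node.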
However, this is only half the story. Once vendors can build large VMs out of smaller x86 servers, it is a small step to provisioning those VMs on demand from a cloud infrastructure. On-demand provisioning of large VMs will lead to a revolution in architectural design. Workloads that need only part of a system and workloads that need the resources of several systems can coexist in the same infrastructure, and IT administrators can reshape their compute resources dynamically to accommodate the workload at hand, rather than restricting the infrastructure to workloads that fit within a single node. In most cases, this broader workload addressability should let the cloud serve close to 100 percent of enterprise and compute workloads, raising cloud utilization and the return on investment (ROI) of cloud infrastructure.
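The provisioning decision described above can be sketched as a simple placement rule: given a resource request and a pool of identical nodes, either serve it from a single node (partitioning) or span several nodes (aggregation). This is a hypothetical illustration that assumes homogeneous nodes and ignores hypervisor overhead; the `Request` type and `plan` function are invented for the example and are not any vendor's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Resources a workload asks the cloud for (hypothetical units)."""
    cores: int
    ram_gb: int


def plan(req: Request, node_cores: int, node_ram_gb: int) -> tuple[str, int]:
    """Decide whether a request fits on one node or needs several.

    Assumptions: every node has the same capacity, and hypervisor overhead
    is ignored. Returns ("partition", 1) when the request fits on a single
    node (possibly sharing it with other workloads), otherwise
    ("aggregate", n) where n is the number of nodes to combine.
    """
    if req.cores <= 0 or req.ram_gb <= 0:
        raise ValueError("request must ask for positive cores and RAM")
    # The scarcer resource determines how many nodes are required.
    needed = max(math.ceil(req.cores / node_cores),
                 math.ceil(req.ram_gb / node_ram_gb))
    if needed == 1:
        return ("partition", 1)
    return ("aggregate", needed)


if __name__ == "__main__":
    # Hypothetical pool of 8-core, 64 GB nodes.
    print(plan(Request(cores=2, ram_gb=8), 8, 64))     # ('partition', 1)
    print(plan(Request(cores=16, ram_gb=512), 8, 64))  # ('aggregate', 8)
```

Note that the second request needs only two nodes' worth of cores but eight nodes' worth of memory; memory-bound workloads like this are exactly the case where aggregation helps most.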
This VM-on-demand-for-any-workload paradigm is not a futuristic vision. HPC-as-a-service enabled by hypervisor aggregation technology, for example, exists today and is available from companies like R Systems in El Dorado Hills, Calif.
The common threads in this discussion are Linux and virtualization. Virtualization lets Linux use exactly the hardware resources a specific workload requires, whether that workload needs a fraction of one system (partitioning) or a combination of multiple systems (aggregation). This flexibility allows cloud infrastructures to be more power- and resource-efficient, and it gives users virtual resources that fit their workloads. Furthermore, future cloud deployments may combine both capabilities, allowing small VMs to be aggregated into large VMs and providing an even greater level of flexibility, a capability that deserves an article of its own.
Shai Fultheim is the founder and CTO of ScaleMP.