The world of data storage can be as confusing as Alice in Wonderland’s Tulgey Wood to organizations seeking the best and most cost-efficient way to gather and save information. Acronyms abound: SAN (Storage Area Network), NAS (Network Attached Storage), DAS (Direct Attached Storage), SAM (Storage Area Management) and HSM (Hierarchical Storage Management), to name a few. Trying to sift through them and arrive at the right choice can be enough to make an IT manager yearn for the days of file-cabinet forests.
The typical home user employs hard drives, both internal and external — the simplest type of DAS. However, hard drives have significant drawbacks and thus are not suitable for enterprises as a stand-alone solution to the data storage problem. Although they can be configured to back up files, they also can crash or fail, making backups unavailable. And on its own, a hard drive often cannot provide enough room to handle large files that users must access from a single volume.
RAID, on the other hand, combines multiple hard drives into a single, more dependable storage unit suited to enterprise use. About 15 years ago, a team of computer scientists at the University of California, Berkeley, developed this technology, otherwise known as Redundant Array of Inexpensive (later Independent) Disks. Since then, RAID has become a primary component of storage systems, both networked and stand-alone.
How does this technology work, and which companies produce RAID systems?
“RAID is a way to increase data availability by striping files across multiple disk drives and adding redundant bits to allow the data to be recovered in case one [or more] disk drives fails during operation,” Forrester Research principal analyst Bob Zimmerman told the E-Commerce Times. “DAS, NAS and SAN all use RAID to improve data availability.”
RAID was developed to increase system dependability, agreed David Hill, vice president for storage management at Aberdeen Group.
“In the early ’90s, the cost of hard disk drives and open systems [was] high, and SCSI disks didn’t have a lot of high reliability,” Hill told the E-Commerce Times. “Ten or 20 percent were likely to have a failure, which meant you then had to restore your data from tape, which was slow going.”
RAID comes in six levels, RAID 0 to RAID 5. The most commonly used configurations are RAID 1, RAID 3 and RAID 5.
RAID 1 uses mirroring, in which identical copies of the data are written to at least two sets of drives. “If you have five disks [in your system], you buy five more. If one of the five go out, the others keep operating,” Hill explained. “You have all your data, so you can then rebuild so that both [sets] are synchronous.”
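To make the mirroring idea concrete, here is a minimal Python sketch. It assumes each "drive" is an in-memory dictionary mapping block numbers to bytes; the MirrorSet class and its methods are hypothetical illustrations, not any vendor's interface. Writes go to every working mirror, reads come from any survivor, and a rebuild copies a healthy mirror onto the replacement, which is the resynchronization Hill describes.

```python
# Toy model of RAID 1 mirroring. Each "drive" is a dict mapping
# block numbers to bytes; MirrorSet is a hypothetical illustration.

class MirrorSet:
    def __init__(self, copies=2):
        self.drives = [{} for _ in range(copies)]
        self.failed = set()  # indexes of drives currently out of service

    def _healthy(self):
        return [d for i, d in enumerate(self.drives) if i not in self.failed]

    def write(self, block, data):
        # The same data goes to every working mirror.
        for drive in self._healthy():
            drive[block] = data

    def read(self, block):
        # Any working mirror can serve the read.
        healthy = self._healthy()
        if not healthy:
            raise OSError("all mirrors have failed")
        return healthy[0][block]

    def fail(self, index):
        # Simulate a drive failure: its contents are gone.
        self.failed.add(index)
        self.drives[index] = {}

    def rebuild(self, index):
        # Copy a surviving mirror onto the replacement drive,
        # bringing both sides back into sync.
        healthy = self._healthy()
        if not healthy:
            raise OSError("no surviving mirror to rebuild from")
        self.drives[index] = dict(healthy[0])
        self.failed.discard(index)


if __name__ == "__main__":
    mirror = MirrorSet()
    mirror.write(0, b"payroll")
    mirror.fail(0)                 # lose one side of the mirror
    print(mirror.read(0))          # b'payroll': served by the other side
    mirror.rebuild(0)
    print(mirror.drives[0] == mirror.drives[1])   # True
```

In terms of Hill's example, each entry in `drives` stands in for one whole set of five disks; the point is only that reads keep working after one side fails, and that a rebuild restores the second copy.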
However, because RAID 1 is 100 percent redundant, it requires twice as many drives as the data itself would occupy, making it an expensive solution, Forrester’s Zimmerman said.
Going for Parity
In contrast, RAID 3 stripes data across a group of disks and employs an extra, dedicated drive to store the error correction, or “parity,” data. Should one of the data drives fail, the parity drive and the surviving drives are used together to recover its contents. However, during the time it takes to replace the failed drive and reload information onto its replacement, users face impaired performance.
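Parity in RAID 3 and RAID 5 is typically computed with XOR: the parity strip is the byte-wise XOR of the data strips, so XOR-ing the surviving strips with the parity reproduces whichever strip was lost. The Python sketch below assumes byte-level striping across a hypothetical set of data disks plus one dedicated parity disk, as in RAID 3; the function names are illustrative only.

```python
from functools import reduce

# Sketch of RAID 3-style parity: data is striped byte by byte across
# several data disks, and a dedicated parity disk stores the XOR of them.
# Function names are hypothetical.

def xor_strips(strips):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def stripe_with_parity(data, n_data_disks):
    """Return (data strips, parity strip) for the given number of data disks."""
    # Pad with zero bytes so every disk receives the same number of bytes.
    data += b"\x00" * (-len(data) % n_data_disks)
    strips = [data[i::n_data_disks] for i in range(n_data_disks)]
    return strips, xor_strips(strips)

def recover_strip(strips, parity, lost):
    """Rebuild the failed disk's strip from the surviving strips plus parity."""
    survivors = [s for i, s in enumerate(strips) if i != lost]
    return xor_strips(survivors + [parity])


if __name__ == "__main__":
    strips, parity = stripe_with_parity(b"quarterly sales figures", 4)
    rebuilt = recover_strip(strips, parity, lost=2)   # pretend disk 2 died
    print(rebuilt == strips[2])                       # True
```

The rebuild reads every surviving drive, which is one reason performance suffers until the replacement is reloaded.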
The lowest-cost option, RAID 5, is the most common approach to RAID, Meta Group program director Rob Schafer told the E-Commerce Times. Rather than dedicating one drive to parity, as RAID 3 does, RAID 5 stripes both user data and parity data across all of the drives.
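To picture how RAID 5 spreads parity around, the sketch below lists, for each stripe, which disk holds parity ("P") and which hold chunks of user data ("D0", "D1" and so on). It assumes a simple rotation in which parity moves one disk to the left on each stripe; real controllers use several different rotation schemes, so treat this as an illustration rather than any product's layout.

```python
# Sketch of a RAID 5 layout: parity ("P") rotates from disk to disk,
# one stripe at a time, instead of living on a single dedicated drive.
# The rotation used here is one simple scheme chosen for illustration.

def raid5_layout(n_disks, n_stripes):
    layout = []
    chunk = 0
    for stripe in range(n_stripes):
        # Parity starts on the last disk and moves one disk left per stripe.
        parity_disk = (n_disks - 1 - stripe) % n_disks
        row = []
        for disk in range(n_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{chunk}")   # next chunk of user data
                chunk += 1
        layout.append(row)
    return layout


if __name__ == "__main__":
    for stripe, row in enumerate(raid5_layout(n_disks=4, n_stripes=4)):
        print(f"stripe {stripe}: " + " ".join(f"{cell:>3}" for cell in row))
```

With four disks, parity lands on disk 3, then 2, then 1, then 0, so every drive carries an equal share of the parity work instead of one parity drive handling it all.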
Although a RAID 5 setup provides good data protection and is less expensive than a RAID 1 setup, RAID 1 provides faster data throughput, Schafer said. Moreover, RAID 1 provides the best reliability, making it a necessity for storing mission-critical data, despite its cost.
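The cost gap between RAID 1 and RAID 5 comes down to how much raw capacity each level gives up to redundancy. Here is a simplified Python calculation, assuming identical drives, two-way mirroring for RAID 1, and no hot spares, metadata or formatting overhead; the function name and the 146 GB drive size are illustrative.

```python
# Simplified usable-capacity comparison for identical drives.
# Assumes two-way mirroring for RAID 1 and ignores hot spares,
# metadata and formatting overhead.

def usable_capacity(level, n_drives, drive_gb):
    if level == 1:
        if n_drives < 2 or n_drives % 2:
            raise ValueError("RAID 1 here needs an even number of drives")
        return (n_drives // 2) * drive_gb        # half the drives hold copies
    if level in (3, 5):
        if n_drives < 3:
            raise ValueError("RAID 3 and RAID 5 need at least three drives")
        return (n_drives - 1) * drive_gb         # one drive's worth of parity
    raise ValueError(f"level {level} not covered by this sketch")


if __name__ == "__main__":
    drive_gb = 146  # illustrative drive size
    print(usable_capacity(1, 10, drive_gb))  # 730: ten drives, five drives' worth of space
    print(usable_capacity(5, 10, drive_gb))  # 1314
    print(usable_capacity(5, 6, drive_gb))   # 730: same space as RAID 1, from six drives
```

Put another way, storing five drives' worth of data takes ten drives under RAID 1 but only six under RAID 5.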
Data Life Cycles
Of course, many organizations store a wide range of data of varying importance. As a result, they may have a variety of RAID setups. According to Dave Farmer, director of products and technologies PR at storage giant EMC, the main question is: What level of storage service does a customer need for a particular slice of information?
At the first decision point on the path toward choosing a storage solution, Farmer told the E-Commerce Times, “The customer classifies data based on a variety of characteristics in order to determine what capabilities are needed and the cost of acquiring [those capabilities].”
He noted that because requirements vary depending on the information being stored, RAID vendors like EMC are making forays into what he called “life cycle management,” in which they create an environment that lets organizations manage data across its entire life cycle.
Creation of a tiered storage network is a key component of life cycle management, Farmer said. Once customers have classified data and have provided tiered RAID storage setups for those classes of data, they then may create a system-wide infrastructure environment that can access, manage and protect the data.
Serving Up RAID
RAID can be incorporated into an enterprise IT setup in several ways — as part of a SAN or as part of a NAS, for example. Many vendors sell systems that include this technology.
According to Schafer, EMC is the 800-pound gorilla in the storage market. Not only does the company provide high-end storage systems that include RAID to its customers, it also supplies its midrange and low-end systems to Dell for rebranding.
In addition to EMC and Dell, Schafer cited IBM, HP and Hitachi as key players in what he called a very competitive market, delivering RAID storage as part of their overall solutions.
Perhaps one of the most intriguing offerings in the space is Apple’s second-generation Xserve RAID. Although Zimmerman said the Xserve is a nonstarter outside of Apple’s traditional markets, the company’s latest Xserve iteration supports industry-standard Fibre Channel switches from vendors like Brocade and QLogic. It is also certified for use with Microsoft Windows Server 2003 and Red Hat Linux, among others, and is priced at just over $3 per gigabyte.
“They see the price, and they don’t believe it, and then they look at the feature set and say, ‘It’s got redundant power, redundant cooling, protective RAID,'” Alex Grossman, director of hardware storage at Apple, told the E-Commerce Times.
“We’re starting to get pull from a variety of places because in the storage world, as long as you’re compatible, as long as you have the appropriate certifications and hook up to fibre channels, people don’t care,” Grossman added. “They just care about how much it is per gigabyte.”
There are a number of other RAID levels besides RAID 1-5. While some of them are arguably irrelevant (6, 7, 53), there are three VERY important ones that you failed to cover:
RAID 0: This mode simply stripes data across two or more drives. This increases your likelihood of losing data (if one drive fails, you’re toast) but increases throughput dramatically. This can be very handy in situations where you need high-throughput temporary storage (video processing, session storage, etc.), but it’s most interesting because of two other RAID levels: 0+1 and 10.
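RAID 0 striping boils down to simple address arithmetic: consecutive chunks go to consecutive drives in round-robin order. The sketch below shows that mapping, plus the chance of losing the array, under the simplifying assumption that each drive fails independently with the same probability over a given period. The function names and the chunk_blocks parameter are hypothetical.

```python
# RAID 0 address mapping and a rough data-loss estimate.
# Assumes equal-sized chunks assigned round-robin and independent
# drive failures; names are hypothetical.

def raid0_locate(logical_block, n_disks, chunk_blocks=1):
    """Map a logical block to (disk index, block offset on that disk)."""
    chunk, within = divmod(logical_block, chunk_blocks)
    disk = chunk % n_disks                       # round-robin across drives
    offset = (chunk // n_disks) * chunk_blocks + within
    return disk, offset

def raid0_loss_probability(n_disks, p_fail):
    """Any single drive failure destroys the array, so the array survives
    only if every drive survives."""
    return 1 - (1 - p_fail) ** n_disks


if __name__ == "__main__":
    print([raid0_locate(b, n_disks=2) for b in range(4)])
    # [(0, 0), (1, 0), (0, 1), (1, 1)]: consecutive blocks alternate drives
    print(raid0_loss_probability(2, 0.05))   # about 0.0975, vs 0.05 for one drive
```

Spreading consecutive blocks across drives is what lets a large read or write keep every drive busy at once; the same spreading is why losing any one drive loses the whole array.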
RAID 0+1: This mode groups drives into stripesets and then mirrors data across those stripesets. This gives you redundancy AND throughput. This mode is almost as reliable as RAID 1 (either stripeset can die and you can still operate) and almost as fast as RAID 0 (you get the benefit of striping but pay the penalty of synchronous mirroring). Think of this as mirroring data across two very fast drives, each of which happens to have a higher likelihood of failure than a typical hard drive. (The minimal setup is four drives in two two-drive stripesets. You can use more than four drives by making your stripesets stripe across more than two drives; this improves throughput and capacity but increases the risk of one of your stripesets dying.)
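The RAID 0+1 rules can be sketched in a few lines of Python, assuming drives are numbered and each stripeset is a list of drive numbers (the four-drive layout is hypothetical): each write lands on one drive in every stripeset, and the array keeps running as long as at least one stripeset has no failed drive.

```python
# RAID 0+1 sketch: drives are numbered, and each stripeset is a list
# of drive numbers. The four-drive layout below is hypothetical.

def raid01_targets(logical_block, stripesets):
    """Drives that receive a write: one per stripeset (the mirroring),
    chosen by striping within that set."""
    return [s[logical_block % len(s)] for s in stripesets]

def raid01_survives(stripesets, failed):
    """A stripeset is lost as soon as any of its drives fails; the array
    keeps running while at least one stripeset is fully intact."""
    return any(not (set(s) & failed) for s in stripesets)


if __name__ == "__main__":
    stripesets = [[0, 1], [2, 3]]
    print(raid01_targets(0, stripesets))          # [0, 2]
    print(raid01_targets(1, stripesets))          # [1, 3]
    print(raid01_survives(stripesets, {0, 1}))    # True: one whole stripeset died
    print(raid01_survives(stripesets, {0, 2}))    # False: both stripesets hit
```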
RAID 10: This is the reverse of RAID 0+1. Instead of mirroring data across stripesets, you stripe data across mirrored pairs of drives. (Got that?) Think of it as striping your data across drives that are slightly slower than typical drives but much more reliable. This performs similarly to RAID 0+1 but is more reliable, because more drives can fail simultaneously without causing data loss: RAID 10 survives as long as each mirrored pair keeps one working drive, whereas once one drive in a RAID 0+1 array fails, its whole stripeset is offline and a failure anywhere in the other stripeset loses data.
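The reliability difference between RAID 10 and RAID 0+1 can be checked by brute force: enumerate every combination of k failed drives and count how many each layout survives. This Python sketch assumes the same four drives arranged either as two mirrored pairs (RAID 10) or as two two-drive stripesets (RAID 0+1); the helper names are hypothetical.

```python
from itertools import combinations

# Brute-force comparison of RAID 10 and RAID 0+1 fault tolerance.
# Drives are numbered 0..n-1; layouts are lists of drive groups.

def raid10_survives(mirror_pairs, failed):
    """Striped across mirrored pairs: survives while every pair
    still has at least one working drive."""
    return all(not (set(pair) <= failed) for pair in mirror_pairs)

def raid01_survives(stripesets, failed):
    """Mirrored across stripesets: survives while at least one
    stripeset has no failed drive."""
    return any(not (set(s) & failed) for s in stripesets)

def survival_fraction(survives, groups, n_drives, k):
    """Fraction of all k-drive failure combinations the layout survives."""
    cases = [set(c) for c in combinations(range(n_drives), k)]
    return sum(survives(groups, c) for c in cases) / len(cases)


if __name__ == "__main__":
    groups = [[0, 1], [2, 3]]   # mirrored pairs for RAID 10, stripesets for RAID 0+1
    for k in (1, 2):
        print(k, survival_fraction(raid10_survives, groups, 4, k),
              survival_fraction(raid01_survives, groups, 4, k))
```

With four drives, both layouts survive every single-drive failure. With two simultaneous failures, RAID 10 survives 4 of the 6 possible combinations and RAID 0+1 only 2, and the gap widens as more drives are added.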
These modes, particularly 0+1 and 10, are commonly seen in lower-end configurations (software-based RAID setups, or systems using low-end RAID controllers to direct-attach cheap off-the-shelf drives) such as smaller Linux mail or database servers.