EXPERT ADVICE

Mining Data at the Speed of E-Commerce

In e-commerce, understanding the customer can make the difference between a company’s online success and failure. As processes for collecting and storing data become more sophisticated and efficient, companies are able to keep databases of transactional and clickstream data that are larger than ever before.

This information is essential to unlocking the secrets of a site’s customer base.

Until recently, much of this data was locked away in traditional data warehouse architectures, systems designed to store large amounts of data and allow for routine queries.

Although these systems can be reconfigured to perform some more in-depth and complex queries, the types of analyses today’s online businesses depend on can take several hours or even days to process on these architectures, which is too slow for the pace of online business.

Traditional architectures are usually made up of a patchwork of separate database, server and storage devices configured together to function as a data warehouse.

These solutions were originally developed to handle online transaction processing (OLTP), but were not designed to speed through the highly complex business intelligence queries that online retailers must now perform in order to gain insights into their market.

Unable to Keep the Pace

Although traditional data warehouses do a sufficient job of storing gigabytes, terabytes and even petabytes of data, these architectures cannot keep pace when the number of users and the complexity of queries grow along with the data, and they inevitably slow to unacceptable levels.

Online retailers increasingly see the need for faster analyses so they can truly get a handle on what the massive amounts of clickstream and transactional data mean, and then turn that data into positive business results.

Often, this requires ad hoc, iterative analyses: retailers should be able to keep drilling down and asking questions until they have a precise understanding of what’s going on with their business and customers.

However, they are often unable to go back and ask additional questions in a timely manner because of the limitations of their systems. What good are terabytes of detailed data if the query results aren’t available until a week after the fact and just give a general view of the landscape?

Faster Analysis

In 2002, a powerful new technology entered the data analytics scene, allowing businesses to drill down into massive amounts of data in ways that were previously impossible. This technology, called a data warehouse appliance and introduced by Netezza, architecturally integrates database, server and storage in a single device, and puts the processing power right next to the data.

Instead of moving vast amounts of data across the infrastructure for processing, Netezza has brought processing intelligence to the data, analyzing the data as fast as it streams off the disk. This means that only the results relevant to the query are returned, and analytic performance is significantly faster, by a factor of 10 to 100.

How is this performance possible? Netezza designed the data warehouse appliance specifically for complex analytics on large volumes of data, with a streaming, massively parallel architecture optimized for this purpose.
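
The idea of filtering next to the data can be illustrated with a small sketch. This is not Netezza’s implementation; it is a toy Python model, with hypothetical names and made-up clickstream rows, that compares how much has to travel to the coordinator when each slice of data is filtered and summarized where it lives versus when every row is shipped out first.

```python
from typing import Iterable

# Hypothetical clickstream row: (customer_id, page, seconds_on_page)
Row = tuple[int, str, int]

PAGES = ["home", "search", "product", "cart", "checkout"]


def make_slices(num_slices: int, rows_per_slice: int) -> list[list[Row]]:
    """Build toy data split into slices, standing in for rows spread over disks."""
    return [
        [
            (s * rows_per_slice + i, PAGES[(s + i) % len(PAGES)], (i % 60) + 1)
            for i in range(rows_per_slice)
        ]
        for s in range(num_slices)
    ]


def scan(rows: Iterable[Row], page: str) -> tuple[int, int]:
    """Stream over rows once, keeping only a running count and total for one page."""
    count = 0
    total_seconds = 0
    for _, row_page, seconds in rows:
        if row_page == page:
            count += 1
            total_seconds += seconds
    return count, total_seconds


def query_near_data(slices: list[list[Row]], page: str) -> tuple[int, int, int]:
    """Each slice is scanned where it lives; only one small partial result per slice moves.

    A real system would run these scans in parallel; here they run one after another.
    """
    partials = [scan(rows, page) for rows in slices]
    count = sum(c for c, _ in partials)
    total_seconds = sum(t for _, t in partials)
    return count, total_seconds, len(partials)


def query_ship_everything(slices: list[list[Row]], page: str) -> tuple[int, int, int]:
    """Every row is moved to a central processor before any filtering happens."""
    moved = [row for rows in slices for row in rows]
    count, total_seconds = scan(moved, page)
    return count, total_seconds, len(moved)


if __name__ == "__main__":
    data = make_slices(num_slices=4, rows_per_slice=1000)
    print("near data:      ", query_near_data(data, "cart"))
    print("ship everything:", query_ship_everything(data, "cart"))
```

Both functions return the same answer to the query; the third value in each result is the number of items that had to be moved. The sketch says nothing about actual speedups, which depend on hardware and workload.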

Dramatic Results

This “bringing the query to the data” approach delivers a built-in performance advantage for powering the complex queries at the heart of business analytics. Netezza’s fully integrated architecture, built with inexpensive commodity components, also accounts for the system’s low purchase price and administrative simplicity.

One online retailer that has augmented its existing traditional data warehouse architecture with Netezza’s data warehouse appliance technology is Amazon.com. The company started seeing limitations in its ability to query data as it continued to grow and its analyses became increasingly sophisticated. With clickstream data playing a critical role in Amazon.com’s marketing strategy, the company needed a way to analyze that data quickly in order to gain a better understanding of customers’ needs.

Within one day, Amazon.com was able to install a data warehouse appliance and get it up and running. Within one month, Amazon.com had loaded 25 terabytes of clickstream data onto the new system. The results the online retailer saw with the appliance were quite dramatic: Incremental bulk loads were approximately two times faster, while merge loads ran up to eight times faster.

Staying Ahead of the Pack

On a set of 10 complex clickstream queries, Amazon.com’s appliance ran as much as 37 times faster than its previous system. Instead of having to wait hours or days for reports to come through, Amazon.com can now make decisions quickly and adjust its course as needed. Other online retailers have seen similar results, creating a shift in the way e-commerce companies analyze their data.

The e-commerce industry continues to grow more competitive by the day, with new companies springing up and brick-and-mortar establishments making the transition to the digital world.

Online retailers that want to stay ahead of the pack in the ever-changing online marketplace need the most accurate and up-to-date information on hand at all times, so that they have a solid grasp of their customer base.


Justin Lindsey is chief technology officer for Netezza, a data warehouse appliance provider.

