Data Warehousing, Part 2: When Worlds Collide

As data warehousing continues to take hold as a powerful tool of modern business, developers are hard at work trying to build applications that will move the technology to the next level. Part 1 of this three-part series looked at how some developers are attempting to fuse two seemingly disparate worlds of data collection and retrieval, business intelligence (BI) and geographic information systems (GIS), to create the multidimensional data warehousing model of the future. This installment examines some of the problems associated with that endeavor, along with some approaches toward solving them.

The challenges associated with building the next generation of large-scale data warehouse systems should not be underestimated.

Storing and accessing data across multiple geographically dispersed platforms using multiple storage and transfer standards is a huge task. How to design database schemas that are general and open-ended enough to accommodate the information needs of a wide range of users at any point in time is another head scratcher, as is how to design a network architecture that is stable and powerful enough to support real-time data updates, along with high volumes of online querying and application processing.

Then there’s the challenge of designing user interfaces and presenting reams of data in formats that humans find comprehensible.

Such problems aren’t solely technological, however. It’s human nature for people to feel a sense of ownership of the data they use and to develop comfortable ways to apply it to problem-solving. This natural inclination affects what data is gathered, as well as how it is stored and made accessible.

This leads to another big hurdle: lingering attitudes and behaviors. There remains a fairly wide gap between corporate data warehouse specialists and end-users, based upon education, training and organizational cultures.

Bridging Cultural Divides

Bridging this divide may be the greatest challenge for those seeking to use data warehousing concepts to build the next generation of powerful information systems and, with them, very large-scale virtual organizations.

Take, for example, the apparent benefits of combining GIS with the data warehousing concepts used in business forecasting and management.

“I have always been puzzled by the chasm separating the data warehouse community and the geographic information systems community,” says Ralph Kimball, founder of the Kimball Group, a provider of data warehousing consulting services based in Boulder Creek, Calif., in a 2003 article.

“Very few ‘conventional’ data warehouses exploit their data with a map-driven approach — yet these same data warehouses are rich with geographic entities, including addresses, point locations, sales districts and higher level political geographies. I have always instinctively believed that the conventional data warehouse community could gain a great deal by taking advantage of some GIS tools and user interfaces. A map can be very compelling. For example, a two-dimensional portrayal of data can show patterns that other kinds of analysis simply can’t reveal,” he states.

“The truth is,” Kimball now tells the E-Commerce Times, “that [the] article you ran across was probably [written] too early, and while the separate data warehouse and GIS communities have been growing robustly, the natural marriage of these approaches has not taken place. But I think we may finally break the logjam.”

Kimball’s optimism has its basis in recent events. The popular response to GIS-based Web services such as Google Earth and Google Maps, he notes, is “wakening people to the attractiveness of querying maps. This may in turn suggest to users of BI that they should demand a map interface from their current databases.”

Intersection of Parallel Lines

“The BI community simply needs to turn on a map feature to use the masses of address and location data they already have,” Kimball explains. “The BI community (primarily vendors) needs to develop more compelling examples of why map querying gives the user decisive advantages for decision support.”

“To sum up, the logjam isn’t technology, it isn’t availability of data, but it’s a cultural, analytic perception gap where BI users need to see the advantages of geo-querying,” Kimball maintains.

ESRI, the developer of ArcView, ArcIMS and other GIS products, says it has been working precisely along these lines. “Historically, business intelligence and geographic information systems have followed separate development and implementation paths,” says Steve Trammell, ESRI cadastral solutions manager.

“Customer requests for a more complete operational picture and the ability to be more proactive, have led to the combination of these two technologies,” Trammell says in a paper slated for presentation at a trade conference later this year. “Regulatory requirements [have] also raised the visibility of both technologies within many organizations. In response to BI and GIS users, leading BI providers have been integrating the two technologies and providing supported solutions to a growing number of end-users.”

Trammell and others at ESRI have played a part in the integration of GIS and BI systems as they have migrated from the desktop to the server and on to the enterprise and Web services level. That work has drawn on data warehousing, service-oriented architecture (SOA) and standards such as XML (Extensible Markup Language) and SOAP (Simple Object Access Protocol), among others.

Earlier in the evolutionary period of business intelligence and geographic information systems, developers were working along parallel lines to meet the needs of core users in their particular domains. With much of that work accomplished, they have begun collaborating across organizational lines to extend their integration efforts to the enterprise level and beyond.

“For a GIS provider, using tabular data from hundreds of database and file systems could be difficult and expensive,” Trammell states. “This was a problem addressed by BI providers by using the ETL (extract, transform and load) process, or connectors that allow BI applications to use native file formats. Conversely, BI providers were hard pressed to deal with the variety of geographic data formats, CAD data and imagery,” he points out.
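To make the ETL idea concrete, here is a minimal sketch in Python of the three steps Trammell names. It is an illustration only, not a description of any vendor's product: the sample CSV, the column names, the `sales` table and all function names are assumptions invented for this example, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export with inconsistent formatting.
RAW_CSV = """Store ID,District,Sales
 101 ,north,"1,200.50"
102,North,800
103,south,450.25
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize district names, parse amounts."""
    cleaned = []
    for row in rows:
        cleaned.append((
            int(row["Store ID"].strip()),
            row["District"].strip().title(),
            float(row["Sales"].replace(",", "")),
        ))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(store_id INTEGER, district TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def sales_by_district(conn):
    """Query the loaded table by a geographic attribute (the district)."""
    cursor = conn.execute(
        "SELECT district, SUM(amount) FROM sales "
        "GROUP BY district ORDER BY district"
    )
    return dict(cursor)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(sales_by_district(conn))
```

Once the tabular data has been cleaned this way, the district totals are keyed on a geographic entity, which is the kind of value a map layer could then be joined to.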

“The myriad of projections and datums used in GIS maps were also challenging,” Trammell observes. “The GIS sector had addressed these issues by adopting standards for the interoperability of GIS data. Two of the major obstacles to integration had already been dealt with, but had not been effectively communicated between the two application environments. The interoperability of tabular data and geographic data was already a reality.”
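A small example shows why projections complicate matters for software that expects plain tabular values. The Python sketch below converts longitude and latitude in degrees to spherical Web Mercator coordinates in meters, the projection commonly used by Web mapping services, and back again. It is a simplified illustration under stated assumptions: the function names are hypothetical, it covers a single projection, and it does not perform datum transformations, which require additional parameters and are normally left to a dedicated GIS library.

```python
import math

# Sphere radius used by Web Mercator (equal to the WGS84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def web_mercator_to_lonlat(x, y):
    """Invert the projection: Web Mercator meters back to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


if __name__ == "__main__":
    # Illustrative coordinates only.
    x, y = lonlat_to_web_mercator(-122.1, 37.2)
    print(x, y)
    print(web_mercator_to_lonlat(x, y))
```

The same point has very different numeric values in each coordinate system, so two datasets can only be overlaid once both sides agree on which system the numbers are in. That is the role the interoperability standards Trammell mentions are meant to fill.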

Data Warehousing, Part 1: Building the Virtual Organization

Data Warehousing, Part 3: One Step Beyond
