As more and more business is conducted online, and as companies develop a three-dimensional view of their customers by correlating information from multiple departments and multiple data sources, today’s centralized relational databases have become obsolete, according to Nelson Mattos, director of information integration at IBM.
Using existing technology, it can take seemingly forever to query multiple databases, and structured and unstructured data cannot be integrated. Labor costs are also substantial. “For every dollar a company spends on an application package, they spend (US)$5 to $9 on labor and infrastructure to integrate [the application] into their existing IT infrastructure,” Mattos told the E-Commerce Times.
To remedy these problems, companies must find a way to access data in its native storage locations on the fly, rather than pulling all information into a centralized data repository. Can IBM’s new Xperanto technology deliver what enterprises need, helping them manage and quickly access far-flung, heterogeneous data sources? Or will it prove to be just another overhyped technology that falls flat?
Evolution, Not Revolution
Xperanto is a “federated” database initiative. IBM positions it as an evolutionary step forward, not a replacement for relational databases. In a federated (or virtual) database, data stores exist independently of each other, but they cooperate through an integration layer to provide users with a single view of all relevant data.
In other words, a federated database creates a “meta” database layer that points to data regardless of where it is stored, said Giga Information Group research director Philip Russom. “You don’t have to move the data into one place to manage it,” he told the E-Commerce Times.
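The idea of a layer that points to data where it lives can be sketched in a few lines of Python. This is only an illustration, not IBM's implementation: the customer table, the ticket file and the `customer_view` function are all made-up names, and an in-memory SQLite database and a CSV string stand in for real back-end systems.

```python
import csv
import io
import sqlite3

# Source 1: a relational store (hypothetical customer table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Source 2: a flat file (hypothetical support tickets), left in its own format.
TICKETS_CSV = (
    "customer_id,subject\n"
    "1,Billing question\n"
    "2,Password reset\n"
    "1,Address change\n"
)


def customer_view(customer_id):
    """Build a single view of one customer by querying each source in place."""
    row = db.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    # The flat file is read at query time; nothing is copied into the database.
    tickets = [
        r["subject"]
        for r in csv.DictReader(io.StringIO(TICKETS_CSV))
        if int(r["customer_id"]) == customer_id
    ]
    return {"id": customer_id, "name": row[0], "tickets": tickets}


view = customer_view(1)
print(view)
```

Neither source is moved or converted here. The function simply consults each one when asked, which is the property Russom describes.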
Potential applications of this technology are numerous. For example, using a federated database, call center operators could pull together customer information from flat files, e-mail, spreadsheets and even Web services. Or, a personal finance manager could tie together in-house bank records with investment pricing information from the Internet.
Despite recent hype, however, database integration technologies are not new. IBM launched DataJoiner, for integrating relational data, in 1995, and Enterprise Information Portal, which integrated unstructured data, in 1998. But these older forms of data integration suffered from performance problems. Because they were batch-oriented, they often took about 24 hours to run.
Fortunately, time requirements have improved. The current generation of products — including DB2 Information Integrator, IBM’s first product in its Xperanto initiative — features optimization technology that allows queries to be completed in just a few minutes. “I’m already seeing companies run reports through a federated database so they can assess [corporate] performance every two to three hours,” Russom said.
According to IBM, its products also can integrate structured data, such as flat-file databases, and unstructured data, such as scanned images, into a single virtual database.
Still other improvements center on application development. DB2 Information Integrator, for example, supports both SQL and content management APIs. According to Mattos, the tool cuts coding time in half by eliminating much of the querying, correlating and data extraction that must be done at the application level. “You can add more back-end systems without impacting [a] running application,” he said. “DB2 II will know how to translate a request to the interface of the data source, extract the information, and deliver it to the application.”
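Mattos’ point about adding back-end systems without touching a running application can be illustrated with a small adapter registry. The sketch below is hypothetical and does not reflect the actual DB2 Information Integrator API; `Federation`, `register`, `lookup` and `application_report` are invented names, and the two data sources are toy in-memory structures.

```python
from typing import Callable, Dict, List

# An adapter translates a generic lookup into whatever its source understands.
Adapter = Callable[[str], List[str]]


class Federation:
    """Hypothetical integration layer: routes one request to every registered source."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, name: str, adapter: Adapter) -> None:
        self._adapters[name] = adapter

    def lookup(self, customer: str) -> Dict[str, List[str]]:
        # Fan the request out and collect each source's answer under its name.
        return {name: adapter(customer) for name, adapter in self._adapters.items()}


def application_report(fed: Federation, customer: str) -> int:
    """Application code: counts records for a customer, unaware of the sources."""
    return sum(len(records) for records in fed.lookup(customer).values())


# Two made-up back ends with different native shapes.
orders = {"ada": ["order-17", "order-23"]}
emails = [("ada", "Re: invoice"), ("grace", "Welcome")]

fed = Federation()
fed.register("orders", lambda c: orders.get(c, []))
before = application_report(fed, "ada")

# Adding a back end later requires no change to application_report.
fed.register("email", lambda c: [subj for who, subj in emails if who == c])
after = application_report(fed, "ada")
print(before, after)
```

The application function is identical before and after the second source is registered. Only the integration layer knows that a new back end exists.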
Of course, federated database technology is in its early days, and it is far from perfect. For example, although they are touted as real-time products, DB2 Information Integrator and similar offerings actually function in “near real-time,” with results lagging by up to an hour. “The query optimization technology still has room for improvement,” Russom said. “The call center could be the killer app, but to get there, [federated database products] are going to have to deliver real-time data integration.”
More complete exploitation of Web services also will be necessary. IBM plans a future product release that will include the capability to write queries using an XML-based query language. The company also is working on search and text-mining capabilities that will allow Information Integrator to recognize a document’s content automatically for categorization purposes.
In the meantime, IBM is not the only player in the market for next-generation databases. Several smaller companies, such as Nimble Technology, Enosys and MetaMatrix, already are shipping products. For example, MetaMatrix’ platform-independent, Java-based integration offering is already in its third release and delivers query results in both relational form and schema-compliant XML.
“We are certain that our technology is setting the bar for enterprise information integration,” Shawn Curtiss, director of marketing at MetaMatrix, told the E-Commerce Times. “We have the features and functionality that others are striving to get to.”
Although small companies might not strike fear into the heart of Big Blue, heftier competitors are also on the horizon. “Both Oracle and Microsoft have similar technologies but haven’t pulled them into a product,” Russom said. “There’s a high probability that they will respond with some type of real-time integrated data solution before the end of 2003.”
If that happens, even more enterprises will begin betting on database integration technology. Then it really will be time for these products to stand and deliver or get out of the game.