Data Provisioning and Integration

Sybase Data Federation leaves distributed data in place and provisions it to users and applications across the organization. The Data Federation environment is called a data services layer or data grid.

Data provisioning—the process of making data available in an orderly and secure way to users, application developers, and applications that need it—is a significant challenge for large, distributed organizations. With many widely varying demands for data, geographically distributed users and data sources, production systems that must be insulated from uncontrolled access, and concerns about intellectual property and confidential data, careful data provisioning is more important and more difficult than ever before.

If everyone needed data in the same format from a single data source, and that format happened to be the way the data is currently stored, there would be no data integration challenge. But the world is not that simple. Some applications expect relational data. Others need data in XML form. Still others need to aggregate sales data across multiple departments, or integrate data from different systems to obtain a single view of the customer. In many cases, data of multiple types must be combined to produce a single result.

All this poses significant challenges to application developers, who must spend time writing code to access and transform data, rather than writing business logic. Developers also need to know where the data resides, and changes in the location typically break the application. Sybase Data Federation shields developers from this issue, as only the service definitions have to be changed when data moves—not the application code.
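The idea that only a service definition changes when data moves can be sketched in a few lines. This is a minimal illustration, not the Data Federation API: the names `service_definitions`, `get_data`, `read_from_old_server`, and `read_from_new_server` are all hypothetical, and the two "servers" are stand-in functions.

```python
from typing import Callable, Dict, List

Rows = List[dict]

# Two stand-in physical data sources (hypothetical; real sources would be
# databases or files on different hosts).
def read_from_old_server() -> Rows:
    return [{"customer_id": 1, "name": "Acme"}]

def read_from_new_server() -> Rows:
    return [{"customer_id": 1, "name": "Acme"}]

# A service definition maps a logical service name to whatever currently
# serves the data. This mapping is the only place a location appears.
service_definitions: Dict[str, Callable[[], Rows]] = {
    "customers": read_from_old_server,
}

def get_data(service_name: str) -> Rows:
    """Application code calls this and never names a physical location."""
    return service_definitions[service_name]()

before = get_data("customers")

# The data moves: the service definition is updated, the application is not.
service_definitions["customers"] = read_from_new_server

after = get_data("customers")
```

The call `get_data("customers")` is identical before and after the move, which is the property the paragraph above describes.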

While many software vendors have turned their attention to data integration, few have addressed all the challenges, and many are creating solutions that are more complex than the original challenges. At the same time, many data consumers lack the skills necessary to take advantage of such solutions, which means involving more people in each integration project—as well as more time and expense.

The Data Federation solution takes a federated approach to data provisioning and integration: distributed data stays where it is and is provisioned to users and applications across the organization. Many companies today prefer this approach to “big bang” solutions that require moving data into a central repository, because federated solutions minimize the costs associated with disrupting data, users, applications, and administrators.

When you install Data Federation software, you create a unified, low-overhead system for provisioning distributed data across departments, locations, and companies. This system is called a data services layer or data grid. The scope of the data services layer can be small or large: it can serve one department or an entire extended enterprise.

Figure 1. Data Federation retrieves data from multiple sources of different types, tailors it in ways users and developers need, and makes it available securely across the organization. Users and applications access data through standard interfaces and do not need to know where data is physically stored.

A data grid provides:

How does it do this? First, you install a set of server components on your existing network, creating a data grid that you can think of as a large catalog you might use to “shop” for data. At the beginning, the catalog is empty. Then, one by one, individual data owners “publish” their data for others to use, creating entries in the catalog. At the same time, they establish access rights for each catalog entry, specifying who can read the data, who can update the data, and so on.

In creating entries in the data catalog, data owners do not create replicas of their data. Instead, they create a link from the data catalog entry to data that exists somewhere—in a production database, an operational data store, a data warehouse, or a file server—wherever data is currently stored and managed.

The data catalog’s entries are arranged in a hierarchy much like any other directory structure, with one important difference—the data catalog is location independent. Users and developers do not need to know where data is physically stored in order to find and use it. They have one place to go to retrieve all data available to them.
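The catalog model described above (an initially empty catalog, owners publishing links rather than replicas, per-entry access rights, and location-independent hierarchical paths) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `DataCatalog`, `CatalogEntry`, `publish`, `browse`, and `read` are hypothetical names, not part of the Sybase product, and an in-memory list stands in for a real data store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Rows = List[dict]

@dataclass
class CatalogEntry:
    # A link to the data (a callable that fetches it), not a copy of it.
    fetch: Callable[[], Rows]
    readers: Set[str] = field(default_factory=set)
    # Recorded for completeness; this sketch does not implement updates.
    updaters: Set[str] = field(default_factory=set)

class DataCatalog:
    def __init__(self) -> None:
        # At the beginning, the catalog is empty.
        self._entries: Dict[str, CatalogEntry] = {}

    def publish(self, path: str, fetch: Callable[[], Rows],
                readers: Iterable[str], updaters: Iterable[str] = ()) -> None:
        """A data owner creates an entry and sets its access rights."""
        self._entries[path] = CatalogEntry(fetch, set(readers), set(updaters))

    def browse(self, prefix: str = "/") -> List[str]:
        """List entries under a path; paths say nothing about location."""
        return sorted(p for p in self._entries if p.startswith(prefix))

    def read(self, path: str, user: str) -> Rows:
        entry = self._entries[path]
        if user not in entry.readers:
            raise PermissionError(f"{user} may not read {path}")
        return entry.fetch()

# Stand-in for data that lives in some existing store.
warehouse_orders: Rows = [{"order_id": 100, "region": "EMEA"}]

catalog = DataCatalog()
catalog.publish("/sales/emea/orders", lambda: warehouse_orders, readers={"alice"})

first_read = list(catalog.read("/sales/emea/orders", "alice"))

# The owner's data changes in place; no replica needs refreshing.
warehouse_orders.append({"order_id": 101, "region": "EMEA"})
second_read = catalog.read("/sales/emea/orders", "alice")
```

Because the entry holds a link rather than a replica, the second read reflects the row added at the source, and a user who is not listed in `readers` gets a `PermissionError`.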

But what about the impact on the data source as new users and applications add to its load? By making explicit which queries are allowed to run against a data store, and by controlling cache coherence windows or prescheduling queries to run at specific times or with a certain frequency, data owners can control the load on operational systems. Data Federation has rich caching and scheduling capabilities that make this process easy to administer and transparent to the consuming users and applications.
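To make the effect of a cache coherence window concrete, the following sketch serves repeated requests from a cached result and goes back to the source only once the window has elapsed. It is a simplified illustration under stated assumptions, not a description of how Data Federation implements caching: `CachedQuery`, `window_seconds`, and `source_hits` are hypothetical names, and a fake clock replaces real time so the behavior is easy to follow.

```python
import time
from typing import Callable, List, Optional

Rows = List[dict]

class CachedQuery:
    """Serve a query result from cache until the coherence window expires."""

    def __init__(self, run_query: Callable[[], Rows], window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_query = run_query
        self._window = window_seconds
        self._clock = clock
        self._cached: Optional[Rows] = None
        self._fetched_at: Optional[float] = None
        self.source_hits = 0  # how often the operational system was queried

    def get(self) -> Rows:
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._window)
        if stale:
            # Only here does load reach the underlying data source.
            self._cached = self._run_query()
            self._fetched_at = now
            self.source_hits += 1
        return self._cached

# A fake clock (assumption for the demo) so no real waiting is needed.
fake_now = [0.0]
query = CachedQuery(lambda: [{"sku": "A1", "qty": 5}],
                    window_seconds=60, clock=lambda: fake_now[0])

query.get()
query.get()                      # within the window: served from cache
hits_within_window = query.source_hits

fake_now[0] = 61.0               # the window has elapsed
query.get()
hits_after_window = query.source_hits
```

A longer window reduces load on the operational system at the cost of staler results; the data owner chooses that trade-off, while consumers call `get()` the same way either way.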
