Data Provisioning and Integration

Sybase Data Federation leaves distributed data in place and provisions it to users and applications across the organization. The Data Federation environment is called a data services layer or data grid.

Data provisioning—the process of making data available in an orderly and secure way to users, application developers, and applications that need it—is a significant challenge for large, distributed organizations. With many widely varying demands for data, geographically distributed users and data sources, production systems that must be insulated from uncontrolled access, and concerns about intellectual property and confidential data, careful data provisioning is more important and more difficult than ever before.

If everyone needed data in the same format from a single data source, and that format happened to be the way the data is currently stored, there would be no data integration challenge. But the world is not that simple. Some applications expect relational data. Others need data in XML form. Still others need to aggregate sales data across multiple departments, or integrate data from different systems to obtain a single view of the customer. Data of multiple types must be combined to provide a result.

All this poses significant challenges to application developers, who must spend time writing code to access and transform data rather than writing business logic. Developers also need to know where the data resides, and a change in location typically breaks the application. Sybase Data Federation shields developers from this problem: when data moves, only the service definitions have to change, not the application code.

While many software vendors have turned their attention to data integration, few have addressed all the challenges, and many offer solutions that are more complex than the problems they are meant to solve. At the same time, many data consumers lack the skills needed to take advantage of such solutions, so each integration project involves more people, more time, and more expense.

The Data Federation solution takes a federated approach to data provisioning and integration: it leaves distributed data in place and provisions it to users and applications across the organization. Many companies today prefer this approach to “big bang” solutions that require moving data into a central repository. Federated solutions minimize the costs associated with disrupting data, users, applications, and administrators.

When you install Data Federation software, you create a unified, low-overhead system for provisioning distributed data across departments, locations, and companies. This system is called a data services layer or data grid. The scope of the data services layer can be small or large: it can serve one department or an entire extended enterprise.

Figure 1. Data Federation retrieves data from multiple sources of different types, tailors it in ways users and developers need, and makes it available securely across the organization. Users and applications access data through standard interfaces and do not need to know where data is physically stored.

A data grid provides:

One data services layer where users and applications can access the data they need—Data Federation supports both read and write access to data
One unified catalog that enables access to multiple types of data: relational data, XML documents, file data, and application data
One unified access control mechanism that operates across networks, locations, departments—even companies

How does it do this? First, you install a set of server components on your existing network, creating a data grid that you can think of as a large catalog you might use to “shop” for data. At the beginning, the catalog is empty. Then, one by one, individual data owners “publish” their data for others to use, creating entries in the catalog. At the same time, they establish access rights for each catalog entry, specifying who can read the data, who can update the data, and so on.

In creating entries in the data catalog, data owners do not create replicas of their data. Instead, they create a link from the data catalog entry to data that exists somewhere—in a production database, an operational data store, a data warehouse, or a file server—wherever data is currently stored and managed.

The data catalog’s entries are arranged in a hierarchy much like any other directory structure, with one important difference—the data catalog is location independent. Users and developers do not need to know where data is physically stored in order to find and use it. They have one place to go to retrieve all data available to them.
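The catalog behavior described above can be illustrated with a minimal sketch. It is not the Data Federation API: the `DataCatalog` and `CatalogEntry` classes, the method names, and the `db://` URIs are all hypothetical, chosen only to show an entry that links to data instead of copying it, carries its own access rights, and keeps a stable catalog path when the underlying data moves.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A catalog entry: a link to data plus access rights, never a copy."""
    source_uri: str                             # where the data physically lives
    readers: set = field(default_factory=set)   # users allowed to read
    writers: set = field(default_factory=set)   # users allowed to update


class DataCatalog:
    """Hypothetical in-memory catalog keyed by location-independent paths."""

    def __init__(self):
        self._entries = {}

    def publish(self, path, source_uri, readers=(), writers=()):
        # The data owner registers a link and its access rights in one step.
        self._entries[path] = CatalogEntry(source_uri, set(readers), set(writers))

    def relocate(self, path, new_source_uri):
        # The data moved: only the link changes, the catalog path stays stable.
        self._entries[path].source_uri = new_source_uri

    def resolve(self, path, user):
        # Consumers ask by catalog path; the physical location is looked up here.
        entry = self._entries[path]
        if user not in entry.readers:
            raise PermissionError(f"{user} may not read {path}")
        return entry.source_uri


catalog = DataCatalog()
catalog.publish("/sales/orders", "db://prod-server-1/orders", readers={"alice"})
before = catalog.resolve("/sales/orders", "alice")
catalog.relocate("/sales/orders", "db://warehouse-2/orders")
after = catalog.resolve("/sales/orders", "alice")
```

In this sketch, the consumer uses the same path, `/sales/orders`, before and after the move. Only the entry's link changes, which is the sense in which the catalog is location independent.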

But what about the impact on a data source as new users and applications add to its load? Data owners can control the load on operational systems by making explicit which queries are allowed to run against a data store, and by controlling cache coherence windows or prescheduling queries to run at specific times or at a certain frequency. Data Federation has rich caching and scheduling capabilities that make this process easy to administer and transparent to the consuming users and applications.
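As a rough illustration of these two load controls, the sketch below wraps a data source so that only explicitly allowed queries can run and so that repeated requests inside a cache coherence window are served from cache. The `GuardedSource` class, its parameters, and the query name `daily_sales` are assumptions made for this example. They do not reflect how Data Federation implements or configures caching.

```python
import time


class GuardedSource:
    """Hypothetical wrapper that limits which queries reach a data source."""

    def __init__(self, run_query, allowed_queries, coherence_window_s,
                 clock=time.monotonic):
        self._run_query = run_query            # callable that hits the real source
        self._allowed = set(allowed_queries)   # explicit list of permitted queries
        self._window = coherence_window_s      # seconds a cached result stays valid
        self._clock = clock
        self._cache = {}                       # query name -> (fetched_at, result)

    def query(self, name):
        if name not in self._allowed:
            raise PermissionError(f"query {name!r} is not allowed on this source")
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._window:
            return cached[1]                   # served from cache, source untouched
        result = self._run_query(name)
        self._cache[name] = (now, result)
        return result


calls = []          # records every query that actually reaches the source
fake_time = [0.0]   # controllable clock so the example is deterministic


def run_query(name):
    calls.append(name)
    return f"result of {name} #{len(calls)}"


source = GuardedSource(run_query, {"daily_sales"}, coherence_window_s=60,
                       clock=lambda: fake_time[0])
first = source.query("daily_sales")    # hits the source
second = source.query("daily_sales")   # inside the window: served from cache
fake_time[0] = 61.0
third = source.query("daily_sales")    # window expired: hits the source again
```

Three requests from consumers produce only two executions against the source. A longer coherence window reduces the load further, at the cost of staler results.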

Access and Authentication

After you create the data catalog, you can give users and application developers permission to shop there and find all the data they need.

Files

A data owner with a set of files can “publish” the files by sharing them with the Data Federation domain, naming them in the data catalog structure, and specifying access rights. Typically the data owner publishes an entire directory. Applications and users who need the files can then access them through the data catalog without having to know where they are located.

Application Data

When you use Data Federation to provide access to application data, you have several options.

Protecting Production Databases

Sybase Data Federation helps database administrators insulate their production databases from risk in several ways: providing access to data through stored procedures; limiting access to specific, predefined queries; caching results; enabling access through data services and generated views; and using two-phase commit for distributed transactions.

Ensuring Data Security

Sybase Data Federation protects intellectual property and sensitive data by ensuring that only users who are authorized to access specific data can access it. Data security features include fine-grained access policies, authentication directory service integration, encryption, firewall integration, and audit logging.

Data Representation

Sybase Data Federation software has several flexible capabilities, such as data services and view generators, that let you manipulate data in any format that’s appropriate for your application. However, Data Federation uses SQL rowsets and XML as the primary means for representing data. A rowset is a self-describing sequence of rows, or tuples. Each row consists of several named and typed columns.

Reusing Provisioning Work

If data is made available for one purpose, and the same data is needed by another group or application, the data owner need only change the access rights on the catalog entry.
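
The rowset described under Data Representation can be pictured with a small sketch. The `Rowset` and `Column` classes below are hypothetical and are not part of any Data Federation interface. They only illustrate what "self-describing" means: the column names and types travel with the rows, so a consumer can interpret the data without outside schema information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str    # column name
    type: type   # Python type standing in for the column's SQL type


class Rowset:
    """Hypothetical rowset: the schema travels together with the rows."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self.rows = []

    def append(self, *values):
        # Reject rows that do not match the declared columns.
        if len(values) != len(self.columns):
            raise ValueError("wrong number of values for this rowset")
        for column, value in zip(self.columns, values):
            if not isinstance(value, column.type):
                raise TypeError(f"{column.name} expects {column.type.__name__}")
        self.rows.append(tuple(values))

    def as_dicts(self):
        # Use the embedded column names to label each value in each row.
        names = [column.name for column in self.columns]
        return [dict(zip(names, row)) for row in self.rows]


customers = Rowset([Column("id", int), Column("name", str)])
customers.append(1, "Acme")
customers.append(2, "Globex")
records = customers.as_dicts()
```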
