Guidelines for Data Loading

Understand the guidelines for efficiently loading data from the EIS to the CDB.


Note: Poor performance is often due to the manner in which data is loaded into the CDB. See Data-Loading Design.


Note: MBOs that use DCN as the cache group policy have only one partition. See DCN and Partitions.


Note: Determine whether the EIS can efficiently return a data set sorted by primary key when retrieving very large data sets. See Optimizing Loading of Large Data Sets.


Caution: Use caution when reusing existing APIs provided by the EIS for the mobile application; adopt them only after careful evaluation.


Recommendation: Use an efficient interface (protocol) for high data volume, for example, JCo rather than Web services.


Recommendation: Use DCN to push changes, avoiding the expensive refresh and differential-calculation costs for large amounts of data.


Recommendation: Use multiple partitions to load data in parallel and reduce latency.


Recommendation: If the initial data volume is very large, consider a scheduled cache group policy with an extremely large interval to pull data into the CDB, then update via DCN. Do this only with careful orchestration to avoid lost updates.


Recommendation: Use cache groups to control and fine-tune how MBO caches are filled.


Recommendation: Use shared-read MBOs whenever possible.


Recommendation: Improve update efficiency by using small DCN message payloads.


Warning: Do not mix DCN with scheduled or on-demand cache group policies, except for a one-time initial data load; thereafter, use only DCN for updates.

Data-Loading Design

Successful data-loading design requires careful analysis of data and datasource characteristics, usage patterns, and expected user load. A significant portion of performance issues is caused by an incorrect data-loading strategy. A poor strategy does not prevent a prototype from functioning, but it does prevent the design from performing properly in a production environment. A good design always starts with analysis.

Optimizing Loading of Large Data Sets

During a cache fill operation, Unwired Server retrieves a data set from the EIS. The Data Services component applies a sort/merge algorithm to the data set to detect what has been modified since the last refresh, and updates the cache accordingly. If the data set is sorted by the primary key, Data Services adds new items, compares and updates matching items, and removes missing items from the cache in a single traversal of the records. However, if the retrieved data set is not sorted, Data Services must first perform an in-memory sort. If the retrieved data set is very large, the sorting may require significant resources, causing the Unwired Server memory footprint to expand and CPU usage to spike.
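
The benefit of a pre-sorted data set can be sketched as follows. This is an illustrative model of a sort/merge comparison, not Unwired Server's actual implementation: when both the cached data and the EIS result set arrive sorted by primary key, one linear pass classifies every row as an insert, update, or delete, with no in-memory sort.

```python
def merge_refresh(cached, retrieved):
    """Single-pass comparison of two data sets sorted by primary key.

    `cached` and `retrieved` are lists of (key, row) tuples in ascending
    key order. Returns the keys to insert, update, and delete in the
    cache. If the EIS result set were unsorted, it would first have to
    be sorted in memory -- the expensive step an ORDER BY clause avoids.
    """
    inserts, updates, deletes = [], [], []
    i = j = 0
    while i < len(cached) and j < len(retrieved):
        ckey, crow = cached[i]
        rkey, rrow = retrieved[j]
        if ckey == rkey:
            if crow != rrow:
                updates.append(ckey)    # row changed since last refresh
            i += 1
            j += 1
        elif rkey < ckey:
            inserts.append(rkey)        # new row from the EIS
            j += 1
        else:
            deletes.append(ckey)        # row no longer in the EIS
            i += 1
    deletes += [k for k, _ in cached[i:]]     # trailing cached rows: removed
    inserts += [k for k, _ in retrieved[j:]]  # trailing EIS rows: new
    return inserts, updates, deletes
```

Because each list is traversed exactly once, the comparison cost grows linearly with the data set size instead of incurring an additional O(n log n) sort.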

To optimize performance when the EIS is a relational database, add an ORDER BY <primary key> clause to the SELECT statement of the LOAD ALL query (or other appropriate load queries).
  1. In the Properties view of Sybase Unwired WorkSpace, go to the Attributes > Definition tab.
  2. Click Edit, then adjust the existing SQL definition with an additional ORDER BY clause followed by the primary key column(s).
For a detailed example of using ORDER BY to optimize data refresh of large data sets, see this document on the SAP Community Network http://scn.sap.com/docs/DOC-32091.

Alternatively, determine whether the EIS can use DCN to send changes to Unwired Server as they occur. This is a superior method of updating Unwired Server with deltas of a very large data set: regardless of whether the EIS can return a sorted data set, the CPU usage, network bandwidth, and time needed to process a very large data set can still be a performance concern.

Sybase Unwired Platform architecture performs best when:
  1. The mobile client is highly selective about the data it requires from the EIS and uses synchronization parameters to pull down a small subset of data (that is, cache partitioning). Refreshing cache partitions on an as-needed basis is more efficient than pulling down a large number of rows from the EIS. See Client Defined Cache Partitions.
  2. Operation results are applied to the cache effectively (cache write-through or write-behind). If the client application performs the majority of data updates, writing through or behind the Unwired Server cache is more efficient and requires less synchronization with the EIS. See Operation Cache Policies.
  3. DCN pushes deltas from the EIS into the Unwired Server cache or pushes large unpartitioned data sets into the Unwired Server cache. See Data Change Notification.
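
The write-through idea in point 2 above can be sketched as follows. The function and parameter names are invented for illustration; they are not product APIs. The point is that after the EIS (the system of record) accepts an operation, the result it returns is applied directly to the cache, so no follow-up refresh from the EIS is required.

```python
def submit_with_write_through(operation, eis, cache):
    """Sketch of a write-through operation cache policy.

    `eis` is a stand-in for the backend connection and `cache` for the
    CDB contents (a dict keyed by primary key). The EIS executes the
    operation and returns the resulting row; applying that row to the
    cache keeps the cache current without another EIS round trip.
    """
    result_row = eis.execute(operation)    # EIS applies the change first
    cache[result_row["pk"]] = result_row   # cache updated from the result
    return result_row
```

A write-behind variant would apply the row to the cache first and forward the operation to the EIS asynchronously, trading consistency guarantees for lower latency.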

Recycling Existing Artifacts

The most common mistake is to reuse an existing API without understanding whether it is suitable. In some cases, you can make the trade-off of using a result-set filter to clean the data for the MBO if the cost is reasonable. Such filtering does not, however, eliminate the cost of retrieving the unneeded data from the EIS in the first place. Every part of the pipeline impacts performance and influences data-loading efficiency. The best interface is always one based on your requirements, rather than a design intended for a different purpose.

Pull Versus Push

Because push-style data retrieval (DCN) is performed over HTTP with JSON content, optimized interfaces such as JDBC or JCo are often more suitable for high-volume data transfer. Pull-style data retrieval transfers the full data set on every refresh and then compares it with what is currently in the CDB; if the data volume is large, that cost can be overwhelming even with an optimized interface. DCN, by contrast, can efficiently propagate only the changes from the EIS to the CDB. However, mixing DCN with other refresh mechanisms is generally not supported: when a refresh and a DCN collide, race conditions can occur, leading to inconsistent data.

You can load data using a pull strategy, then switch to DCN for updates. The key is to orchestrate the transition between pull and push with the EIS so that no updates are missed between the time the pull ends and the push begins. The initial load can be triggered by device users via the on-demand cache group policy, or by a scheduled cache group policy with a very small interval that is changed to an extremely large interval once the data is loaded.
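
One way to orchestrate that transition can be sketched as follows. All of the names (`current_change_marker`, `changes_since`, `subscribe`, and so on) are hypothetical stand-ins for EIS capabilities, not product APIs; the sketch assumes the EIS keeps a change log that can be replayed from a watermark. Capturing the watermark before the bulk pull is what closes the window in which updates could otherwise be lost.

```python
def initial_load_then_dcn(eis, cache, dcn_apply):
    """Hypothetical pull-to-push handoff.

    `eis` stands in for the backend, `cache` for the CDB load step, and
    `dcn_apply` for applying one DCN update. The EIS is assumed to be
    able to replay changes recorded after a watermark.
    """
    watermark = eis.current_change_marker()  # capture BEFORE the bulk pull
    cache.bulk_load(eis.read_all())          # one-time pull of the data set
    for change in eis.changes_since(watermark):
        dcn_apply(change)                    # replay anything that happened
                                             # while the pull was running
    eis.subscribe(dcn_apply)                 # from now on, DCN only
```

If the EIS cannot replay changes from a marker, the same effect requires quiescing updates in the EIS for the duration of the initial load.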

It is not advisable to use a very large DCN message for updates: processing a large DCN message requires a large transaction and significant resources, and it reduces concurrency.
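
Keeping DCN payloads small usually means splitting a large change set into batches, each sent as its own message. A minimal sketch, with `batch_size` as an illustrative tuning knob rather than a product setting:

```python
def batch_changes(changes, batch_size=100):
    """Split a large list of changes into small, DCN-sized batches.

    Each batch would be sent as one DCN message, so each server-side
    transaction stays small and other requests can interleave between
    batches instead of waiting behind one giant transaction.
    """
    for start in range(0, len(changes), batch_size):
        yield changes[start:start + batch_size]
```

The right batch size is workload-dependent; the goal is simply to bound the size of any single DCN transaction.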

Cache Group and Data Loading

Cache groups are the tuning mechanism for data loading. Within a package, there can be multiple groups of MBOs with very different characteristics, each requiring its own loading strategy. For example, it is common to have transactional and reference data in the same package. Multiple cache groups allow fine-tuning which data in a package is loaded into the CDB independently of other cache groups.

Using Cache Partitions

Cache partitions increase performance by enabling parallel loading and refresh, reducing latency, supporting on-demand pull of the latest data, and limiting the scope of invalidation. You must determine whether a partitioned cache makes sense for the mobile application: if the application cannot function without the entire set of reference data, partitioning is not a viable alternative. However, even when a cache partition is not the right approach, it may still be worth applying the concept of horizontal partitioning. A cache partition is a vertical partition. With horizontal partitioning of a hierarchy, you may not need to load the entire object graph at the start, as long as some levels can be pulled on demand. By using additional cache groups, you can potentially avoid a large data load.
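
The horizontal-partitioning idea can be sketched as follows; the class and loader names are invented for illustration and do not correspond to product artifacts. The top level of the hierarchy (here, customers) loads eagerly, while a deeper level (their orders) is treated like a separate on-demand cache group and pulled only when a user actually drills down.

```python
class HorizontallyPartitionedCache:
    """Sketch of loading an object graph level by level on demand."""

    def __init__(self, load_top_level, load_children):
        self.top = load_top_level()          # eager: the small top level
        self._load_children = load_children  # deferred, per-parent loader
        self._children = {}                  # parent id -> child rows

    def children(self, parent_id):
        # Pull one parent's children on demand instead of loading the
        # whole object graph up front.
        if parent_id not in self._children:
            self._children[parent_id] = self._load_children(parent_id)
        return self._children[parent_id]
```

The initial load is limited to the top level, and each drill-down pulls only one slice of the lower level.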

A cache partition is a set of MBO instances that correspond to a particular partition key. The MBOs are loaded through synchronization parameters mapped to result-affecting load arguments.
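
Conceptually, each distinct partition-key value drives its own load call, and invalidation is scoped to a single partition. A minimal sketch, with invented names (the real mapping is configured between synchronization parameters and load arguments, not coded like this):

```python
class PartitionedCache:
    """Sketch of a cache keyed by a synchronization parameter value."""

    def __init__(self, load_partition):
        self._load = load_partition  # load query taking the load argument
        self._partitions = {}        # partition key -> MBO instances

    def get(self, partition_key):
        # Fill or return the partition for one synchronization-parameter
        # value; only the requested slice of EIS data is pulled.
        if partition_key not in self._partitions:
            self._partitions[partition_key] = self._load(partition_key)
        return self._partitions[partition_key]

    def invalidate(self, partition_key):
        # Invalidation affects a single partition, not the whole cache.
        self._partitions.pop(partition_key, None)
```

Because each partition fills independently, several partitions can be loaded or refreshed in parallel, which is the latency benefit described above.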

DCN and Partitions

There is only one partition for the DCN cache group policy. When a synchronization group maps to a DCN cache group, the synchronization parameters are used only for filtering against the single partition. In addition, the single partition of the MBO cache in the DCN cache group should always be valid, and you should not use an "Invalidate the cache" cache policy for any MBO operations.