Guidelines for Data Loading

Understand the guidelines for efficiently loading data from the EIS to the CDB.

Poor performance is often due to the manner in which data is loaded into the CDB. See Data-Loading Design.

MBOs that use DCN as the cache group policy have only one partition. See DCN and Partitions.

Use caution when recycling existing APIs provided by the EIS for use by the mobile application. Adopt them only after careful evaluation.

Use an efficient interface (protocol) for high data volume, for example, JCo rather than Web services.

Use DCN to push changes, avoiding the expensive refresh and differential-calculation costs for large amounts of data.

Use multiple partitions to load data in parallel and reduce latency.

If the initial data volume is very large, consider a scheduled cache group policy with an extremely large interval to pull data into the CDB, then update via DCN. Do this only with careful orchestration to avoid lost updates.

Use cache groups to control and fine-tune the manner in which MBO caches are filled.

Use shared-read MBOs whenever possible.

Improve update efficiency by using small DCN message payloads.

Do not mix DCN with scheduled or on-demand cache policies, except for a one-time initial data load; thereafter, use only DCN for updates.
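To make the last two guidelines concrete, the sketch below builds a small change-notification payload, one MBO and a handful of rows per message. The field names ("pkg", "messages", "mbo", "op", "cols") are illustrative assumptions about the DCN wire format; consult the server's DCN documentation for the exact structure.

```python
import json

# Sketch of a small DCN-style upsert message. Field names are illustrative
# assumptions, not a guaranteed wire format.
def build_dcn_message(package, mbo, rows):
    """Build one small change-notification payload for a single MBO."""
    return json.dumps({
        "pkg": package,
        "messages": [
            {"id": str(i), "mbo": mbo, "op": ":upsert", "cols": row}
            for i, row in enumerate(rows, start=1)
        ],
    })

# A message this small keeps the server-side transaction short.
payload = build_dcn_message("sales:1.0", "Customer",
                            [{"id": "100", "name": "Acme"}])
```

Keeping each payload to a few rows is what makes DCN updates cheap to apply; large change sets should be split across many such messages.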

Data-Loading Design

Successful data-loading design requires careful analysis of data and data-source characteristics, usage patterns, and expected user load. A significant portion of performance issues is due to incorrect data-loading strategies. A poor strategy does not prevent a prototype from functioning, but it does prevent the design from performing properly in a production environment. A good design always starts with analysis.

Recycling Existing Artifacts

The most common mistake is to reuse an existing API without evaluating whether it is suitable. In some cases, if the cost is reasonable, you can make the trade-off of using a result-set filter to clean the data for the MBO. However, filtering does not eliminate the cost of retrieving the unneeded data from the EIS before discarding it. Every part of the pipeline impacts performance and influences data-loading efficiency. The best interface is always one based on your requirements, rather than a design intended for a different purpose.

Pull Versus Push

Because push-style data retrieval is performed over HTTP with JSON content, pull through an optimized interface such as JDBC or JCo is often more suitable for high-volume data transfer. However, pull-style retrieval transfers the full data set on every refresh and then compares it with what is currently in the CDB; if the data volume is large, that cost can be overwhelming even with an optimized interface. DCN, by contrast, efficiently propagates only the changes from the EIS to the CDB. Mixing DCN with other refresh mechanisms is generally not supported: when a refresh and a DCN collide, race conditions can occur, leading to inconsistent data.

You can load data using a pull strategy, then switch to DCN for updates. The key is to orchestrate the transition from pull to push with the EIS so that no updates are missed between the time the pull ends and the push begins. The initial load can be triggered by device users through an on-demand cache group policy, or by a scheduled cache group policy with a very small interval that is changed to an extremely large interval once the data is loaded.
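The handover described above can be sketched as follows, assuming the EIS can report changes since a given timestamp. The `eis` and `dcn_sender` objects and their method names are hypothetical; the point is the ordering: record the high-water mark before the pull starts, then replay the gap through DCN.

```python
from datetime import datetime, timezone

# Sketch of the pull-then-push transition. All object and method names
# (snapshot, changes_since, send) are hypothetical.
class LoadOrchestrator:
    def __init__(self, eis, dcn_sender):
        self.eis = eis            # exposes snapshot() and changes_since(ts)
        self.dcn = dcn_sender     # exposes send(rows)

    def initial_load_then_switch(self, cache):
        # Record the high-water mark *before* the bulk pull begins, so any
        # change that happens while the pull runs is replayed afterward.
        mark = datetime.now(timezone.utc)
        for row in self.eis.snapshot():      # the one-time pull phase
            cache[row["id"]] = row
        # Replay the gap, then rely exclusively on DCN from here on.
        self.dcn.send(self.eis.changes_since(mark))
```

Taking the timestamp before the pull (not after) is what closes the window in which an update could be lost.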

It is not advisable to use a very large DCN message for updates. Processing a large DCN message requires a large transaction and significant resources, and it reduces concurrency.
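A large change set can instead be split into many small messages, so each server-side transaction stays short. A minimal batching sketch, where the batch size is an illustrative tuning knob rather than a platform-mandated value:

```python
# Split a large change set into small slices so that each DCN message,
# and therefore each server-side transaction, stays small.
def batch_changes(rows, batch_size=100):
    """Yield successive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Each batch would then be sent as its own DCN request, for example:
# for batch in batch_changes(changed_rows, batch_size=100):
#     post_dcn(batch)   # hypothetical sender
```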

Cache Group and Data Loading

Cache groups are the tuning mechanism for data loading. Within a package, there can be multiple groups of MBOs with very different characteristics, each requiring its own loading strategy. For example, it is common to have transactional and reference data in the same package. Multiple cache groups allow you to fine-tune how each group of data in a package is loaded into the CDB, independent of the other cache groups.

Using Cache Partitions

Cache partitions increase performance by enabling parallel loading and refresh, reducing latency, supporting on-demand pull of the latest data, and limiting the scope of invalidation. You must determine whether a partitioned cache makes sense for the mobile application: if the application cannot function without the entire set of reference data, partitioning is not a viable alternative. Even when a cache partition is not the right approach, however, it may still be worth applying the concept of horizontal partitioning. A cache partition is by nature a vertical partition. With horizontal partitioning of a hierarchy, you may not need to load the entire object graph at the start, as long as some levels can be pulled on demand. By using additional cache groups, you can potentially avoid a large data load.
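The horizontal-partitioning idea can be sketched as a hierarchy whose top level loads eagerly while child levels are pulled only on first access. The EIS accessors (`load_parents`, `load_children`) are hypothetical names for whatever eager and on-demand operations the backend exposes.

```python
# Sketch of horizontal partitioning for a hierarchy: the top level loads
# up front; each parent's children load on demand. EIS method names are
# hypothetical.
class HierarchyCache:
    def __init__(self, eis):
        self.eis = eis
        self.parents = {p["id"]: p for p in eis.load_parents()}  # eager
        self.children = {}                                       # lazy

    def get_children(self, parent_id):
        # Pull one parent's children only when first requested, instead of
        # loading the entire object graph at startup.
        if parent_id not in self.children:
            self.children[parent_id] = self.eis.load_children(parent_id)
        return self.children[parent_id]
```

The one-time startup cost is proportional to the top level only; the rest of the graph is spread across later on-demand pulls.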

A cache partition is a set of MBO instances that correspond to a particular partition key. The MBOs are loaded through synchronization parameters that are mapped to result-affecting load arguments.
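Conceptually, the partition key selects and fills its own slice of the cache independently of every other key, which is a sketch of what enables parallel loading and scoped invalidation. The names below are illustrative; the real mapping is configured on the MBO, not written as application code.

```python
# Conceptual sketch of a partitioned cache: each distinct partition-key
# value (a synchronization parameter mapped to a load argument) owns an
# independently loaded, independently invalidated slice of the cache.
class PartitionedCache:
    def __init__(self, load_fn):
        self.load_fn = load_fn     # EIS load taking the partition key
        self.partitions = {}

    def get(self, partition_key):
        # One partition per distinct key, loaded on first use.
        if partition_key not in self.partitions:
            self.partitions[partition_key] = self.load_fn(partition_key)
        return self.partitions[partition_key]

    def invalidate(self, partition_key):
        # Invalidation is limited to a single partition.
        self.partitions.pop(partition_key, None)
```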

DCN and Partitions

There is only one partition for the DCN cache group policy. When a synchronization group maps to a DCN cache group, the synchronization parameters are used only for filtering against the single partition. In addition, the single partition of the MBO cache in the DCN cache group should always be valid, and you should not use an "Invalidate the cache" cache policy for any MBO operations.