Deciding When to Use Parallel Queries

Examples of situations where you can and cannot use parallel queries.

Not all problems can be partitioned as neatly as the average price calculation described in "Parallel Queries". For example, suppose you are using a bank management application to partition all bank ATM (Automated Teller Machine) transactions and handle them in parallel projects. If you partition the transactions by bank branch, the application processes ATM records for each branch very quickly and keeps track of how much money is in the ATM of each branch. However, since different customers may use different bank branches, queries involving customer accounts are processed by computers that have all the information about a customer's transactions at one branch, but no information about the customer's transactions at other branches.

If, on the other hand, you partition the data by customer ID and assign one computer to every 10,000 customers, customer balances are calculated quickly, but queries involving the calculation of money balances in a particular branch cannot be performed by a single computer. In both cases, the different computers running Sybase CEP Server have to exchange information, which precludes the use of parallel queries.
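The trade-off between the two partitioning schemes can be illustrated with a minimal Python sketch. This is not Sybase CEP code: the transaction records, the branch names, and the helper functions `partition` and `partitions_needed` are all hypothetical and exist only to show which partitions a given query would have to consult.

```python
from collections import defaultdict

# Hypothetical ATM transactions: (customer_id, branch, amount).
TRANSACTIONS = [
    ("c1", "north", -100),
    ("c1", "south", -50),
    ("c2", "north", -20),
    ("c2", "north", 200),
]

def partition(rows, key_index):
    """Group rows into partitions keyed by the field at key_index."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_index]].append(row)
    return dict(parts)

def partitions_needed(parts, field_index, value):
    """Return the sorted keys of the partitions holding rows that match value."""
    return sorted(
        key for key, rows in parts.items()
        if any(row[field_index] == value for row in rows)
    )

by_branch = partition(TRANSACTIONS, 1)    # one partition per branch
by_customer = partition(TRANSACTIONS, 0)  # one partition per customer

# A customer query against branch partitions touches several partitions.
customer_spread = partitions_needed(by_branch, 0, "c1")

# A branch query against customer partitions touches several partitions.
branch_spread = partitions_needed(by_customer, 1, "north")
```

In this sample data, the balance for customer `c1` depends on rows in both branch partitions, and the total for branch `north` depends on rows in both customer partitions. Whichever key you choose, one of the two query types needs data from more than one partition.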

Furthermore, you can improve performance by distributing your queries only if you can divide the data into separate streams and then quickly merge it back into one stream when the parallel queries finish their processing. The performance benefits you gain from parallel processing are reduced if the overhead of partitioning or recombining the streams is high.
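As a rough illustration of how splitting and merging overhead erodes the benefit, the following Python sketch uses a deliberately simplified cost model. It assumes that the work divides evenly among the parallel queries and that the split and merge steps run serially. The function `parallel_speedup` and all of the numbers are hypothetical and are not measurements of Sybase CEP Server.

```python
def parallel_speedup(work, workers, split_cost, merge_cost):
    """Estimate speedup under a simplified model (an assumption, not a measurement).

    work       -- time to process all data with a single query
    workers    -- number of parallel queries sharing the work evenly
    split_cost -- time spent dividing the data into separate streams
    merge_cost -- time spent merging the results back into one stream
    """
    serial_time = work
    parallel_time = split_cost + work / workers + merge_cost
    return serial_time / parallel_time

# Cheap split and merge: four parallel queries give close to a 4x speedup.
low_overhead = parallel_speedup(100.0, 4, split_cost=1.0, merge_cost=1.0)

# Expensive split and merge: the parallel version is slower than the serial one.
high_overhead = parallel_speedup(100.0, 4, split_cost=40.0, merge_cost=40.0)
```

Under these assumed costs, the first case yields a speedup of about 3.7, while the second falls below 1.0, meaning that parallelizing makes the overall computation slower.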

The rest of this chapter assumes that you can partition your data into independent sets, in which each set contains all the data necessary to perform the computations you want to run.