Rewriting subqueries as EXISTS predicates

The assumptions that underlie the design of SQL Anywhere require that it conserves memory and that by default it returns the first few results of a cursor as quickly as possible. In keeping with these objectives, SQL Anywhere rewrites all set-operation subqueries, such as IN, ANY, or SOME predicates, as EXISTS or NOT EXISTS predicates, if such rewriting is semantically correct. By doing so, SQL Anywhere avoids creating unnecessary work tables and may more easily identify a suitable index through which to access a table.

Uncorrelated and correlated subqueries

Uncorrelated subqueries are subqueries that contain no explicit reference to the table or tables contained in the rest of the higher-level portions of the query.

The following is an ordinary query that contains a uncorrelated subquery. It selects information about all the customers who did not place an order on January 1, 2001.

SELECT *
FROM Customers c
WHERE c.ID NOT IN
   (  SELECT o.CustomerID
      FROM SalesOrders o
      WHERE o.OrderDate = '2001-01-01' );

One possible way to evaluate this query is to create a work table of all customers in the SalesOrder table who placed orders on January 1, 2001, and then query the Customers table and extract one row for each customer listed in the work table.

However, SQL Anywhere avoids materializing results as work tables. It also gives preference to plans that return the first few rows of a result most quickly. So, the optimizer rewrites such queries using NOT EXISTS predicates. In this form, the subquery becomes correlated: the subquery now contains an explicit outside reference to the ID column of the Customers table.

SELECT *
FROM Customers c
WHERE NOT EXISTS
   (  SELECT *
      FROM SalesOrders o
      WHERE o.OrderDate = '2000-01-01'
         AND o.CustomerID = c.ID );

This query is semantically equivalent to the one above, but when expressed in this new syntax, several advantages become clear:

  1. The optimizer can choose to use either the index on the CustomerID attribute or the OrderDate attribute of the SalesOrders table. However, in the SQL Anywhere sample database, only the ID and CustomerID columns are indexed.

  2. The optimizer has the option of choosing to evaluate the subquery without materializing intermediate results as work tables.

  3. The database server can cache the results of a correlated subquery during execution. This allows the re-use of previously-computed values of this predicate for the same values of the outside reference c.ID. In the case of query above, caching does not help because customer identification numbers are unique in the Customers table. So, the subquery is always computed with different values for the outside reference c.ID.

Further information about subquery caching is located in Subquery and function caching.

See also