One of the most important functions of a CCL query is to select which rows and columns are passed from the source stream to the destination stream. Columns are selected by specifying the required columns in the Select list.
Rows are selected by using one or more filter or selection conditions. A row is selected if all of the filter conditions are true.
Let's look at a slightly more complex query which illustrates column and row selection:
INSERT INTO StockTradesMicrosoft SELECT volume, price FROM StockTrades WHERE symbol = 'MSFT';
In this example, only the volume and price data is needed, so the destination stream StockTradesMicrosoft has the following schema:
Column | Datatype | Description |
---|---|---|
shares | Integer | The number of shares exchanged for this trade. |
price | Float | The price of each share exchanged for this trade. |
The query selects the volume and price columns (SELECT volume, price) from the StockTrades stream for all Microsoft stock trades (WHERE symbol = 'MSFT'). Notice the following syntax in this query:
Because the StockTrades and StockTradesMicrosoft streams do not have identical schemas, the query can not simply select all columns in StockTrades using the asterisk. In fact, using the asterisk notation would result in a query compilation error stating that there are more columns in the SELECT list than there are in the destination stream.
The SELECT clause lists the specific StockTrades columns which should be published to the StockTradesMicrosoft stream. When there is more than one column in the SELECT clause, the column names are separated by a comma.
The WHERE clause follows the FROM clause. The WHERE clause specifies a filter condition for the query. The filter condition is an expression which must evaluate to TRUE or FALSE. It restricts the rows from the data source which are published into the destination stream to those rows that meet the expression's condition.
Even though the volume column in StockTrades (volume) doesn't have the same name in StockTradesMicrosoft (shares), there is no problem because, by default, columns are copied from source to destination positionally. That is, the first item in the SELECT list publishes to the first column of the destination stream, the second item in the select list publishes to the second column, and so on. This default order can be overridden, as explained in a later example.
Here is an example of what the input and output streams would look like for this query: