Era of New Technology: Common SQL Programming Mistakes

Common SQL Programming Mistakes - 5

Predicate Evaluation Order :

If you are familiar with the logical query processing order, then you may expect that a query is executed in the following order:

1.FROM
2.WHERE
3.GROUP BY
4.HAVING
5.SELECT

The sequence above outlines the logical order for executing query. Logically the FROM clause is processed first defining the source data set, next the WHERE predicates are applied, followed by GROUP BY, and so on.

However, physically, the query is processed differently and the query optimizer is free to move expressions in the query plan in order to produce the most cost efficient plan for retrieving the data. This leads to a common misunderstanding that a filter in the WHERE clause is applied before the next phases are processed. In fact, a predicate can be applied much later in the physical execution plan. Also, there is no left to right order for execution of predicates. For example, if you have a WHERE clause containing "WHERE x=1 AND y=2", there is no guarantee that "x=1" will be evaluated first. They can be executed in any order.

For example, consider the following Accounts table where, in the account_reference column, Business accounts are denoted by a numeric reference and Personal accounts by a character reference:

account_nbr account_type account_reference
------------- --------------- -------------------
1 Personal abc
2 Business Basic 101
3 Personal def
4 Business Plus 5

In general, this table indicates bad design. The account_reference column should be represented as two different attributes, specific to business and personal accounts and each with the correct data type (not even belonging to the same table). However, in practice, we very often have to deal with systems designed with shortcomings, where altering the design is not an option.

Given the above scenario, a valid request is to retrieve all business type accounts with an account reference that is greater than 20 (assuming account reference has some meaningful numeric value for business type accounts). The query may look like this:

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
AND CAST(account_reference AS INT) > 20;

However, the query results in error:

"Conversion failed when converting the varchar value 'abc' to data type int"

The query fails because, as noted earlier, there is no prescribed order for executing predicates and nothing guarantees that the predicate "account_type LIKE ‘Business%’" will be evaluated before the predicate "CAST(account_reference AS INT) > 20". In our case, the second predicate is evaluated first resulting in a conversion error, due to the incompatible values in the account_reference column, for personal accounts.

One attempt to resolve this issue might be to use a derived table (or common table expression) to filter the business type accounts first, and then apply the predicate for account_reference column:

SELECT account_nbr, account_ref_nbr
FROM (SELECT account_nbr,
CAST(account_reference AS INT) AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%') AS A
WHERE account_ref_nbr > 20;

However, this results in the exact same error because derived tables and CTEs are expanded in the query plan and a single query plan is produced, where predicates can again be pushed up or down in the plan.

As indicated earlier, the problem here is a mix of bad design and misunderstanding of how SQL Server performs physical query execution. What is the solution? The best solution is to design the table correctly and avoid storing mixed data in a single column. In this case, a work around is to use a CASE expression to guarantee that only valid numeric values will be converted to INT data type:

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
AND CASE WHEN account_reference NOT LIKE '%[^0-9]%'
THEN CAST(account_reference AS INT)
END > 20;

The CASE expression uses a LIKE pattern to check for valid numeric values (a double negation logic is used which can be translated as "there is not a single character that is not a digit"), and only for those values performs the CAST. For the rest of the values the CASE expression results in NULL, which is filtered out because NULL is not matched with any value (even with NULL).

Category : SQL Programming - Common Mistakes | Read More......

Era of New Technology

"We've been on this journey before.... and we know the way"

Captions

About Me

My Blog List

Common SQL Programming Mistakes - 5

Blog Archive