The following references need to be added

Microsoft.SqlServer.Smo
Microsoft.SqlServer.SmoEnum
Microsoft.SqlServer.ConnectionInfo



ServerConnection oServerCon = new ServerConnection(txtServer.Text, txtUserName.Text, txtPassword.Text);
Server oServer = new Server(oServerCon);

lvDbOptions.Items.Clear();
String Value = String.Empty;

if (ddlOption.SelectedItem.ToString().ToLower() == "tables")
{
    foreach (Table oTable in oServer.Databases[txtDatabase.Text].Tables)
    {
        if (oTable.IsSystemObject == false)
        {
            // oTable.ToString() returns "[schema].[name]"; strip the schema prefix and the surrounding brackets
            Value = oTable.ToString().Substring(oTable.ToString().IndexOf(".") + 2, (oTable.ToString().Length - 1) - (oTable.ToString().IndexOf(".") + 2));
            lvDbOptions.Items.Add(new ListViewItem(Value));
        }
    }
}
else if (ddlOption.SelectedItem.ToString().ToLower() == "stored procedures")
{
    foreach (StoredProcedure oProc in oServer.Databases[txtDatabase.Text].StoredProcedures)
    {
        if (oProc.IsSystemObject == false)
        {
            Value = oProc.ToString().Substring(oProc.ToString().IndexOf(".") + 2, (oProc.ToString().Length - 1) - (oProc.ToString().IndexOf(".") + 2));
            lvDbOptions.Items.Add(new ListViewItem(Value));
        }
    }
}
else if (ddlOption.SelectedItem.ToString().ToLower() == "functions")
{
    foreach (UserDefinedFunction oFunc in oServer.Databases[txtDatabase.Text].UserDefinedFunctions)
    {
        if (oFunc.IsSystemObject == false)
        {
            Value = oFunc.ToString().Substring(oFunc.ToString().IndexOf(".") + 2, (oFunc.ToString().Length - 1) - (oFunc.ToString().IndexOf(".") + 2));
            lvDbOptions.Items.Add(new ListViewItem(Value));
        }
    }
}


The following references need to be added

Microsoft.SqlServer.Smo
Microsoft.SqlServer.SmoEnum
Microsoft.SqlServer.ConnectionInfo



ServerConnection oCon = new ServerConnection(server, DBUserID, DBPassword);

Server oServer = new Server(oCon);
Restore oRestore = new Restore();

// sFileName - physical location of the backup file

if (File.Exists(sFileName))
{
    oRestore.Database = DbName;
    oRestore.Action = RestoreActionType.Database;
    oRestore.Devices.AddDevice(sFileName, DeviceType.File);
    oRestore.NoRecovery = false;
    oRestore.ReplaceDatabase = true;
    oRestore.PercentCompleteNotification = 10;
    oRestore.PercentComplete += new PercentCompleteEventHandler(bkp_percentComplete);
    oRestore.SqlRestore(oServer);
}

protected void bkp_percentComplete(object sender, PercentCompleteEventArgs e)
{
    lblStatus.Text = e.Percent.ToString() + "% restored";
}


The following references need to be added

Microsoft.SqlServer.Smo
Microsoft.SqlServer.SmoEnum
Microsoft.SqlServer.ConnectionInfo



ServerConnection oCon = new ServerConnection(Server, DBUserID, DBPassword);

Server oServer = new Server(oCon);
Backup oBackUp = new Backup();

sFileName = BackupLocation + "\\" + DBName
+ "_" + System.DateTime.Now.ToString("dd_MM_yyyy hh_mm") + ".bak";

BackupDeviceItem oBackupDevice =
new BackupDeviceItem(sFileName, DeviceType.File);

oBackUp.Devices.Add(oBackupDevice);

//oBackUp.Devices.AddDevice(@"C:\DBbackup.bak", DeviceType.File);
//oBackUp.Devices.AddDevice(sFileName, DeviceType.File);

oBackUp.Database = DBName;
oBackUp.Action = BackupActionType.Database;
oBackUp.NoRecovery = false;
oBackUp.Initialize = true;
oBackUp.PercentCompleteNotification = 10;

oBackUp.PercentComplete += new PercentCompleteEventHandler(bkp_percentComplete);
oBackUp.SqlBackup(oServer);

protected void bkp_percentComplete(object sender, PercentCompleteEventArgs e)
{
    lblStatus.Text = e.Percent.ToString() + "% backed up";
}


When a TRUNCATE TABLE statement is used to remove all data, the IDENTITY column resets its numbering to the original seed value. When a DELETE statement is used to remove all data, the IDENTITY values continue to increment from the last value used.


A TRUNCATE TABLE statement tends to use fewer locks than a DELETE statement. When a TRUNCATE TABLE statement is used, SQL Server applies table and page locks, whereas a DELETE statement acquires a lock on each deleted row.


A TRUNCATE TABLE statement uses less transaction log space than a DELETE statement. When a TRUNCATE TABLE statement is used, SQL Server de-allocates the data pages and records only the de-allocations in the transaction log. When a DELETE statement is used, SQL Server makes an entry into the transaction log for each deleted row.


A TRUNCATE TABLE statement leaves no pages in a table, whereas a DELETE statement can leave empty pages.


While a TRUNCATE TABLE statement is more efficient than a DELETE statement, there are restrictions that govern the use of TRUNCATE TABLE. For example, you should not use a TRUNCATE TABLE statement against a table under the following conditions:


An indexed view specifies the table in its definition.

Transaction or merge replication is used to publish the table.

The table is referenced by a foreign key constraint (unless it is a self-referencing foreign key).

The table's IDENTITY values must be preserved and consistently incremented.

Only specific rows are to be deleted from the table, and not the entire dataset.


Regardless of these differences, both the TRUNCATE TABLE and DELETE statements remove only data and do not impact the table structure. Indexes, constraints, and columns are left untouched.

The Date object has methods for manipulating dates and times. JavaScript stores dates as the number of milliseconds since January 1, 1970. The sample below shows the different methods of creating date objects, all of which involve passing arguments to the Date() constructor.

A few things to note:

To create a Date object containing the current date and time, the Date() constructor takes no arguments.

When passing the date as a string to the Date() constructor, the time portion is optional. If it is not included, it defaults to 00:00:00. Also, other date formats are acceptable (e.g., "11/06/2009" and "11-06-2009").

When passing date parts to the Date() constructor, dd, hh, mm, ss, and ms are all optional. The default of each is 0.

Months are numbered from 0 (January) to 11 (December).
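A sketch of these constructor forms (the variable names are illustrative):

```javascript
// No arguments: the current date and time.
var now = new Date();

// A date string: the time portion is optional and defaults to 00:00:00.
var fromString = new Date("11/06/2009 14:30:00");
var fromDateOnly = new Date("11/06/2009");

// Date parts: year, month, day, then optional hours, minutes, seconds, ms.
// Months run 0 (January) to 11 (December), so 10 is November.
var fromParts = new Date(2009, 10, 6);
```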

Some common date methods are shown below. In all the examples, the variable RIGHT_NOW contains "Fri Nov 06 00:23:54:650 EDT 2009".
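A sketch of common accessor methods, with RIGHT_NOW built from parts to match the value quoted above:

```javascript
var RIGHT_NOW = new Date(2009, 10, 6, 0, 23, 54, 650);

var year = RIGHT_NOW.getFullYear();       // 2009
var month = RIGHT_NOW.getMonth();         // 10 (November)
var dayOfMonth = RIGHT_NOW.getDate();     // 6
var dayOfWeek = RIGHT_NOW.getDay();       // 5 (Friday; Sunday is 0)
var minutes = RIGHT_NOW.getMinutes();     // 23
var millis = RIGHT_NOW.getMilliseconds(); // 650
```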

The Math object is a built-in static object. The Math object's properties and methods are accessed directly (e.g., Math.PI) and are used for performing complex math operations. Some common math properties and methods are shown below.
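A few common examples (a sketch, not an exhaustive list):

```javascript
var area = Math.PI * 2 * 2;        // area of a circle with radius 2
var rounded = Math.round(4.6);     // 5
var floored = Math.floor(4.6);     // 4
var power = Math.pow(2, 10);       // 1024
var root = Math.sqrt(144);         // 12
var largest = Math.max(3, 9, 7);   // 9
var randomDigit = Math.floor(Math.random() * 10); // an integer from 0 to 9
```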

Relational Databases :

A relational database at its simplest is a set of tables used for storing data. Each table has a unique name and may relate to one or more other tables in the database through common values.

Tables :

A table in a database is a collection of rows and columns. Tables are also known as entities or relations.

Rows :

A row contains data pertaining to a single item or record in a table. Rows are also known as records or tuples.

Columns :


A column contains data representing a specific characteristic of the records in the table. Columns are also known as fields or attributes.

Relationships :

A relationship is a link between two tables (i.e., relations). Relationships make it possible to find data in one table that pertains to a specific record in another table.

Datatypes :

Each of a table's columns has a defined datatype that specifies the type of data that can exist in that column. For example, the FirstName column might be defined as varchar(20), indicating that it can contain a string of up to 20 characters. Unfortunately, datatypes vary widely between databases.

Primary Keys :

Most tables have a column or group of columns that can be used to identify records. For example, an Employees table might have a column called EmployeeID that is unique for every row. This makes it easy to keep track of a record over time and to associate a record with records in other tables.

Foreign Keys :

Foreign key columns are columns that link to primary key columns in other tables, thereby creating a relationship. For example, the Customers table might have a foreign key column called SalesRep that links to EmployeeID, the primary key in the Employees table.

Relational Database Management System :

A Relational Database Management System (RDBMS), commonly (but incorrectly) called a database, is software for creating, manipulating, and administering a database. For simplicity, we will often refer to RDBMSs as databases.

Popular Databases :

ORACLE
SQL SERVER
DB2

Popular Open Source Databases :

MySQL
PostgreSQL

SQL Statements :

Data Manipulation Language (DML)

DML statements are used to work with data in an existing database. The most common DML statements are:

SELECT
INSERT
UPDATE
DELETE

Data Definition Language (DDL)

DDL statements are used to define the structure of objects in a database. The most common DDL statements are:
CREATE
ALTER
DROP

Data Control Language (DCL)

DCL statements are used for database administration. The most common DCL statements are:

GRANT
DENY (SQL Server Only)
REVOKE

A regular expression is a pattern that specifies a list of characters. In this section, we will look at how those characters are specified :

Start and End ( ^ $ )

A caret (^) at the beginning of a regular expression indicates that the string being searched must start with this pattern.

The pattern ^foo can be found in "food", but not in "barfood".

A dollar sign ($) at the end of a regular expression indicates that the string being searched must end with this pattern.

The pattern foo$ can be found in "curfoo", but not in "food".
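These two anchors can be checked with JavaScript's RegExp.test(), mirroring the examples above:

```javascript
var startsWithFoo = /^foo/;
var endsWithFoo = /foo$/;

startsWithFoo.test("food");    // true
startsWithFoo.test("barfood"); // false
endsWithFoo.test("curfoo");    // true
endsWithFoo.test("food");      // false
```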

Number of Occurrences ( ? + * {} )

The following symbols affect the number of occurrences of the preceding character: ?, +, *, and {}.

A question mark (?) indicates that the preceding character should appear zero or one times in the pattern.

The pattern foo? can be found in "food" and "fod", but not "faod".

A plus sign (+) indicates that the preceding character should appear one or more times in the pattern.

The pattern fo+ can be found in "fod", "food" and "foood", but not "fd".

An asterisk (*) indicates that the preceding character should appear zero or more times in the pattern.

The pattern fo*d can be found in "fd", "fod" and "food".

Curly brackets with one parameter ( {n} ) indicate that the preceding character should appear exactly n times in the pattern.

The pattern fo{3}d can be found in "foood", but not "food" or "fooood".

Curly brackets with two parameters ( {n1,n2} ) indicate that the preceding character should appear between n1 and n2 times in the pattern.

The pattern fo{2,4}d can be found in "food", "foood" and "fooood", but not "fod" or "foooood".

Curly brackets with one parameter and an empty second parameter ( {n,} ) indicate that the preceding character should appear at least n times in the pattern.

The pattern fo{2,}d can be found in "food" and "foooood", but not "fod".
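These quantifiers can be verified in JavaScript with RegExp.test(), using the patterns from the examples above:

```javascript
var zeroOrOne    = /foo?/;     // "fo" plus an optional "o"
var oneOrMore    = /fo+/;
var zeroOrMore   = /fo*d/;
var exactlyThree = /fo{3}d/;
var twoToFour    = /fo{2,4}d/;
var twoOrMore    = /fo{2,}d/;

exactlyThree.test("foood"); // true
exactlyThree.test("food");  // false: needs exactly three o's
```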

Common Characters ( . \d \D \w \W \s \S )

A period ( . ) represents any character except a newline.

The pattern fo.d can be found in "food", "foad", "fo9d", and "fo*d".

Backslash-d ( \d ) represents any digit. It is the equivalent of [0-9].

The pattern fo\dd can be found in "fo1d", "fo4d" and "fo0d", but not in "food" or "fodd".

Backslash-D ( \D ) represents any character except a digit. It is the equivalent of [^0-9].

The pattern fo\Dd can be found in "food" and "foad", but not in "fo4d".

Backslash-w ( \w ) represents any word character (letters, digits, and the underscore (_) ).

The pattern fo\wd can be found in "food", "fo_d" and "fo4d", but not in "fo*d".

Backslash-W ( \W ) represents any character except a word character.

The pattern fo\Wd can be found in "fo*d", "fo@d" and "fo.d", but not in "food".

Backslash-s ( \s ) represents any whitespace character (e.g., space, tab, newline, etc.).

The pattern fo\sd can be found in "fo d", but not in "food".

Backslash-S ( \S ) represents any character except a whitespace character.

The pattern fo\Sd can be found in "fo*d", "food" and "fo4d", but not in "fo d".
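A quick JavaScript check of these character classes, using the same patterns:

```javascript
var digit      = /fo\dd/;
var nonDigit   = /fo\Dd/;
var wordChar   = /fo\wd/;
var whitespace = /fo\sd/;
var nonSpace   = /fo\Sd/;

digit.test("fo4d"); // true
digit.test("food"); // false: "o" is not a digit
```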

Grouping ( [] )

Square brackets ( [] ) are used to group options.

The pattern f[aeiou]d can be found in "fad" and "fed", but not in "food", "faed" or "fd".

The pattern f[aeiou]{2}d can be found in "faed" and "feod", but not in "fod", "fed" or "fd".
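The bracket groups above, checked in JavaScript:

```javascript
var oneVowel  = /f[aeiou]d/;
var twoVowels = /f[aeiou]{2}d/;

oneVowel.test("fad");   // true
oneVowel.test("food");  // false: two vowels between f and d
twoVowels.test("faed"); // true
```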

Negation ( ^ )

When used after the first character of the regular expression, the caret ( ^ ) is used for negation.

The pattern f[^aeiou]d can be found in "fqd" and "f4d", but not in "fad" or "fed".

Subpatterns ( () )

Parentheses ( () ) are used to capture subpatterns.

The pattern f(oo)?d can be found in "food" and "fd", but not in "fod".

Alternatives ( | )

The pipe ( | ) is used to specify alternative patterns.

The pattern foo$|^bar can be found in "foo" and "bar", but not "foobar".
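A quick check in JavaScript (this assumes the intended pattern is foo$|^bar, an alternation of two anchored patterns):

```javascript
var anchoredEither = /foo$|^bar/;

anchoredEither.test("foo");    // true: "foo" matches foo$
anchoredEither.test("bar");    // true: "bar" matches ^bar
anchoredEither.test("foobar"); // false: "foo" is not at the end, "bar" not at the start
```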

Escape Character ( \ )

The backslash ( \ ) is used to escape special characters.

The pattern fo\.d can be found in "fo.d", but not in "food" or "fo4d".

Backreferences

Backreferences are special wildcards that refer back to a subpattern within a pattern. They can be used to make sure that two subpatterns match. The first subpattern in a pattern is referenced as \1, the second is referenced as \2, and so on.

A more practical example has to do with matching the delimiter in Social Security numbers. Examine the following regular expression.

^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$

Between the caret (^) and dollar sign ($), which anchor the pattern to the beginning and end of the string, there are three sequences of digits, optionally separated by a hyphen or a space. This pattern will be matched by all of the following strings (and more).

•123-45-6789
•123 45 6789
•123456789
•123-45 6789
•123 45-6789
•123-456789
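Note that the pattern above captures each delimiter but does not force the two to match, which is why mixed delimiters such as "123-45 6789" are accepted. Replacing the second group with the backreference \1 requires both delimiters to be identical; a sketch in JavaScript:

```javascript
var mixedDelimiters      = /^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$/;
var consistentDelimiters = /^\d{3}([\- ]?)\d{2}\1\d{4}$/;

mixedDelimiters.test("123-45 6789");      // true
consistentDelimiters.test("123-45-6789"); // true
consistentDelimiters.test("123456789");   // true: both delimiters are empty
consistentDelimiters.test("123-45 6789"); // false: the delimiters differ
```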

The trim function removes all leading and trailing spaces:

String.prototype.trim = function(){return (this.replace(/^[\s\xA0]+/, "").replace(/[\s\xA0]+$/, ""))}

startsWith checks whether a string starts with a particular character sequence:

String.prototype.startsWith = function(str) {return (this.match("^"+str)==str)}

endsWith checks whether a string ends with a particular character sequence:

String.prototype.endsWith = function(str) {return (this.match(str+"$")==str)}

Once loaded, these functions behave like built-in JavaScript methods. Here are a few examples:

var myStr = " Earth is a beautiful planet ";
var myStr2 = myStr.trim(); // == "Earth is a beautiful planet"
if (myStr2.startsWith("Earth")) // true
if (myStr2.endsWith("planet")) // true
if (myStr.startsWith("Earth")) // false, due to the leading spaces
if (myStr.endsWith("planet")) // false, due to the trailing spaces

Temp Tables vs. Table Variables

1. SQL Server does not place locks on table variables when the table variables are used.
2. Temp tables allow for multiple indexes to be created
3. Table variables allow only a single index, the primary key, which must be created when the table variable is declared.
4. Temp tables can be created locally (#TableName) or globally (##TableName)
5. Table variables are destroyed as the batch is completed.
6. Temp tables can be used throughout multiple batches.
7. Temp tables can be used to hold the output of a stored procedure (table variables gain this ability in SQL Server 2008).

Table variables and Temp Tables vs. CTEs

1. A CTE can be used only by the statement that immediately follows its definition.
2. CTEs can be recursive within a single statement (be careful, because they can cause an infinite loop).
3. Table variables and temp tables can be used throughout the batch.
4. The statement preceding the CTE must end with a semi-colon (;).
5. Because temp tables and table variables are tables, you can insert, update and delete the data within them.
6. Indexes cannot be created on CTEs; the underlying source tables must have the indexes.

DECLARE @people TABLE
(
id INT,
name VARCHAR(32)
)


A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.

Table variables are automatically cleared when the procedure or function goes out of scope, so you don't have to remember to drop or clear the data (which can be a good thing or a bad thing; remember "release early"?). The tempdb transaction log is less impacted than with #temp tables; table variable log activity is truncated immediately, while #temp table log activity persists until the log hits a checkpoint, is manually truncated, or when the server restarts.

Table variables are the only way you can use DML statements (INSERT, UPDATE, DELETE) on temporary data within a user-defined function. You can create a table variable within a UDF, and modify the data using one of the above statements.

Table variables can lead to fewer stored procedure recompilations than temporary tables and, since they cannot be rolled back, do not burden the transaction log.

Limitation of Table Variable :

1. Table variables are only allowed in SQL Server 2000+, with compatibility level set to 80 or higher.

2. You cannot use a table variable in either of the following situations:

INSERT @table EXEC sp_someProcedure

SELECT * INTO @table FROM someTable


3. You cannot truncate a table variable.

4. Table variables cannot be altered after they have been declared.

5. You cannot explicitly add an index to a table variable; however, you can create a system index through a PRIMARY KEY constraint, and you can add further indexes via UNIQUE constraints.

6. Table variables are not visible to the calling procedure in the case of nested procs.

7. Table variables do not maintain statistics like temporary tables.

Should I use table variables instead of temporary tables?

The answer depends on these three factors:

1. The number of rows that are inserted to the table.
2. The number of recompilations the query is saved from.
3. The type of queries and their dependency on indexes and statistics for performance.

In some situations, breaking a stored procedure with temporary tables into smaller stored procedures so that recompilation takes place on smaller units is helpful.

CREATE TABLE ##people
(
id INT,
name VARCHAR(32)
)


Global temporary tables operate much like local temporary tables; they are created in tempdb and cause less locking and logging than permanent tables. However, they are visible to all sessions, until the creating session goes out of scope (and the global ##temp table is no longer being referenced by other sessions). If two different sessions run the above code while the first is still active, the second will receive the following:

Server: Msg 2714, Level 16, State 6, Line 1
There is already an object named '##people' in the database.


I have yet to see a valid justification for the use of a global ##temp table. If the data needs to persist to multiple users, then it makes much more sense, at least to me, to use a permanent table. You can make a global ##temp table slightly more permanent by creating it in an autostart procedure, but I still fail to see how this is advantageous over a permanent table. With a permanent table, you can deny permissions; you cannot deny users from a global ##temp table.

CREATE TABLE #people
(
id INT,
name VARCHAR(32)
)


A temporary table is created and populated on disk, in the system database tempdb — with a session-specific identifier packed onto the name, to differentiate between similarly-named #temp tables created from other sessions. The data in this #temp table (in fact, the table itself) is visible only to the current scope (usually a stored procedure, or a set of nested stored procedures). The table gets cleared up automatically when the current procedure goes out of scope, but you should manually clean up the data when you're done with it:

DROP TABLE #people

This will be better on resources ("release early") than if you let the system clean up *after* the current session finishes the rest of its work and goes out of scope.

A common use of #temp tables is to summarize/compact/reorganize data coming from another stored procedure. So, take this example, which pares down the results of the system procedure sp_who2 into only the SPID, Status, and HostName of *active* processes that are *not* part of the regular operation of the system:

CREATE TABLE #sp_who3
(
SPID INT,
Status VARCHAR(32) NULL,
Login SYSNAME NULL,
HostName SYSNAME NULL,
BlkBy SYSNAME NULL,
DBName SYSNAME NULL,
Command VARCHAR(32) NULL,
CPUTime INT NULL,
DiskIO INT NULL,
LastBatch VARCHAR(14) NULL,
ProgramName VARCHAR(32) NULL,
SPID2 INT
)

INSERT #sp_who3 EXEC sp_who2 'active'

SELECT SPID, Status, HostName FROM #sp_who3
WHERE spid > 15

DROP TABLE #sp_who3

One of the main benefits of using a #temp table, as opposed to a permanent table, is the reduction in the amount of locking required (since the current user is the only user accessing the table), and also there is much less logging involved. (You could also increase this performance by placing tempdb on a separate drive.)

One minor problem with #temp tables is that, because of the session-specific identifier that is tacked onto the name, the name you give it is limited to 116 characters, including the # sign (while other table types are limited to 128). If you try, you will see this:

Server: Msg 193, Level 15, State 1, Line 1
The object or column name starting with '#' is too long. The maximum length is 116 characters.


Hopefully this won't be a limitation in your environment, because I can't imagine a table name that long being useful or manageable.

Another potential problem with #temp tables is that, if you enter a transaction and use a #temp table, and then cancel without ever issuing a ROLLBACK or COMMIT, you could be causing unnecessary locks in tempdb.

Checking for Duplicates :

On any version of SQL Server, you can identify duplicates using a simple query, with GROUP BY and HAVING, as follows:

DECLARE @table TABLE (data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')

SELECT data, COUNT(data) nr
FROM @table
GROUP BY data
HAVING COUNT(data) > 1

The result indicates that there are two occurrences of the row containing the “duplicate row” text:

data nr
-------------------- ----
duplicate row 2

Removing Duplicate Rows in SQL Server :

The following sections present a variety of techniques for removing duplicates from SQL Server database tables, depending on the nature of the table design.

Tables with no primary key :

When you have duplicates in a table that has no primary key defined, and you are using an older version of SQL Server, such as SQL Server 2000, you do not have an easy way to identify a single row. Therefore, you cannot simply delete this row by specifying a WHERE clause in a DELETE statement.

You can, however, use the SET ROWCOUNT 1 command, which will restrict the subsequent DELETE statement to removing only one row. For example:

DECLARE @table TABLE (data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')

SET ROWCOUNT 1

DELETE FROM @table WHERE data = 'duplicate row'

SET ROWCOUNT 0

In the above example, only one row is deleted. Consequently, there will be one remaining row with the content “duplicate row”. If you have more than one duplicate of a particular row, you would simply adjust the ROWCOUNT accordingly. Note that after the delete, you should reset the ROWCOUNT to 0 so that subsequent queries are not affected.

To remove all duplicates in a single pass, the following code will work, but is likely to be horrendously slow if there are a large number of duplicates and table rows:

DECLARE @table TABLE (data VARCHAR(20))

INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')

SET NOCOUNT ON

SET ROWCOUNT 1

WHILE 1 = 1
BEGIN
DELETE FROM @table
WHERE data IN (SELECT data
FROM @table
GROUP BY data
HAVING COUNT(*) > 1)

IF @@Rowcount = 0
BREAK ;
END

SET ROWCOUNT 0

When cleaning up a table that has a large number of duplicate rows, a better approach is to select just a distinct list of the duplicates, delete all occurrences of those duplicate entries from the original and then insert the list into the original table.

DECLARE @table TABLE(data VARCHAR(20))

INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('second duplicate row')
INSERT INTO @table VALUES ('second duplicate row')

SELECT data
INTO #duplicates
FROM @table
GROUP BY data
HAVING COUNT(*) > 1

-- delete all rows that are duplicated

DELETE FROM @table
FROM @table o INNER JOIN #duplicates d
ON d.data = o.data

-- insert one row for every duplicate set

INSERT INTO @table(data)
SELECT data
FROM #duplicates

As a variation of this technique, you could select all the data, without duplicates, into a new table, delete the old table, and then rename the new table to match the name of the original table:

CREATE TABLE duplicateTable3(data VARCHAR(20))

INSERT INTO duplicateTable3 VALUES ('not duplicate row')
INSERT INTO duplicateTable3 VALUES ('duplicate row')
INSERT INTO duplicateTable3 VALUES ('duplicate row')
INSERT INTO duplicateTable3 VALUES ('second duplicate row')
INSERT INTO duplicateTable3 VALUES ('second duplicate row')

SELECT DISTINCT data
INTO tempTable
FROM duplicateTable3
GO

TRUNCATE TABLE duplicateTable3

DROP TABLE duplicateTable3

exec sp_rename 'tempTable', 'duplicateTable3'

In this solution, the SELECT DISTINCT will select all the rows from our table except for the duplicates. These rows are immediately inserted into a table named tempTable. This is a temporary table in the sense that we will use it to temporarily store the unique rows. However, it is not a true temporary table (i.e. one that lives in the temporary database), because we need the table to exist in the current database, so that it can later be renamed, using sp_Rename.

The sp_Rename command is an absolutely horrible way of renaming textual objects, such as stored procedures, because it does not update all the system tables consistently. However, it works well for non-textual schema objects, such as tables.

New Techniques for Removing Duplicate Rows in SQL Server 2005

SQL Server 2005 has introduced the row_number() function, which provides an alternative means of identifying duplicates. Rewriting the first example, for tables with no primary key, we can now assign a row number to each row in a duplicate group, with a command such as:

DECLARE @duplicateTable4 TABLE (data VARCHAR(20))

INSERT INTO @duplicateTable4 VALUES ('not duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')

SELECT data, row_number() OVER ( PARTITION BY data ORDER BY data ) AS nr
FROM @duplicateTable4

The result will show:

data nr
-------------------- --------------------
duplicate row 1
duplicate row 2
not duplicate row 1
second duplicate row 1
second duplicate row 2

In the above example, we specify an ordering and partitioning for the row_number() function. Note that row_number() is a ranking window function, so the ORDER BY and the PARTITION BY in the OVER clause are used only to determine the value for the nr column; they do not affect the row order of the query.

Also, while the above is similar to our previous GROUP BY clause, there is a big difference concerning the returned rows. With GROUP BY, you must use an aggregate on the columns that are not listed after the GROUP BY. With the OVER clause there is no such restriction, and you can access the individual rows in the groups specified by the PARTITION BY clause. This gives us access to the individual duplicate rows, so we can get not only the number of occurrences, but also a sequence number for the individual duplicates.

To filter out the duplicate rows only, we could put the above query into a CTE or a subquery. The CTE approach is as follows:

DECLARE @duplicateTable4 TABLE (data VARCHAR(20))

INSERT INTO @duplicateTable4 VALUES ('not duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')

WITH numbered
AS ( SELECT data, row_number() OVER ( PARTITION BY data ORDER BY data ) AS nr
FROM @duplicateTable4
)

DELETE FROM numbered
WHERE nr > 1

This solution will even work with large objects, if you stick to the new large object types introduced in SQL Server 2005: i.e. use VARCHAR(MAX) instead of TEXT, NVARCHAR(MAX) instead of NTEXT, and VARBINARY(MAX) instead of IMAGE. These new types are comparable to the deprecated TEXT, NTEXT and IMAGE, and they have the advantage that you will be able to use them with both DISTINCT and row_number().

Overuse of cursors :

Let's face it – we love loops! Whether we start programming with VB, C, C++, Java, or C#, one of the first constructs we encounter is some form of a loop. Loops can be used to solve pretty much any challenge we might face.

And so, it is only natural, on the day we start programming with SQL, to seek out our favorite loop construct. And here it is – the mighty cursor (and its little WHILE brother)! We then hurry to put this well-known tool to use in solving our problems.

Let's look at one example. Given a table with product prices, we have to perform a monthly update of prices for products; the price updates are stored in another table with new prices.

ProductPrices table:

sku price effective_start_date effective_end_date
---- ------ -------------------- ------------------
1 10.50 2009-01-01 NULL
2 11.50 2009-01-01 NULL
3 19.00 2009-01-01 NULL
4 11.25 2009-01-01 NULL

NewPrices table:

sku price
---- ------
2 11.25
4 12.00

A cursor solution may look like this:

DECLARE @sku INT;
DECLARE @price DECIMAL(15, 2);
DECLARE PriceUpdates CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
FOR SELECT sku, price
    FROM NewPrices;

OPEN PriceUpdates;

FETCH NEXT FROM PriceUpdates
INTO @sku, @price;

WHILE @@FETCH_STATUS = 0
BEGIN

UPDATE ProductPrices
SET price = @price, effective_start_date = CURRENT_TIMESTAMP
WHERE sku = @sku;

FETCH NEXT FROM PriceUpdates
INTO @sku, @price;

END

CLOSE PriceUpdates;
DEALLOCATE PriceUpdates;

Mission accomplished! Now we can take a well-deserved break while the query is running. Soon, the realization dawns that procedural, row-by-row processing does not work well in SQL. Besides being very slow, our solution is long, and hard to read and maintain. This is the moment we understand that the power of SQL is its set-based nature. The same task can be accomplished using a very efficient set-based query that is easier to understand and maintain:

UPDATE ProductPrices
SET price = (SELECT N.price
FROM NewPrices AS N
WHERE N.sku = ProductPrices.sku),
effective_start_date = CURRENT_TIMESTAMP
WHERE EXISTS(SELECT *
FROM NewPrices AS N
WHERE N.sku = ProductPrices.sku);

There are different ways to write a set based query to solve this problem: using the MERGE statement, update with Common Table Expression, or the SQL Server specific update with join. But the point is to utilize the natural power of the SQL language and use set based techniques to solve problems and to avoid procedural solutions.

Note: While you should avoid cursors as much as possible, there are certain problems, such as running total aggregations, that today are still best solved using cursors. We can be optimistic that future enhancements will provide better tools to solve those problems in a set based way.

Incorrect subquery column :

When writing a subquery, it is very easy to abstract yourself from the main query logic and concentrate on the subquery itself. This can lead to the innocent mistake of substituting a column from the subquery source table for a column with similar name from the main query.

Let's look at two very simple tables; one is a Sales table containing sales data, and the other is an auxiliary Calendar table that has all calendar dates and holidays (abbreviated here):

Sales table:

sale_date sale_amount
---------- -----------
2009-01-01 120.50
2009-01-02 115.00
2009-01-03 140.80
2009-01-04 100.50

Calendar table:

calendar_date holiday_name
------------- ----------------
2009-01-01 New Year's Day
2009-01-02 NULL
2009-01-03 NULL
2009-01-04 NULL
2009-01-05 NULL

Our task is to retrieve sales data for holiday dates only. It seems like a trivial query to write:

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date
FROM Calendar AS C
WHERE holiday_name IS NOT NULL);

However, you'll find that the query simply returns all rows from the Sales table! A closer look at the query reveals the culprit to be the SELECT list of the subquery. It accidentally references the sale_date column from the Sales table, instead of the calendar_date column from the Calendar table.

If that is the case, why did we not get an error? Although the outcome was not what we expected, this is still a valid SQL statement. When using a subquery, the outer query's columns are exposed to the inner query. Here, we unintentionally converted a self-contained subquery (executed once, with its value passed to the outer query) into a correlated subquery, logically executed once for every row returned by the outer query.

As a result, the subquery evaluates to sale_date IN (sale_date) which is always true, as long as there is at least one holiday date in the Calendar table, and so our result set returns all rows from the Sales table. Of course, the fix is easy in this case; we simply use the correct date column from the Calendar table:

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.calendar_date
FROM Calendar AS C
WHERE C.holiday_name IS NOT NULL);
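Both the buggy and the corrected query can be reproduced on any engine that supports correlated subqueries. A small sketch using Python's sqlite3 module (same table and column names as above; data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (sale_date TEXT, sale_amount DECIMAL);
    CREATE TABLE Calendar (calendar_date TEXT, holiday_name TEXT);
    INSERT INTO Sales VALUES ('2009-01-01', 120.50), ('2009-01-02', 115.00),
                             ('2009-01-03', 140.80), ('2009-01-04', 100.50);
    INSERT INTO Calendar VALUES ('2009-01-01', 'New Year''s Day'),
        ('2009-01-02', NULL), ('2009-01-03', NULL),
        ('2009-01-04', NULL), ('2009-01-05', NULL);
""")

# Buggy: the subquery accidentally references Sales.sale_date,
# becoming a correlated subquery that is true for every row.
buggy = conn.execute("""
    SELECT sale_date FROM Sales AS S
    WHERE sale_date IN (SELECT sale_date FROM Calendar AS C
                        WHERE holiday_name IS NOT NULL)
""").fetchall()

# Fixed: reference the Calendar column explicitly via its alias.
fixed = conn.execute("""
    SELECT sale_date FROM Sales AS S
    WHERE sale_date IN (SELECT C.calendar_date FROM Calendar AS C
                        WHERE C.holiday_name IS NOT NULL)
""").fetchall()

print(len(buggy), len(fixed))  # 4 1
```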


This illustrates another important point: it is a best practice to prefix columns in subqueries with table aliases. For example, if we had used an alias like this:

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.sale_date
FROM Calendar AS C
WHERE holiday_name IS NOT NULL);


Then this query would have resulted in an error: "Invalid column name 'sale_date'".

Functions on indexed columns in predicates :

We often tend to write code as a direct translation of a given request. For example, if we are asked to retrieve all customers whose name starts with the letter L, it feels very natural to write the query like this, using the LEFT function to return the first character of the name:

SELECT customer_name
FROM Customers
WHERE LEFT(customer_name, 1) = 'L';


Alternatively, if we are asked to calculate the total sales for January 2009, we might write a query like the following, which uses the DATEPART function to extract the relevant month and year from the sale_date column:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE DATEPART(YEAR, sale_date) = 2009
AND DATEPART(MONTH, sale_date) = 1;


While these queries look very intuitive, you will find that the indexes that you (of course!) have on your customer_name and sale_date columns remain unused, and that the execution plans for these queries reveal index scans.

The problem arises from the fact that the indexed columns are being passed to a function, which the query engine must then evaluate for every single row in the table. In cases such as these, the WHERE clause predicate is deemed "non-SARGable" and the best that the query optimizer can do is perform a full index or table scan.

To make sure the indexes get used, we need to avoid using functions on the indexed columns. In our two examples, it is a relatively simple task to rewrite the queries to use SARGable predicates. The first request can be expressed with this logically equivalent query:

SELECT customer_name
FROM Customers
WHERE customer_name LIKE 'L%';

The equivalent for the second query is as follows:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE sale_date >= '20090101'
AND sale_date < '20090201';


These two queries are most likely to utilize an index seek to retrieve the data quickly and efficiently.
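The scan-versus-seek difference is not unique to SQL Server. As an illustrative sketch (not SQL Server itself), SQLite's EXPLAIN QUERY PLAN shows the same effect; here substr stands in for T-SQL's LEFT, and the case_sensitive_like pragma is needed for SQLite's LIKE optimization to apply to a default index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (customer_name TEXT)")
conn.execute("CREATE INDEX ix_customer_name ON Customers (customer_name)")
conn.execute("PRAGMA case_sensitive_like = ON")  # lets LIKE 'L%' use the index

def plan(sql):
    # Concatenate the plan detail rows for easy inspection.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Non-SARGable: wrapping the column in a function forces a scan.
scan = plan("SELECT customer_name FROM Customers "
            "WHERE substr(customer_name, 1, 1) = 'L'")

# SARGable: a prefix LIKE is rewritten into a range seek on the index.
seek = plan("SELECT customer_name FROM Customers "
            "WHERE customer_name LIKE 'L%'")

print(scan)  # a SCAN step
print(seek)  # a SEARCH step using the index
```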

It's worth noting that SQL Server is getting "smarter" as it evolves. For example, consider the following query, which uses the CAST function on the indexed sale_date column:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE CAST(sale_date AS DATE) = '20090101';

If you run this query on SQL Server 2005 or earlier, you'll see an index scan. However, on SQL Server 2008 you'll see an index seek, despite the use of the CAST function. The execution plan reveals that the predicate is transformed into something like the following:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE sale_date >= '20090101'
AND sale_date < '20090102';

However, in general, you should use SARGable predicates where possible, rather than rely on the evolving intelligence of the optimizer.

Scalar user-defined functions :

Reuse of code is one of the fundamental principles we learn when programming in any language, and the SQL language is no exception. It provides many means by which to logically group code and reuse it.

One such means in SQL Server is the scalar user-defined function. It seems so convenient to hide away all those complex calculations in a function, and then simply invoke it in our queries. However, the hidden "sting in the tail" is that it can exact a heavy toll in terms of performance. When used in a query, scalar functions are evaluated for each row and, with large tables, this can result in very slow-running queries. This is especially true when the scalar function needs to access another table to retrieve data.

Here is one example. Given tables with products and sales for products, the request is to retrieve total sales per product. Since the total sales value can be reused in another place, you decide to use a scalar function to calculate the total sales for a product:

CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS DECIMAL(15, 2)
AS
BEGIN
RETURN(SELECT SUM(sale_amount)
FROM Sales
WHERE sku = @sku);
END


Then the query to retrieve the total sales for each product will look like this:

SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales
FROM Products;


Isn't this a very neat and good looking query? But just wait until you run it over a large data set. The total sales calculation will be repeated for each and every row, and the overhead will be proportional to the number of rows. The correct way to handle this is, if possible, to rewrite the function as a table-valued function, or simply to perform the calculation in the main query. In our example, performing the calculation in the query looks like this:

SELECT P.sku, P.product_description, SUM(S.sale_amount) AS total_sales
FROM Products AS P
JOIN Sales AS S
ON P.sku = S.sku
GROUP BY P.sku, P.product_description;


And here is a table-valued function that can be used to calculate total sales:

CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS TABLE
AS
RETURN(SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE sku = @sku);

Now the table-valued function can be invoked in the query using the APPLY operator:

SELECT sku, product_description, total_sales
FROM Products AS P
CROSS APPLY dbo.GetTotalSales(P.sku) AS S;
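The per-row cost of a scalar function is easy to demonstrate by counting invocations. The sketch below uses Python's sqlite3 module, with a Python function standing in for the T-SQL UDF (the logic mirrors the example above; the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (sku INTEGER PRIMARY KEY, product_description TEXT);
    CREATE TABLE Sales (sku INTEGER, sale_amount DECIMAL);
    INSERT INTO Products VALUES (1, 'Bike'), (2, 'Ball'), (3, 'Phone');
    INSERT INTO Sales VALUES (1, 10), (1, 5), (2, 7), (3, 2), (3, 1);
""")

sales = conn.execute("SELECT sku, sale_amount FROM Sales").fetchall()
calls = 0

def get_total_sales(sku):
    # Stand-in for the scalar UDF: invoked once per row of the outer query.
    global calls
    calls += 1
    return sum(amt for s, amt in sales if s == sku)

conn.create_function("GetTotalSales", 1, get_total_sales)

scalar = conn.execute(
    "SELECT sku, GetTotalSales(sku) FROM Products").fetchall()
print(calls)  # 3 -- one invocation per product row

# The set-based rewrite computes all totals in a single pass.
joined = conn.execute("""
    SELECT P.sku, SUM(S.sale_amount)
    FROM Products AS P JOIN Sales AS S ON P.sku = S.sku
    GROUP BY P.sku
""").fetchall()
```

With three products the function fires three times; with a million rows it would fire a million times, each invocation doing its own lookup.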

Use of SELECT * :

On first encountering SQL, we always praise the genius who invented the syntax SELECT *! It's so handy and easy to use! Instead of explicitly listing all column names in our query, we just use the magic wildcard '*' and retrieve all columns. For example, a common misuse of SELECT * is to extract the set of all plastic products and insert them into another table with the same structure:

INSERT INTO PlasticProducts
SELECT *
FROM Products
WHERE material_type = 'plastic';


Job done! However, one day business requirements change and two new columns are added to the Products table:

ALTER TABLE Products
ADD effective_start_date DATETIME,
effective_end_date DATETIME;

All of a sudden the magic query results in an error:

Msg 213, Level 16, State 1, Line 1
Insert Error: Column name or number of supplied values does not match table definition.

The fix is to explicitly list the column names in the query:

INSERT INTO PlasticProducts (sku, product_description, material_type)
SELECT sku, product_description, material_type
FROM Products
WHERE material_type = 'plastic';


The situation can get even worse if a view is created using SELECT *, and later the base tables are modified to add or drop columns.

Note: If a view is created using the SCHEMABINDING option, then the base tables cannot be modified in a way that would affect the view definition.

To conclude, do not use SELECT * in production code! One exception here is when using the EXISTS predicate. The select list in the subquery for the EXISTS predicate is ignored since only the existence of rows is important.
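The failure mode is easy to reproduce. A sketch with Python's sqlite3 module (the same tables as above, reduced to a minimal schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (sku INTEGER, product_description TEXT, material_type TEXT);
    CREATE TABLE PlasticProducts (sku INTEGER, product_description TEXT, material_type TEXT);
    INSERT INTO Products VALUES (1, 'Ball', 'plastic');
""")

# Works for as long as the two tables line up column for column.
conn.execute("""INSERT INTO PlasticProducts
                SELECT * FROM Products WHERE material_type = 'plastic'""")

# A schema change on the source table breaks the SELECT * at run time.
conn.execute("ALTER TABLE Products ADD COLUMN effective_start_date TEXT")
try:
    conn.execute("""INSERT INTO PlasticProducts
                    SELECT * FROM Products WHERE material_type = 'plastic'""")
    broke = False
except sqlite3.OperationalError as e:
    broke = True  # column-count mismatch reported here
    print(e)
```

The explicit column list version keeps working after the ALTER TABLE, because it never depends on the source table's column count.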

Subqueries that return more than one value :

A very frequent request is to retrieve a value based on some correlation with the main query table. For example, consider the following two tables, storing details of products and the plants that manufacture these products:

Products table:

sku product_description
----- ------------------
1 Bike
2 Ball
3 Phone

ProductPlants table:

sku plant_nbr
----- -----------
1 1
2 1
3 2

The request is to extract the manufacturing plant for each product. One way to satisfy the request is to write the following query, using a correlated subquery to retrieve the plant:

SELECT sku, product_description,
(SELECT plant_nbr
FROM ProductPlants AS B
WHERE B.sku = A.sku) AS plant_nbr
FROM Products AS A;


Note that the point here is to illustrate a technique; there could be a more efficient way to accomplish the same task. However, everything works fine and we get the correct result set:

sku product_description plant_nbr
---- ------------------- -----------
1 Bike 1
2 Ball 1
3 Phone 2

The query will continue to work happily until the day arrives that the company decides to start manufacturing Balls at plant 3, to cope with increasing demand. The ProductPlants table now looks like this:

sku plant_nbr
----- -----------
1 1
2 1
2 3
3 2


All of a sudden, our query starts generating the following error:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.


The error is descriptive enough. Instead of the expected scalar value, our subquery returns a result set, which breaks the query. Based on our business requirements, the fix is simple. To list all plants manufacturing a particular product, we simply use a JOIN:

SELECT A.sku, A.product_description, B.plant_nbr
FROM Products AS A
JOIN ProductPlants AS B
ON A.sku = B.sku;


Now the query completes without errors and returns the correct results:

sku product_description plant_nbr
---- -------------------- -----------
1 Bike 1
2 Ball 1
2 Ball 3
3 Phone 2

Note that the same error can occur in a predicate where a column or expression is tested against a subquery, for example "… column = (SELECT value FROM Table)". In that case, the solution is to use the IN predicate in place of "=".
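SQL Server at least fails loudly here; some engines (SQLite, for example) silently use the first row the subquery returns, which can hide the bug entirely. The JOIN rewrite is unambiguous everywhere. A quick sketch using Python's sqlite3 module with the data from above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (sku INTEGER, product_description TEXT);
    CREATE TABLE ProductPlants (sku INTEGER, plant_nbr INTEGER);
    INSERT INTO Products VALUES (1, 'Bike'), (2, 'Ball'), (3, 'Phone');
    INSERT INTO ProductPlants VALUES (1, 1), (2, 1), (2, 3), (3, 2);
""")

# The JOIN returns one row per product/plant pair, so the new
# plant 3 for Balls simply becomes an extra row.
rows = conn.execute("""
    SELECT A.sku, A.product_description, B.plant_nbr
    FROM Products AS A JOIN ProductPlants AS B ON A.sku = B.sku
    ORDER BY A.sku, B.plant_nbr
""").fetchall()

for row in rows:
    print(row)
```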

Outer joins and placement of predicates :

Outer joins are such a great tool but are also much misunderstood and abused. Some people seem to like them so much that they throw one into almost every query, regardless of whether or not it is needed!

The key to correct use of outer joins is an understanding of the logical steps required to process an outer join in a query. Here are the relevant steps from the query processing phases:

1.A cross join (Cartesian product) is formed for the two input tables in the FROM clause. The result of the Cartesian product is every possible combination of a row from the first table and a row from the second table.

2.The ON clause predicates are applied filtering only rows satisfying the predicate logic.

3.Any outer rows filtered out by the predicates in step 2 are added back. Rows from the preserved table are added with their actual attribute values (column values), and the attributes (columns) from the non-preserved table are set to NULL.

4.The WHERE clause predicates are applied.
An outer join query can produce completely different results depending on how you write it, and where predicates are placed in that query. Let's look at one example, based on the following two tables, Customers and Orders:

Customers table:

customer_nbr customer_name
------------ --------------
1 Jim Brown
2 Jeff Gordon
3 Peter Green
4 Julie Peters

Orders table:

order_nbr order_date customer_nbr order_amt
----------- ---------- ------------ ----------
1 2008-10-01 1 15.50
2 2008-12-15 2 25.00
3 2009-01-02 1 18.00
4 2009-02-20 3 10.25
5 2009-03-05 1 30.00

Our task is to retrieve a list of all customers, and the total amount they have spent on orders, since the beginning of year 2009. Instinctively, one may write the following query:

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101'
GROUP BY C.customer_name;


But the results do not look good:

customer_name total_2009
-------------- ------------
Jim Brown 48.00
Peter Green 10.25


Customers Jeff and Julie are missing from the result set. Where is the problem? In order to understand what went wrong, let’s play back this query one step at a time following the logical processing order. The first step is a cross join between the two input tables:

SELECT C.customer_name, O.order_amt
FROM Customers AS C
CROSS JOIN Orders AS O;


This results in every possible combination of rows from both input tables:

customer_name order_amt order_date
---------------- ---------- ----------
Jim Brown 15.50 2008-10-01
Jim Brown 25.00 2008-12-15
Jim Brown 18.00 2009-01-02
Jim Brown 10.25 2009-02-20
Jim Brown 30.00 2009-03-05
Jeff Gordon 15.50 2008-10-01
Jeff Gordon 25.00 2008-12-15
Jeff Gordon 18.00 2009-01-02
Jeff Gordon 10.25 2009-02-20
Jeff Gordon 30.00 2009-03-05
Peter Green 15.50 2008-10-01
Peter Green 25.00 2008-12-15
Peter Green 18.00 2009-01-02
Peter Green 10.25 2009-02-20
Peter Green 30.00 2009-03-05
Julie Peters 15.50 2008-10-01
Julie Peters 25.00 2008-12-15
Julie Peters 18.00 2009-01-02
Julie Peters 10.25 2009-02-20
Julie Peters 30.00 2009-03-05

The next step is applying the ON predicates of the JOIN clause:

SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
INNER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr;


The result of this query includes only customers with orders. Since customer Julie does not have any orders, she is excluded from the result set:

customer_name order_amt order_date
-------------- ---------- ----------
Jim Brown 15.50 2008-10-01
Jeff Gordon 25.00 2008-12-15
Jim Brown 18.00 2009-01-02
Peter Green 10.25 2009-02-20
Jim Brown 30.00 2009-03-05


The third step of the logical processing order is adding back the outer rows. These rows were excluded in the prior step because they did not satisfy the join predicates.

SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr;


Now customer Julie is added back in the result set. Notice the added outer rows from the preserved table (Customers) have values for the selected attributes (customer_name) and the non-preserved table (Orders) rows have NULL for their attributes (order_amt and order_date):

customer_name order_amt order_date
-------------- ---------- ----------
Jim Brown 15.50 2008-10-01
Jim Brown 18.00 2009-01-02
Jim Brown 30.00 2009-03-05
Jeff Gordon 25.00 2008-12-15
Peter Green 10.25 2009-02-20
Julie Peters NULL NULL


The last step is applying the WHERE clause predicates:

SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101';


Now the picture is clear! The culprit is the WHERE clause predicate. Customer Jeff is filtered out from the result set because he does not have orders past January 1, 2009, and customer Julie is filtered out because she has no orders at all (since the outer row added for Julie has NULL for the order_date column). In effect, in this case, the predicate in the WHERE clause turns the outer join into an inner join.

To correct our initial query, it is sufficient to move the WHERE predicate into the join condition.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
AND O.order_date >= '20090101'
GROUP BY C.customer_name;


Now, the query returns correct results because Jeff and Julie are filtered out in the join predicates, but then added back when the outer rows are added.

customer_name total_2009
-------------- ------------
Jeff Gordon 0.00
Jim Brown 48.00
Julie Peters 0.00
Peter Green 10.25


In a more complex example, with multiple joins, the incorrect filtering may happen in a subsequent table operator (like a join to another table) instead of in the WHERE clause. For example, say we have an OrderDetails table containing product SKU and quantity, and the request is to retrieve a list of all customers, with order amount and quantity, for selected product SKUs. The following query may seem correct:

SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
INNER JOIN OrderDetails AS D
ON D.order_nbr = O.order_nbr
AND D.sku = 101;


However, here the INNER join with the OrderDetails table plays exactly the same role as the predicate in the WHERE clause in our previous example, in effect turning the LEFT OUTER join into an INNER join. The correct query to satisfy this request needs a LEFT OUTER join to the OrderDetails table:

SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
LEFT JOIN OrderDetails AS D
ON D.order_nbr = O.order_nbr
AND D.sku = 101;
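The effect of predicate placement is easy to reproduce. The following sketch, using Python's sqlite3 module with the Customers and Orders data from above, runs the WHERE-clause and ON-clause versions side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_nbr INTEGER, customer_name TEXT);
    CREATE TABLE Orders (order_nbr INTEGER, order_date TEXT,
                         customer_nbr INTEGER, order_amt DECIMAL);
    INSERT INTO Customers VALUES (1, 'Jim Brown'), (2, 'Jeff Gordon'),
                                 (3, 'Peter Green'), (4, 'Julie Peters');
    INSERT INTO Orders VALUES
        (1, '2008-10-01', 1, 15.50), (2, '2008-12-15', 2, 25.00),
        (3, '2009-01-02', 1, 18.00), (4, '2009-02-20', 3, 10.25),
        (5, '2009-03-05', 1, 30.00);
""")

# Predicate in WHERE: the outer join collapses to an inner join.
in_where = conn.execute("""
    SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0))
    FROM Customers AS C LEFT OUTER JOIN Orders AS O
      ON C.customer_nbr = O.customer_nbr
    WHERE O.order_date >= '2009-01-01'
    GROUP BY C.customer_name
""").fetchall()

# Predicate in ON: outer rows are preserved, all customers appear.
in_on = conn.execute("""
    SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0))
    FROM Customers AS C LEFT OUTER JOIN Orders AS O
      ON C.customer_nbr = O.customer_nbr AND O.order_date >= '2009-01-01'
    GROUP BY C.customer_name
""").fetchall()

print(len(in_where), len(in_on))  # 2 4
```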

Predicate Evaluation Order :

If you are familiar with the logical query processing order, then you may expect that a query is executed in the following order:

1.FROM
2.WHERE
3.GROUP BY
4.HAVING
5.SELECT


The sequence above outlines the logical order for executing a query. Logically, the FROM clause is processed first, defining the source data set; next the WHERE predicates are applied, followed by GROUP BY, and so on.

However, physically, the query is processed differently, and the query optimizer is free to move expressions around in the query plan in order to produce the most cost-efficient plan for retrieving the data. This leads to a common misunderstanding: that a filter in the WHERE clause is applied before the later phases are processed. In fact, a predicate can be applied much later in the physical execution plan. Also, there is no left-to-right order for the evaluation of predicates. For example, if you have a WHERE clause containing "WHERE x=1 AND y=2", there is no guarantee that "x=1" will be evaluated first. The predicates can be evaluated in any order.

For example, consider the following Accounts table where, in the account_reference column, Business accounts are denoted by a numeric reference and Personal accounts by a character reference:

account_nbr account_type account_reference
------------- --------------- -------------------
1 Personal abc
2 Business Basic 101
3 Personal def
4 Business Plus 5


In general, this table indicates bad design. The account_reference column should be split into two different attributes, specific to business and personal accounts (and arguably not even belonging in the same table), each with the correct data type. However, in practice, we very often have to deal with systems designed with shortcomings, where altering the design is not an option.

Given the above scenario, a valid request is to retrieve all business type accounts with an account reference that is greater than 20 (assuming account reference has some meaningful numeric value for business type accounts). The query may look like this:

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
AND CAST(account_reference AS INT) > 20;


However, the query results in error:

"Conversion failed when converting the varchar value 'abc' to data type int"

The query fails because, as noted earlier, there is no prescribed order for evaluating predicates, and nothing guarantees that the predicate "account_type LIKE 'Business%'" will be evaluated before the predicate "CAST(account_reference AS INT) > 20". In our case, the second predicate was evaluated first, resulting in a conversion error due to the incompatible values in the account_reference column for personal accounts.

One attempt to resolve this issue might be to use a derived table (or common table expression) to filter the business type accounts first, and then apply the predicate for account_reference column:

SELECT account_nbr, account_ref_nbr
FROM (SELECT account_nbr,
CAST(account_reference AS INT) AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%') AS A
WHERE account_ref_nbr > 20;


However, this results in exactly the same error, because derived tables and CTEs are expanded in the query plan; a single query plan is produced, in which predicates can again be pushed up or down.

As indicated earlier, the problem here is a mix of bad design and a misunderstanding of how SQL Server performs physical query execution. What is the solution? The best solution is to design the table correctly and avoid storing mixed data in a single column. Failing that, a workaround is to use a CASE expression to guarantee that only valid numeric values will be converted to the INT data type:

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
AND CASE WHEN account_reference NOT LIKE '%[^0-9]%'
THEN CAST(account_reference AS INT)
END > 20;


The CASE expression uses a LIKE pattern to check for valid numeric values (a double negation logic is used which can be translated as "there is not a single character that is not a digit"), and only for those values performs the CAST. For the rest of the values the CASE expression results in NULL, which is filtered out because NULL is not matched with any value (even with NULL).

Data type mismatch in predicates

This is another typical mistake that is sometimes hard to catch. It is very easy to mismatch data types in predicates. It could be a stored procedure where a parameter is passed as one data type and then used in a query to filter data on a column of a different data type. Another example is joining tables on columns with different data types, or simply writing a predicate in which the data types are mismatched.

For example, we may have a Customers table where the last_name column is of type VARCHAR:

CREATE TABLE Customers (
customer_nbr INT NOT NULL PRIMARY KEY,
first_name VARCHAR(35) NOT NULL,
last_name VARCHAR(35) NOT NULL);

Then the following stored procedure is used to retrieve the customer information by customer last name:

CREATE PROCEDURE GetCustomerByLastName
@last_name NVARCHAR(35)
AS
SELECT first_name, last_name
FROM Customers
WHERE last_name = @last_name;


Notice here that the parameter @last_name is of data type NVARCHAR. Although the code "works", SQL Server will have to perform an implicit conversion of the last_name column to NVARCHAR, because NVARCHAR has higher data type precedence. This can result in a performance penalty. The implicit conversion is visible in the query plan as CONVERT_IMPLICIT. Depending on collation, and other factors, a data type mismatch may also preclude the use of an index seek. Using the correct data type resolves the problem:

CREATE PROCEDURE GetCustomerByLastName
@last_name VARCHAR(35)
AS
SELECT first_name, last_name
FROM Customers
WHERE last_name = @last_name;

In many cases, this mistake is the result of splitting responsibilities on the team, and having one team member design the tables and another implement stored procedures or code. Another reason could be using different data sources to join data where the join columns have different data types in the source systems. The same advice applies not only to character data type mismatches, but also to mismatches between numeric data types (like INT and FLOAT), or the mixing of numeric and character data types.

NULLs and the NOT IN predicate

One of the most common requests is to retrieve data based on some column value not included in a list of values. The following two tables illustrate the scenario. We have tables with colors and products:

Colors table:

color
----------
Black
Blue
Green
Red

Products table:

sku product_description color
---- -------------------- ------
1 Ball Red
2 Bike Blue
3 Tent NULL

Note that these tables do not represent a perfect design following normalization rules and best practices. Rather, this is a simplified scenario to help illustrate the example. In reality, the Colors table would most likely contain a color code key column that would be referenced in the Products table.

The request is to select a list of colors that have not previously been used on products. In other words, we need to construct a query that returns only those colors for which there is no product with that color. It might seem, at first glance, that the NOT IN predicate provides a very intuitive way to satisfy this request, very close to how the problem would be stated in plain English:

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
FROM Products AS P);

You may have been expecting this query to return two rows (for 'Black' and 'Green') but, in fact, it returns an empty result set:

color
----------

(0 row(s) affected)


Obviously this is 'incorrect'. What is the problem? It's simply that SQL uses three-valued logic, driven by the existence of NULL, which is not a value but a marker to indicate missing (or UNKNOWN) information. When the NOT operator is applied to the list of values from the subquery, in the IN predicate, it is translated like this:

"color NOT IN (Red, Blue, NULL)" is equivalent to "NOT(color=Red OR color=Blue OR color=NULL)"

The expression "color=NULL" evaluates to UNKNOWN and, according to the rules of three-valued logic, NOT UNKNOWN also evaluates to UNKNOWN. As a result, all rows are filtered out and the query returns an empty set.

This mistake will often surface if requirements change, and a non-nullable column is altered to allow NULLs. It also highlights the need for thorough testing. Even if, in the initial design, a column disallows NULLs, you should make sure your queries continue to work correctly with NULLs.

One solution is to use the EXISTS predicate in place of IN, since EXISTS uses two-valued predicate logic evaluating to TRUE/FALSE:

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS(SELECT *
FROM Products AS P
WHERE C.color = P.color);


color
----------
Black
Green


Other possible solutions are as follows:

/* IS NOT NULL in the subquery */

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
FROM Products AS P
WHERE P.color IS NOT NULL);

/* EXCEPT */

SELECT color
FROM Colors
EXCEPT
SELECT color
FROM Products;


/* LEFT OUTER JOIN */

SELECT C.color
FROM Colors AS C
LEFT OUTER JOIN Products AS P
ON C.color = P.color
WHERE P.color IS NULL;


While all of these solutions produce the desired results, using EXCEPT may be the easiest to understand and use. Note, however, that the EXCEPT operator returns distinct values, which works fine in our scenario but may not be correct in another situation.
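All of these behaviors are easy to verify. A sketch with Python's sqlite3 module, using the same Colors and Products data, contrasts the NOT IN trap with the NOT EXISTS fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Colors (color TEXT);
    CREATE TABLE Products (sku INTEGER, product_description TEXT, color TEXT);
    INSERT INTO Colors VALUES ('Black'), ('Blue'), ('Green'), ('Red');
    INSERT INTO Products VALUES (1, 'Ball', 'Red'), (2, 'Bike', 'Blue'),
                                (3, 'Tent', NULL);
""")

# NOT IN against a list containing NULL: every comparison ends up UNKNOWN.
not_in = conn.execute("""
    SELECT color FROM Colors
    WHERE color NOT IN (SELECT color FROM Products)
""").fetchall()

# NOT EXISTS uses two-valued logic and is unaffected by the NULL.
not_exists = conn.execute("""
    SELECT C.color FROM Colors AS C
    WHERE NOT EXISTS (SELECT * FROM Products AS P WHERE C.color = P.color)
""").fetchall()

print(not_in)       # [] -- the NULL color makes NOT IN evaluate to UNKNOWN
print(not_exists)   # [('Black',), ('Green',)]
```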

My intention with this article was to highlight some of the more common mistakes that I've seen people make when coding SQL. The list reflects my direct experience working with different development teams and developers, performing code review sessions, and so on, as well as the issues I see every day on the newsgroups and forums. The list of common mistakes is far from exhaustive, and is presented in no particular order of severity.

Here is the list:

1.NULLs and the NOT IN predicate
2.Functions on indexed columns in predicates
3.Incorrect subquery column
4.Data type mismatch in predicates
5.Predicate evaluation order
6.Outer joins and placement of predicates
7.Subqueries that return more than one value
8.Use of SELECT *
9.Scalar user-defined functions
10.Overuse of cursors

The examples are presented using SQL Server's Transact-SQL dialect, but most of the concepts are valid in any SQL implementation.

Download from the link below:
http://www.4shared.com/file/145616795/7f59593c/SQL.html