How to Have Standard Event Logging in SSIS and Avoid Traps

April 15, 2015June 16, 2016Aalam Rangi2 Comments

Event logging in SSIS gives a lot of valuable information about the run-time behavior and execution status of the SSIS package. Having a common minimum number of events is good for consistency in reports and general analysis. Lets say your team wants to ensure that all packages must log at least OnError, OnWarning, OnPreExecute and OnPostExecute events. If the package has a DataFlowTask then the BufferSizeTuning should also be logged. The developer can include more events to log as required but these mentioned previously are the minimum that must be included.

You can create pre-deployment checklists or documentation to ensure this minimum logging. Probably you already have that. As the number of developers and/or packages increase, it becomes difficult to ensure consistency in anything, not just for event logging. Therefore documentation, checklists and training are helpful to an extent only. Your requirement could be more complex than my five-event example above and thus more prone to to oversight.

The easiest way to ensure a common logging implementation would be a logging template that has all the minimum event pre-selected. The developer should just need to apply that to the package.

Event Logging in SSIS with a Template

I assume that you are already familiar with the concepts of event logging in SSIS so this post is not going to be a beginners level introduction to event logging. I will rather discuss options to have a minimum standard event logging across SSIS packages and teams with minimal effort. I’ll also mention some traps to avoid.

I am using a demo SSIS package with two Data Flow Tasks and an Execute SQL Task. I have enabled event logging in SSIS for the first few events at the package level for the sake of demonstration. The logging configuration options for the package node (which is the top node) are shown in the first image.

Image 1 - Logging configuration window for the package node — Image 1 – Event logging configuration window for the package node

The logging options at the child container node Data Flow Task 1 are shown in the second image. The configuration for other Data Flow and the Execute SQL Task look the same.

Image 2 - Logging configuration window at the child container node — Image 2 – Event logging configuration window at the child container node

The check marks for the tasks are grayed out which means they are inheriting the logging options from their parent, i.e. the package. To disable logging for a task, remove its check mark in the left tree view window.

TIP: Logging can also be disabled by going to the Control Flow canvas and changing the LoggingMode property of the task to Disabled.

The Trick

Now look at the bottom of the images again. Notice the Load… and Save… buttons? They do exactly what they say. You can set your logging options and save them as an XML template. Later, this XML template can be loaded into other packages to enable the same logging options.

The XML template file has nodes for each event. For example, the logging options for OnError event are saved like this –


-<EventsFilter Name="OnError">

-<Filter>

<Computer>true</Computer>

<Operator>true</Operator>

<SourceName>true</SourceName>

<SourceID>true</SourceID>
<ExecutionID>true</ExecutionID>

<MessageText>true</MessageText>

<DataBytes>true</DataBytes>

</Filter>

</EventsFilter>

Notice that the XML just mentions the event name, not the name of any task. This means that when the template file is loaded, this logging option will be set for any task where the event is applicable. More on this later.

The Traps

The OnError event is a generic event applicable to all tasks. Lets talk about events that are specific to tasks. For example, the BufferSizeTuning event is applicable just to the Data Flow Tasks, not Execute SQL Tasks.

When I proceed to set logging for BufferSizeTuning event, I have to set it individually in the Data Flow Task tree node. Notice the message at the bottom of the second image that says –

To enable unique logging options for this container, enable logging for it in the tree view.

This message is important in the context of saving and loading a template file too. When I save a template file, the logging options of just that tree view node are saved. For example, the BufferSizeTuning event will be saved in the template only if I am at the Data Flow task in the tree view. It will not be saved if I am at the Package or the Execute SQL task in the tree view.

The reverse is also true. When I load a template, its logging options are applied to just that node which I select in the tree view. For example, if I load a template at the Data Flow Task 1, the options will not be applied to the Data Flow Task 2 or the Execute SQL Task. If the template has an event that is not applicable to the task then that event’s settings will be ignored. For example, the BufferSizeTuning event logging option is meant for Data Flow Tasks so it will be ignored for the Execute SQL Task. The fact that non-relevant options are ignored can be helpful for us to consolidate all logging options in a single template file.

Conclusion

A package level Save and Load of a logging template is straight forward. But if you need to have logging for events that are specific to a task type, then consider creating a logging template for each type of task. Also, if your logging configuration requires anything else than the package level settings, remember to load the template for each task in the tree view.

Number of Template Files	How	Pros and Cons
Individual File per Task	Create one template file for each type of task. The file will have events applicable to that task.	Pros – Easier to know what type of tasks have a template and which ones do not.Cons – More files to manage.
Single File for All Tasks	Create a template file for each task. Then copy all event options in a single XML file.	Pros – One file is easier to manage.Cons – Not obvious which tasks are include. Need to put in comments in the XML file.

How to Use Temp Table in SSIS

April 8, 2015June 23, 2016Aalam Rangi8 Comments

Using a temporary table in SSIS, especially in a Data Flow Task, could be challenging. SSIS tries to validate tables and their column metadata at design time. As the Temp table does not exist at the design time, SSIS cannot validate its metadata and throws an error. I will present a pretty straight forward solution here to trick SSIS into believing that the Temp table actually exists and proceed as normal.

Temporary Table Reference Across Two Tasks

To begin with, I will demonstrate that a Temp table can be referenced across two tasks. Add two Execute SQL Tasks in your package. Both of them use the same OLEDB connection. The first task creates a Local Temp table and inserts one row into it. The second task tries to insert one more row in the same table.

TSQL script in the first task –

/* Create a LOCAL temp table*/
IF
(
Object_id('[tempdb].[dbo].[#LocalTable]')
IS NOT NULL
)
DROP TABLE
[tempdb].[dbo].[#LocalTable]
GO

CREATE TABLE [#LocalTable]
(
id INT IDENTITY,
label VARCHAR(128)
);
GO

/* Insert one row */
INSERT INTO [#LocalTable]
(label)
VALUES ('First row');
GO

TSQL script in the second task –

/* Insert one row */
INSERT INTO [#LocalTable]
(label)
VALUES ('Second row');
GO

Invalid Object Name Error

When executed, the SSIS package gives the following error because the second task cannot see the Temp table created in the first task. Local Temp tables are specific to a connection. When SSIS switches from one task to another, it resets the connection so the Local Temp table is also dropped.

Error: 0xC002F210 at ESQLT-InsertSecondRow
, Execute SQL Task: Executing the query
&quot;/* Insert second row */
INSERT INTO [#LocalTable]...&quot;
failed with the following error:
&quot;Invalid object name '#LocalTable'.&quot;.
Possible failure reasons:
Problems with the query
, &quot;ResultSet&quot; property not set correctly
, parameters not set correctly
, or connection not established correctly.
Task failed: ESQLT-InsertSecondRow
Warning: 0x80019002 at SSIS-DemoTempTable:
SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.
The Execution method succeeded, but the number
of errors raised (1) reached the maximum
allowed (1); resulting in failure. This occurs
when the number of errors reaches the number
specified in MaximumErrorCount. Change the
MaximumErrorCount or fix the errors.
SSIS package &quot;SSIS-DemoTempTable.dtsx&quot;
finished: Failure.

The Fix

The fix is pretty simple. Right-click on the OLEDB connection manager and go to the Properties window. Change the RetainSameConnection property to True. This will force the connection manager to keep the same connection open. I have another post with more details about the RetainSameConnection property of OLEDB connection managers.

This fixes the error and the package executes successfully.

Temporary Table in a Data Flow Task

Now let me demonstrate that a Temp table can be used in a Data Flow Task.

Add a Data Flow Task to the package.

In the Data Flow task, add an OLEDB source that will use the same OLEDB connection as used by the Execute SQL Tasks earlier. In the OLEDB Source Editor window, there is no way to find our Local Temp table in the list so close the Editor window.

The Development Workaround

Open a SSMS query window and connect to the SQL Server used in the OLEDB connection. Now create a Global Temp table with the same column definition. You can just copy the CREATE TABLE script and add one more # symbol to the table name.

The Global Temp table is just a development workaround for the restriction imposed by the volatility of the Local Temp table. You can even use an actual physical table instead of the Global Temp table. We will switch to the Local Temp table in the end of this post and then the Global Temp table (or the actual physical table) can be dropped.

Script –

/* Create a GLOBAL temp table
with the same schema as the
earlier LOCAL temp table.
Note the ## in the table name */

CREATE TABLE [##LocalTable]
(
id INT IDENTITY,
label VARCHAR(128)
);
GO

Come back to the SSIS Control Flow. Create a new package scoped variable of String data type. Give it the name TableName and put the Global Temp table name as its value.

Go to the Data Flow > OLEDB Source and double click to open the OLEDB Source Editor window. Choose the Data Access Mode as Table name or view name variable. In the Variable name drop-down, choose the new variable that we created. This means that now the OLEDB Source is going to use the GLOBAL Temp table. Of course, it is not the same as the LOCAL Temp table but we will get to that in a minute. Click on the Columns tab to load the table metadata. Then click on OK to close the OLEDB Source Editor.

Now add a Flat File Destination and configure its properties. I’ll not go into those details. Please let me know in the comments or via email if you need information on how to configure a Flat File Destination.

The final Data Flow Task looks like this.

You can execute the package now to verify if it runs successfully. Although it will run fine, the flat file will not have rows because the source of the data is the Global Temp table, not the Local Temp table populated by the Execute SQL Tasks.

A Global Temp table (or a physical table) is common to all users so it could cause issues in multi-user environments. Local Temp tables are specific to a connection, hence more scalable. All that is needed now is to remove one # in the variable value and the OLEDB Source will point to the correct Local Temp table. To clean up, you can drop the Global Temp table.

The flat file will have the rows inserted by the Execute SQL Tasks.

Avoid Validation

Subsequent runs of the package will show validation errors because the Local Temp table is not available when the package starts. To go around this, you can set the DelayValidation property of the package to TRUE. As the package is the parent container for all other tasks, this property will be applied to all tasks in the package. If you do not wish to disable validation for all tasks, then you can set it for individual tasks, i.e. the first Execute SQL Task and the Data Flow Task. Again, the Data Flow Task may contain multiple sources, destinations and transformations and you may not want to disable validation for all of them. In that case you can be more granular and set just the ValidateExternalMetadata property of the OLEDB Source to FALSE.

Some scheduled jobs send failure emails upon an error and DBAs need to remember to fix the job and re-execute it. I use filter rules in MS Outlook to organize my inbox by redirecting all the scheduled job emails to a separate folder. In addition to that, I use conditional formatting rules to keep track of action items as pending or done. The conditional formatting rules highlight the job failure emails with the red color and turn them green when marked complete as shown in the following image.

Create a Conditional Formatting Rule to Highlight a Failed Job Email in Red:

Navigate to the folder where you would like to modify the view. You can even customize your main Inbox folder view, if that is where your emails are. Right-click on the header row and go to View Settings.

Click on Conditional Formatting.

In the Conditional Formatting window, click on Add and rename the new untitled rule. Then change the font color to red and click on Condition.

In the Filter window, put a sufficiently unique text phrase from the error email that can distinguish it from good emails. Also specify in the drop-down where exactly the filter should look for the text phrase i.e. just in the email subject or, email body or both.

My emails have the following text in the email body that I can use for identification –

STATUS: Failed

Now this a simple example with a text phrase for filtering but you can use other criteria in More Choices and Advanced tabs too for more complex scenarios as necessary. If you get emails with different text patterns then you can always create a new rule for each phrase.

That is it. Click on OK to come back to your inbox and verify that the error emails have a red font.

Create a Conditional Formatting Rule to Highlight a Failed Job Email in Green After Resolution:

Now we want another rule that will change the red email to a green one when it is marked as complete. Add another Conditional Formatting rule, rename it and change the font color to green and click on Condition.

In the filter window, follow the same steps as in the previous red rule. Then go the More Choices tab.

In the More Choices tab, check the Only Items which: and select the drop-down value of are marked complete.

Back in the Conditional Formatting window, move the green job up. This is important otherwise the red rule will override the effects of the green rule.

The final window will look like this.

Now when ever the issue from the failure email is resolved, just right-click on the email, go to Follow Up, select Mark Complete. The email will turn green.

How To Connect SSMS to ALWAYSON Read-Only Secondary Database

February 16, 2015Aalam Rangi4 Comments

The databases in PRIMARY availability group can be used for read-write access. The databases in the SECONDARY availability group can be used just for read-only access.

An attempt to connect to a SECONDARY availability group database with a normal connection, which is read-write by default, shows the following error message –

Msg 978, Level 14, State 1, Line 1

The target database ('AGDemoDB') is in an
availability group and is currently accessible
for connections when the application intent is
set to read only. For more information about
application intent, see SQL Server Books Online.

To resolve the issue, the connection string needs to have the Application Intent = ReadOnly parameter. How do you pass parameters in a SSMS connection?

SSMS has many options that are not too obvious. One of them is to provide additional connection parameter options. All that is needed to resolve the above error is to use the Additional Connection Parameters screen in the connection dialog and put the parameter there.

The keyword should not have any spaces.

Further reading:

The AlwaysOn Professional MSDN blog has more examples of connection strings for various applications.

You may also like to review the Application Intent Filtering feature of AlwaysOn at this and this link.

How to Avoid Orphan Users in SQL Server ALWAYSON, Create Logins Correctly

February 9, 2015Aalam Rangi9 Comments

SQL Server logins are stored in the [master] database. System databases (master, model, msdb and tempdb) cannot be included in an availability group so a login created on the PRIMARY replica will not show up on the SECONDARY replica automatically. It has to be created manually on the SECONDARY replica.

Logins have an SID associated with them. When a Windows authentication login is created in SQL Server, it uses the SID from the Windows Active Directory. So the SID for such a login will be the same on every SQL Server in the network domain.

When a SQL authentication login is created, it gets a new auto-generated SID assigned by the SQL Server. This auto-generated SID will be different in each SQL Server even if the login name and the password combination are the same.

The database users are mapped to the logins internally using the SID, not the login name/user name. There are some situations where the SID may have a mismatch, e.g. when a database is restored to a different server where the supposedly matching login already exists, or a login is dropped and recreated without consideration to the mapped users, or a login is recreated between a database detach and reattach etc. Such users that do not have a login with a matching SID are known as orphan users. This SID mismatch means that although an application or a user can connect to the server using the login, but it can not access the database.

The following options with varying degree of effectiveness can be used to fix the SID mapping between a login and an orphan user –

Drop and recreate the user in the restored database. Of course the user permissions will get deleted too and have to be granted again.
Drop and recreate the login with same SID as the restored database. This is a definite no-no if there are other databases on the server linked to that login. It will only complicate matters.
Run the system stored procedure sp_change_users_login. It has parameters to just report, fix one or fix all orphans. But the stored procedure is now marked as deprecated so there are no guarantees of future availability.
ALTER the user (there are some restrictions, like there cannot be a one-to many mapping etc.) –
```
USE [MyUserDBName]
GO
ALTER USER someuser WITH LOGIN = somelogin
```
As the user databases on the SECONDARY replica are read-only, the role of the SECONDARY server has to be changed to PRIMARY by doing a failover before the above ALTER command can be executed.

All of this can be avoided if the login is created correctly on the SECONDARY replica. We just have to make sure that the SID for the login on the SECONDARY replica matches the PRIMARY replica.

Let us begin with creating a new login on the PRIMARY replica of an existing Availability Group –

/* On the PRIMARY replica */

-- Create the Login
USE [master]
GO
CREATE LOGIN [TestLogin]
WITH PASSWORD=N'abc123#'
, DEFAULT_DATABASE=[master]
GO

Grant privileges to this login if necessary. They will NOT automatically replicate to the SECONDARY replica. Now find the SID of this new login.

-- Get the SID for the new Login
SELECT name, sid 
FROM sys.server_principals 
WHERE name = 'TestLogin'

/* Results:
name          sid
TestLogin     0x8EA0E033BD83524180CF813A20C5265B
*/

On the SECONDARY replica, create the login with the same SID. The GUI wizard to create logins does not have this feature to specify the SID, so the login has to be created using TSQL with an additional parameter.

/* On the SECONDARY replica */
-- Create the Login with the same SID as
-- the PRIMARY replica
CREATE LOGIN [TestLogin]
WITH PASSWORD=N'abc123#'
-- use the SID retrieved above
, SID = 0x8EA0E033BD83524180CF813A20C5265B
, DEFAULT_DATABASE=[master]
GO

Grant the same privileges to this login as done on the PRIMARY replica.

Now go back to the PRIMARY replica and create database user mapped to the login and grant required permissions at the database level. This new database user will be automatically replicated on the SECONDARY replica with its permissions and correctly map to the login. No action on SECONDARY required because the user database is in an Availability Group that is synced across replicas.

Further reading:
For the sake of completeness, I must mention the widely cited KB 918992 article (How to transfer logins and passwords between instances of SQL Server) which provides a stored procedure [sp_rev_login] to move the logins from one server to another. This stored procedure generates the CREATE LOGIN script with the password hash and the SID. You would need that stored procedure only if you don’t have access to the clear text passwords or, if you want to include that script as a scheduled job but not hardcode the password in the job. If you do not have those constraints then you can simply use the steps described in my blog post here.

How to Log SSIS Variable Values During Execution in the Event Log

December 11, 2014August 31, 2016Aalam Rangi1 Comment

Log entries created during the execution of an SSIS package help in monitoring, analysis, and issue resolution. In addition to logging events, you might want to capture the run-time values of variables in the package. I couldn’t find a native feature to log SSIS variable values so the following post shows how I did it.

The Setup

For this demo, I’ll do an INSERT operation on a table named TableA and use variables to save the before and after INSERT row count. The TableA is always blank at the beginning of the package i.e. the row count is zero. The Execute SQL Task named ESQLT-InsertRowsInTableA inserts 502 rows in TableA. The simple package looks like the image below. If you are wondering about the funny prefixes in the object names then you can read about my naming conventions mentioned in my other blog post.

I have two integer variables named rcTableA_PreRefresh and rcTableA_PostRefresh scoped at the package level.

Enable the Event

A log entry is generated when an event is triggered. Each object in SSIS has its own events that can be logged. I’ll use the event called OnVariableValueChanged, which as the name denotes, is triggered whenever the value of the variable changes. This event is disabled by default. To enable it, go to the Properties window of the variable and make the RaiseChangedEvent property to True. It must be enabled for each variable individually.

Next, I include the OnVariableValueChanged event in the logging configuration. It has to be included at the container level where the variables are scoped to. In my case, at the package level. I’m using the SSIS Log Provider for SQL Server in this package.

Then I execute the package and look at the [dbo].[sysssislog] table for the log entries.

There are some log entries but something is missing. I see the OnVariableValueChanged event logged for the Post Refresh variable but not the Pre Refresh variable.

The reason is that the initial value of the variable is set to zero in the package. The row count of a brand new empty table is also zero. So there was no change in variable value. The OnVariableValueChanged event fires only when the value actually changes! Overwriting with the same value doesn’t fulfill this condition.

To resolve that, I change the initial values in the package to -1. Now even if the row count turns out to be zero, the variable value will still change from -1 to zero. The COUNT function can’t count below zero, can it?

I run the package again and check out the [dbo].[sysssislog] table.

Things are better. The OnVariableValueChanged event for both the variables show up in the log. But the variable values are still not there.

Log SSIS Variable

The reason for missing values is that the event logging just captures the fact that the variable value changed. It doesn’t capture the value by itself. I’ll make an addition to the event handler to get the variable values too. I add an Execute SQL Task to the package level event handler for OnVariableValueChanged event.

The General tab of the Execute SQL Task has the following properties and SQL command –

INSERT INTO [dbo].[sysssislog]
([event]
,[computer]
,[operator]
,[ source]
,[sourceid]
,[executionid]
,[starttime]
,[endtime]
,[datacode]
,[databytes]
,[message])
VALUES
('*SSIS-OnVariableValueChanged' -- Custom event name
,? -- param 0
,? -- param 1
,? -- param 2
,? -- param 3
,? -- param 4
,? -- param 5
,? -- param 6
,0 -- Zero
,'' -- Blank string
,?) -- param 7

Notice that I precede the custom event name with an asterisk to differentiate it from the log entries created by the system.

The Parameter Mapping tab of the Execute SQL Task has the following properties –

Pay attention to the System::VariableValue (last variable) in this screen. Its data type is LONG, which is appropriate for the numeric row counts in my example. You may have different data types for your variables. Do adjust the data type and length appropriately. Using a wrong type could lead to no value logged at all.

I run the package again and this time, the variable values are also logged in the table.

Summary

A single event handler will take care of all variables in that scope. In my case, two package scoped variables are handled by a single package level event handler.

The variable value really has to change to fire the event.

The OnVariableValueChanged event is triggered for the container that has the variable in its scope. The container triggering the change in variable value could be different than the container that has the variable in its scope. In my demo, the variables were scoped to the package. Even though an Execute SQL Task is changing the variable values, I still put the event handler at the package level. As another example, assume there is a variable declared in the scope of a ForEachLoop container and there is a Script Task in the ForEachLoop. The Script Task changes the variable value. The OnVariableValueChanged event will be triggered for the ForEachLoop task.

I have used the default [sysssislog] logging table to log my variable values. You can easily use a different custom table by changing the OLEDB connection and making appropriate changes to the INSERT statement.

[jetpack_subscription_form subscribe_text=”If you liked this post then please subscribe to get new post notifications in email.”]

TSQL Gotcha – Order of expressions for a range search with BETWEEN

November 27, 2014Aalam RangiLeave a comment

Do you trust your users to always pass range search parameters in the correct order? Or do the users trust that the code will take care of a small thing like parameter ordering?

If you have a stored procedure or a script that accepts two parameters to do a range search using a BETWEEN keyword (or maybe with a >= and <=) then it is important to verify that the start expression and end expression are in the correct order. The correct order for numeric values is smaller value before a larger value. The correct order for character values is the dictionary order. A user might pass on the expression values the wrong way around and see no results returned.

The following demonstration has a small check for the correct order of values and to do a quick reorder to fix any issues. To help things a bit more, instead of using generic variable names like @param1 and @param2, they can be made self-documenting by being descriptive like @begin_param and @end_param.

Demonstration with numeric values:

declare @DemoTableNumeric table 
(col1 int)

-- Insert 10 rows in the demo table
insert into @DemoTableNumeric (col1)
values 
(1),(2),(3),(4),(5),
(6),(7),(8),(9),(10)

-- verify data
select * 
from @DemoTableNumeric

declare 
@param1 int, 
@param2 int

-- Assign values
-- Note: Param1 &gt; Param2
select 
@param1 = 7, 
@param2 = 4

-- The following return zero rows
-- because the first expression is
-- greater than the second expression
-- It is a wrong order of values.
select * 
from @DemoTableNumeric
where 
col1 between @param1 and @param2

select * 
from @DemoTableNumeric
where 
col1 &gt;= @param1 and col1 &lt;= @param2

-- It is important to verify the expression
-- values and reorder them if necessary
-- The following IF condition does that
if @param1 &gt; @param2
begin
	declare @temp int
	set @temp = @param1
	set @param1 = @param2
	set @param2 = @temp
end

-- Now both queries return rows
select * 
from @DemoTableNumeric
where 
col1 between @param1 and @param2

select * 
from @DemoTableNumeric
where 
col1 &gt;= @param1 and col1 &lt;= @param2

Demonstration with character values:

declare @DemoTableChar table 
(col1 varchar(10))

insert into @DemoTableChar (col1)
values
('Alpha'),
('Golf'),
('Kilo'),
('Oscar'),
('Tango'),
('Zulu')

-- verify data
select * 
from @DemoTableChar

declare 
@param1 varchar(10), 
@param2 varchar(10)

-- Note: Param1 &gt; Param2
select 
@param1 = 'Tango', 
@param2 = 'Golf'

-- This returns zero rows
-- because the first expression is
-- greater than the second expression
-- It is a wrong order of values.
select * 
from @DemoTableChar
where 
col1 between @param1 and @param2

select * 
from @DemoTableChar
where 
col1 &gt;= @param1 and col1 &lt;= @param2

-- It is important to verify the expression
-- values and reorder them if necessary
if @param1 &gt; @param2
begin
	declare @temp varchar(10)
	set @temp = @param1
	set @param1 = @param2
	set @param2 = @temp
end

-- Now both queries return rows
select * 
from @DemoTableChar
where 
col1 between @param1 and @param2

select * 
from @DemoTableChar
where 
col1 &gt;= @param1 and col1 &lt;= @param2

Suppress the Error Number, Severity Level and State Number in the Error output

September 2, 2014Aalam RangiLeave a comment

For any reason, if you don’t want to show the Error Number, Severity Level and the State Number along with the Error Message, use the Severity Level 0 (Zero) or 10. The Severity Number 10 is converted to Zero internally.

Demonstration:

A severity number except 0 or 10 displays the Error Number, Severity and State information.

A severity number 0 or 10 suppresses the Error Number, Severity and State information.

Further Reading:

Database Engine Error Severities
http://msdn.microsoft.com/en-us/library/ms164086(v=sql.105).aspx

Find Top 25 Inefficient Query Plans by CPU, IO, Recompiles, Execution Count

June 2, 2014Aalam RangiLeave a comment

Microsoft TechNet Gallery is a treasure trove of scripts that can save you a lot of coding time or sometimes introduce creative ways of solving a challenge.

I came across this collection of scripts that will show you the top 25 inefficient query plans (in XML format) sorted by CPU, IO, recompiles, execution counts etc.

Download: http://gallery.technet.microsoft.com/Find-inefficient-query-88f4611f

TSQL Gotcha – Putting a Condition in WHERE or ON clause: Does it Matter in TSQL?

May 12, 2014Aalam RangiLeave a comment

A filter condition can be syntactically put in the WHERE clause of the query or the ON clause too. But does it matter?

No, it does not matter for an INNER join. I found this one example of an exception in Martin Smith’s comment on Stack Overflow where using the GROUP BY ALL makes a difference in results. GROUP BY ALL is now a deprecated syntax (Feature ID 169) so probably we can ignore this exception. But it is always good to know.

On the other hand, Yes, it could affect the output in any of the OUTER joins.

This happens because when a filter criteria is made a part of the JOIN condition, it is applied at the time of reading the tables to identify the rows for the JOIN. The rows that do not satisfy the filter criteria are not included for the purpose of the JOIN and that causes NULLs in the OUTER JOINs.

However, with the filter criteria a part of the WHERE clause, it is applied after the query has been processed and the adhoc dataset has been created internally. The final dataset is reduced to just those rows which meet the filter criteria.

I will use a criteria of c.isactiveYN = ‘Y’ for demonstration below. Rows with a value of ‘Y’ are used for the join. The rows with ‘N’ or NULLs don’t match the criteria so they are left out.

As the examples show below, this seemingly minor difference can cause substantial difference in the query output of OUTER JOINs. This effect could cascade on to other subsequent tables in bigger queries that are joined to the original tables.

-- Create demo tables and insert some data
declare @customer table 
(id int, 
cname varchar(20), 
isactiveYN char(1))

declare @order table 
(id int, 
customerid int, 
orderdate date)

insert into @customer 
values
(1, 'Bob',   'Y'),
(2, 'Harry', 'N'),
(3, 'Max',   'Y')

insert into @order
values
(1, 1, '2014-1-1'),
(2, 1, '2014-2-2'),
(3, 2, '2014-3-3'),
(4, 2, '2014-4-4'),
(5, 3, '2014-5-5'),
(6, 3, '2014-6-6')

select * from @customer
select * from @order

Results :

Demonstration: The location of criteria has no effect on the results in INNER JOIN.

-- A simple INNER JOIN without any 
-- conditions to display all data:

-- INNER JOIN with out any conditions
select * 
from 
  @customer as c
inner join 
  @order as o
  on c.id = o.customerid

-- The following two queries differ
-- in the location of the criteria
-- but still show similar results

-- INNER JOIN with filter in WHERE clause
select * 
from 
  @customer as c
inner join 
  @order as o
  on c.id = o.customerid
where
  c.isactiveYN = 'Y'

-- INNER JOIN with filter in ON clause
select * 
from
  @customer as c
inner join 
  @order as o
  on c.id = o.customerid
  and c.isactiveYN = 'Y'

Results : The second and third results are similar.

Demonstration: LEFT OUTER joins with the condition in the WHERE clause and the ON clause

-- LEFT JOIN: Filter in WHERE clause
select * 
from 
  @customer as c
left outer join 
  @order as o
  on c.id = o.customerid
where 
  c.isactiveYN = 'Y'

-- LEFT JOIN: Filter in ON clause
select * 
from 
  @customer as c
left outer join
  @order as o
  on c.id = o.customerid
  and c.isactiveYN = 'Y'

Results : “Harry” is not in the first result-set because the row was filtered out by the WHERE clause. “Harry” is in the second result-set because it comes from the LEFT OUTER table but there is no matching row because it was not used for the JOIN.

Demonstration: RIGHT OUTER joins with the condition in the WHERE clause and the ON clause

-- RIGHT JOIN: Filter in WHERE clause
select * 
from 
  @customer as c
right outer join 
  @order as o
  on c.id = o.customerid
where 
  c.isactiveYN = 'Y'

-- RIGHT JOIN: Filter in ON clause
select * 
from 
  @customer as c
right outer join 
  @order as o
  on c.id = o.customerid
  and c.isactiveYN = 'Y'

Results : “Harry” is not in the first result-set because the row was filtered out by the WHERE clause. “Harry” is not in the second result-set because it has been filtered out and it is also not in the OUTER table.

Summary: When the criteria is in the WHERE clause, The results of the INNER, LEFT OUTER and the RIGHT OUTER queries are the same. Differences show up in the results of LEFT and RIGHT OUTER queries when the criteria is in the ON clause.

SQLErudition.com

Learning SQL Server

SQL Server