2013-04-25

<> operator

If you read my last article, I had optimized a query by transforming a sub-query in an inner join and reduced drastically the execution time of that query. This is the optimized query:
SELECT Column1, Column2, Column3, Column4, Column5
FROM dbo.wi  
INNER JOIN dbo.qi ON wi.r_object_id = qi.item_id 
INNER JOIN dbo.p ON wi.r_workflow_id = p.r_workflow_id 
INNER JOIN dbo.wk ON wi.r_workflow_id = wk.r_object_id 
CROSS JOIN dbo.dr 
    INNER JOIN dbo.dmi ON dmi.r_component_id = dr.r_object_id
WHERE (p.r_object_id = dmi.r_object_id) 
      AND (wi.r_runtime_state <> 2) 
      AND (wk.r_runtime_state = 1) 
      AND (qi.router_id <> '0000000000000000')
      AND (dr.i_is_deleted = 0) 
      AND (qi.task_state <> 'paused')
ORDER BY Column1

But it can be more optimized. This query handles more than 3 billions records and uses 3 '<>' operators. Substituting those operators for equal operators and then negate it did the job:
SELECT Column1, Column2, Column3, Column4, Column5
FROM dbo.wi  
INNER JOIN dbo.qi ON wi.r_object_id = qi.item_id 
INNER JOIN dbo.p ON wi.r_workflow_id = p.r_workflow_id 
INNER JOIN dbo.wk ON wi.r_workflow_id = wk.r_object_id 
CROSS JOIN dbo.dr 
    INNER JOIN dbo.dmi ON dmi.r_component_id = dr.r_object_id
WHERE (p.r_object_id = dmi.r_object_id) 
      AND NOT (wi.r_runtime_state = 2) 
      AND (wk.r_runtime_state = 1) 
      AND NOT (qi.router_id = '0000000000000000')
      AND (dr.i_is_deleted = 0) 
      AND NOT (qi.task_state = 'paused')
ORDER BY Column1

This new query ran in 5 seconds. Almost in half time the before one (8 seconds). So in two steps I could optimize a query that took 4 minutes and 25 seconds to execute in 5 seconds. 53 times faster and no indexes created!!

I made some more tests with this solution and looks like that you can see only results when you work with massive data. With few thousands records I did not see any difference.
Curious is that there was no changes in the query plan. Means that it is how the SQL Server engine treats the data when use <> or = operators.

2013-04-16

SUBQUERY - be careful when using it

Few years ago I had a call from a client that was complaining about one slow query.
As always I started a SQL Profiler to catch long running queries. It wasn't hard  to find in the trace the following query that took 4 minutes and 25 seconds to run:

SELECT Column1, Column2, Column3, Column4, Column5
FROM dbo.wi  
INNER JOIN dbo.qi ON wi.r_object_id = qi.item_id 
INNER JOIN dbo.p ON wi.r_workflow_id = p.r_workflow_id 
INNER JOIN dbo.wk ON wi.r_workflow_id = wk.r_object_id 
CROSS JOIN dbo.dr 
WHERE (p.r_object_id IN (SELECT r_object_id 
                   FROM dbo.dmi 
                   WHERE (r_component_id = dr.r_object_id)))        
      AND (wi.r_runtime_state <> 2) 
      AND (wk.r_runtime_state = 1) 
      AND (qi.router_id <> '0000000000000000')
      AND (dr.i_is_deleted = 0) 
      AND (qi.task_state <> 'paused')
ORDER BY Column1


The cross join and the sub-query called my attention immediately. Cross join produces the Cartesian product of the tables involved in the join. In this example the cross join returns more than 3 billions rows and each of these rows will be correlated with the sub-select.
I changed that query and transformed the sub-query in inner join clause. The new query:

SELECT Column1, Column2, Column3, Column4, Column5
FROM dbo.wi  
INNER JOIN dbo.qi ON wi.r_object_id = qi.item_id 
INNER JOIN dbo.p ON wi.r_workflow_id = p.r_workflow_id 
INNER JOIN dbo.wk ON wi.r_workflow_id = wk.r_object_id 
CROSS JOIN dbo.dr 
    INNER JOIN dbo.dmi ON dmi.r_component_id = dr.r_object_id
WHERE (p.r_object_id = dmi.r_object_id) 
      AND (wi.r_runtime_state <> 2) 
      AND (wk.r_runtime_state = 1) 
      AND (qi.router_id <> '0000000000000000')
      AND (dr.i_is_deleted = 0) 
      AND (qi.task_state <> 'paused')
ORDER BY Column1

This new query took 8 seconds to run.

The difference is in the number of executions that SQL Server engine needs to make so be very careful when using sub-queries. Only use it if necessary. Whenever you can use inner join instead.

This query can be more optimized but I will let that for my next article.

2013-04-06

SQL Server 2000 migration

The SQL Server 2000 SP4 extended support period is ending in 9th April. If your company still have SQL Server 2000 databases you should migrate them quickly.

Don't forget that you can't migrate SQL Server 2000 databases to SQL Server 2012 in one single step. You need to add an extra step that is migrate from SQL Server 2000 to SQL Server 2005 or 2008 or 2008R2.

If you want to upgrade the full instance, remember that SQL Server 2000 instance need to be with SP4 before you start. And if you are upgrading to SQL Server 2012, in the additional step, SQL Server 2005 should be at least with SP4 and SQL Server 2008 with SP2 and SQL Server 2008R2 with SP1 or later.

In-place migration will be more complex because you may also need to migrate the operating system for supporting earlier versions of SQL Server, so I always recommend side by side migrations.

I also recommend you to visit this MSDN article before you start the migration plan.

Have a nice migration :)