Normally I write in German, but I hope that maybe some Microsoft Engineer might read it and put the problem on their schedule. So, I apologise to my German readers for writing in poor English… 😉
When the SQL Server optimizer internally transforms a SQL Statement to an other, it normally results in a better execution plan. When You write for example a subquery the optimizer tries to make a join instead. This is fine as long a the result set is the same. But now we discovered a situation when this is not true: If you use a non-deterministic function in a query, the optimizer use the same query plan as if the function were deterministic. Let us examine this simple example:
We have a table with 5 rows: the key consists of the tupel (ID1,ID2). We add a new column to add a GUID later.
ID1 |
ID2 |
GUID |
1 |
1 |
NULL |
1 |
2 |
NULL |
2 |
1 |
NULL |
2 |
2 |
NULL |
3 |
1 |
NULL |
For each different value of ID1 we want to add a GUID with the non-deterministic function NewID():
SELECT
ID1
, NEWID() as [NewId]
FROM Table1
GROUP BY ID1
This results in something like:
ID1 |
NewId |
1 |
B30AD595-926B-40F6-A815-D8871C81CE89 |
2 |
6842624D-DC40-4A3F-A2EB-A8178814D12E |
3 |
12E9F691-7F8B-41F8-829B-8CED5E26C127 |
Now, we want to use the above query to assign the new values to the column GUID. To make the execution plan easier to understand, I used a SELECT, not an UPDATE. But the result is the same:
SELECT
t.ID1
, sub.NewId
FROM (SELECT
ID1
, NEWID() as [NewId]
FROM Table1
GROUP BY ID1) as sub
JOIN Table1 as t
ON t.ID1=sub.ID1
It should result in something like this:
ID1 |
NewId |
1 |
B30AD595-926B-40F6-A815-D8871C81CE89 |
1 |
B30AD595-926B-40F6-A815-D8871C81CE89 |
2 |
6842624D-DC40-4A3F-A2EB-A8178814D12E |
2 |
6842624D-DC40-4A3F-A2EB-A8178814D12E |
3 |
12E9F691-7F8B-41F8-829B-8CED5E26C127 |
But this is not the case. The result is:
ID1 |
NewId |
1 |
B30AD595-926B-40F6-A815-D8871C81CE89 |
1 |
4EC62D6E-8ABC-4563-8A81-2DB8E655D3CA |
2 |
6842624D-DC40-4A3F-A2EB-A8178814D12E |
2 |
73B7386E-688D-48D2-A585-C66C6911EFC3 |
3 |
12E9F691-7F8B-41F8-829B-8CED5E26C127 |
Do you see the difference? The reason for this is the wrong execution plan:
|--Compute Scalar(DEFINE:([Expr1004]=newid()))
|–Merge Join(Inner Join, MERGE:([tempdb].[dbo].[Table1].[ID1])=([t].[ID1]), RESIDUAL:([tempdb].[dbo].[Table1].[ID1]=[tempdb].[dbo].[Table1].[ID1] as [t].[ID1]))
|–Sort(DISTINCT ORDER BY:([tempdb].[dbo].[Table1].[ID1] ASC))
| |–Table Scan(OBJECT:([tempdb].[dbo].[Table1]))
|–Sort(ORDER BY:([t].[ID1] ASC))
|–Table Scan(OBJECT:([tempdb].[dbo].[Table1] AS [t]))
The optimizer choose to do the join first and to execute the function for each row. This is not correct.
So the general rule I recommend: Do not use non-deterministic functions in complex SQL queries. The result depends on the chosen query plan. Today the result may be correct, but if the row numbers change or you add an index the result may get incorrect! This is not as it should be.
Unfortunately I was not able to convince the German support that this is a bug. They send me a workaround like this:
DECLARE @Rows AS BigInt;
SELECT @Rows = Count(*) FROM Table1;
SELECT
t.ID1
, sub.NewId
FROM (SELECT TOP(@Rows)
ID1
, NEWID() as [NewId]
FROM Table1
GROUP BY ID1
ORDER BY NewId) as sub
JOIN Table1 as t
ON t.ID1=sub.ID1
The performance is very poor and I would recommend using a temporary table to store the new GUIDs instead, if you have a big row number. This seems to perform a little better.
Update (21.5.2008): Encouraged by Christoph I send the problem to Craig Freedman. He answered the very next day, that he presented the problem to the developer responsible for this code for his comments on that question. Here is the response of the developer Craig forwarded to me:
In general, SQL Server does not guarantee the timing of execution of scalar operators. For non-deterministic scalars (built-in and user-defined), that means the timing semantics (number of times executed and when) is not defined. In addition, it may change from one plan to another, or from one release to the next.
Craig added:
Unfortunately, I am not aware of (and was unable to identify) a better workaround than simply storing the GUIDs in a temp table before joining with the original table.
(My Thanks to Craig! I am very impressed that he ask the responsible developer. And thanks to the unknown developer.)
Update (11.6.2008): Itzik Ben-Gan did it! Microsoft confirmed to him this is a bug.
Currently I read the very good book "T-SQL Querying" written by Itzik. In the very first chapter he described that the optimizer can make shurtcuts in the physical execution plan if the result set is the same like using the logical execution plan. I asked him about his opinion about this problem and he made Microsoft to accept that this is a bug. He wrote:
I got back a response from Microsoft saying it is a bug, and apparently a regression from SQL Server 2000.
If you think Microsoft should fix this bug, then "vote" for this bug, please. The more people vote, the more likely MS will fix it.
Update (19.6.2008): Thanks for your support. The MS Developer Jim answered to this bug:
Thankyou for this bug report. One of the most interesting discussions around.
This hits to the very heart of the issue – is optimization allowed to change a program's semantics? Ie: if a program yields certain answers, but runs slowly, is it legitimate for a Query Optimizer make that program run faster, yet also change the results given?
Before shouting "NO!" (my own personal inclination too :-), consider: the good news is that, in 99% of cases, the answers ARE the same. So Query Optimization is a clear win. The bad news is that, if the query contains side-effecting code, then different plans CAN indeed yield different results. And NEWID() is one such side-effecting (non-deterministic) 'function' that exposes the difference. [Actually, if you experiment, you can devise others – for example, short-circuit evaluation of AND clauses: make the second clause throw an arithmetic divide-by-zero – different optimizations may execute that second clause BEFORE the first clause] This reflects Craig's explanation, elsewhere in this thread, that SqlServer does not guarantee when scalar operators are executed.
So, we have a choice: if we want to guarantee a certain behavior in the presence of non-deterministic (side-effecting) code – so that results of JOINs, for example, follow the semantics of a nested-loop execution – then we can use appropriate OPTIONs to force that behavior – as UC points out. But the resulting code will run slow – that's the cost of, in effect, hobbling the Query Optimizer.
All that said, we are moving the Query Optimizer in the direction of "as expected" behavior for NEWID() – trading off performance for "results as expected".
[…]
Anyhow, this bug is now assigned to the QO Dev team for a deeper look.
I presume "QO Dev team" is the "Query Optimizer Development team". 😉
Update (27.7.2008): In the last issue of the SQL Server Magazine Newsletter Itzik Ben-Gan wrote some bad news about this bug:
I posted the bug on Microsoft Connect (FeedbackID=350485), and after consideration, Microsoft decided to close the item and mark it as “Won’t Fix”. The reasoning behind the decision not to fix the bug is that in the vast majority of the cases, the optimization aspects that lead to the bug yield better performance without sacrificing the correctness of the query, and if you fall into one of the unusual cases where the correctness of the query is compromised, you can consider alternatives (e.g., physically materialize the data along with the NEWID values in a table).
This was quite unexpected for me. But I think there is nothing we can do about it. Thats a pity…