While posting answers on the Microsoft forums, I came across a very good blog post that discusses fn_split performance and presents the results in graphs. I am already using CLR for this, but the post provides solid evidence, so please have a look: http://sqlblog.com/blogs/aaron_bertrand/archive/2010/07/07/splitting-a-list-of-integers-another-roundup.aspx. I want to share this with all my friends; take a look below.
RBAR 1
This is the solution I see most often out in the wild. What does RBAR mean? "Row by agonizing row." I'm not sure where I first picked up that term, but because it sounds so much like "FUBAR," I really like it. While most solutions do use loops of some sort, this is the worst kind... take a chunk off the list, do some stuff with it, insert it into the table, assign a new value to the list, then move on to the next part of the list. Here is the function:
CREATE FUNCTION dbo.SplitInts_RBAR_1
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS @Items TABLE
(
    Item INT
)
AS
BEGIN
    DECLARE @Item VARCHAR(12), @Pos INT;

    WHILE LEN(@List) > 0
    BEGIN
        SET @Pos = CHARINDEX(@Delimiter, @List);
        IF @Pos = 0
            SET @Pos = LEN(@List) + 1;
        SET @Item = LEFT(@List, @Pos - 1);
        INSERT @Items SELECT CONVERT(INT, LTRIM(RTRIM(@Item)));
        SET @List = SUBSTRING(@List, @Pos + LEN(@Delimiter), LEN(@List));
        IF LEN(@List) = 0 BREAK;
    END
    RETURN;
END
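All of the split functions in this post are called the same way, so a quick smoke test looks identical for each of them. A minimal example (the sample list is mine, not part of the original benchmark):

SELECT Item
FROM dbo.SplitInts_RBAR_1('1, 2, 3, 4, 5', ',');
-- Returns one row per value: 1 through 5.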
RBAR 2
This is a very slight variation on RBAR1 (can you spot the differences?), but results in about twice as many reads when just outputting the data (the reads are roughly the same when using the data for more practical purposes, like joining with existing tables). For those wishing to stick with RBAR for whatever reason (remember what it sounds like), you may want to compare RBAR 1 and RBAR 2 and see if you can ensure that your functions are not suffering from performance problems you might not know about.
CREATE FUNCTION dbo.SplitInts_RBAR_2
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS @Items TABLE
(
    Item INT
)
AS
BEGIN
    DECLARE @Item VARCHAR(12), @Pos INT;

    SELECT @Pos = 1;
    WHILE @Pos > 0
    BEGIN
        SELECT @Pos = CHARINDEX(@Delimiter, @List);
        IF @Pos > 0
            SELECT @Item = LEFT(@List, @Pos - 1);
        ELSE
            SELECT @Item = @List;
        INSERT @Items SELECT CONVERT(INT, LTRIM(RTRIM(@Item)));
        SELECT @List = RIGHT(@List, LEN(@List) - @Pos);
        IF LEN(@List) = 0 BREAK;
    END
    RETURN;
END
CTE 1
This function uses a recursive CTE, with the first element and the remainder functioning as the anchor query, whittling down each element in turn. It inserts into the @table variable until you reach the last element.
CREATE FUNCTION dbo.SplitInts_CTE_1
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS @Items TABLE
(
    Item INT
)
AS
BEGIN
    WITH ints(item, remainder) AS
    (
        SELECT
            item      = SUBSTRING(@List, 1, CHARINDEX(@Delimiter, @List) - 1),
            remainder = LTRIM(RTRIM(SUBSTRING(@List,
                            CHARINDEX(@Delimiter, @List) + 1, LEN(@List))))
        UNION ALL
        SELECT
            item      = SUBSTRING(remainder, 1, CHARINDEX(@Delimiter, remainder) - 1),
            remainder = LTRIM(RTRIM(SUBSTRING(remainder,
                            CHARINDEX(@Delimiter, remainder) + 1, LEN(remainder))))
        FROM ints
        WHERE CHARINDEX(@Delimiter, remainder) > 0
    )
    INSERT @Items
        SELECT item FROM ints
        OPTION (MAXRECURSION 0);
    RETURN;
END
CTE 2
This function is similar to the above recursive query, however it uses the positions in the string and fewer LEN() calls to extract each value.
CREATE FUNCTION dbo.SplitInts_CTE_2
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS @Items TABLE
(
    Item INT
)
AS
BEGIN
    DECLARE @Len INT = LEN(@List) + 1;

    WITH a AS
    (
        SELECT
            [start] = 1,
            [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, 1), 0), @Len),
            [value] = LTRIM(RTRIM(SUBSTRING(@List, 1,
                          COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, 1), 0), @Len) - 1)))
        UNION ALL
        SELECT
            [start] = CONVERT(INT, [end]) + 1,
            [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, [end] + 1), 0), @Len),
            [value] = LTRIM(RTRIM(SUBSTRING(@List, [end] + 1,
                          COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, [end] + 1), 0), @Len) - [end] - 1)))
        FROM a
        WHERE [end] < @Len
    )
    INSERT @Items
        SELECT [value] FROM a
        WHERE LEN([value]) > 0
        OPTION (MAXRECURSION 0);
    RETURN;
END
Numbers
This is the solution I am currently using in my production instances of SQL Server 2005 and SQL Server 2008. It uses a numbers table and, up until this week, was the fastest method I had tested to date.
CREATE FUNCTION dbo.SplitInts_Numbers
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS TABLE
AS
    RETURN
    (
        SELECT Item = CONVERT(INT, LTRIM(RTRIM(
            SUBSTRING(@List, Number,
                CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number))))
        FROM dbo.Numbers WITH (NOLOCK)
        WHERE Number <= CONVERT(INT, LEN(@List))
          AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
    );
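To see how the Numbers approach works, it helps to run the core predicate on its own: each qualifying Number is a position in @List where an element starts, because the character just before it in @Delimiter + @List is the delimiter. A small illustration of my own (not from the original post):

DECLARE @List VARCHAR(MAX) = '10,20,30', @Delimiter CHAR(1) = ',';

-- Returns 1, 4 and 7: the positions in @List where '10', '20' and '30' begin.
SELECT Number
FROM dbo.Numbers
WHERE Number <= CONVERT(INT, LEN(@List))
  AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter;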
Inline 1
This is a similar approach to the numbers table solution (first spotted as written by Peso but with some minor adjustments), without requiring access to a numbers table. This does populate a local table variable in the body of the function, in order to make use of a less complex CTE to generate a sufficient set of integers.
CREATE FUNCTION dbo.SplitInts_Inline_1
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS @Items TABLE
(
    Item INT
)
AS
BEGIN
    WITH v0 AS
    (
        SELECT n = 0  UNION ALL SELECT 1  UNION ALL SELECT 2  UNION ALL SELECT 3
        UNION ALL SELECT 4  UNION ALL SELECT 5  UNION ALL SELECT 6  UNION ALL SELECT 7
        UNION ALL SELECT 8  UNION ALL SELECT 9  UNION ALL SELECT 10 UNION ALL SELECT 11
        UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15
    ),
    v1 AS ( SELECT 16    * v0.n AS n FROM v0 ),
    v2 AS ( SELECT 256   * v0.n AS n FROM v0 ),
    v3 AS ( SELECT 4096  * v0.n AS n FROM v0 ),
    v4 AS ( SELECT 65536 * v0.n AS n FROM v0 WHERE n < 2 )
    INSERT @Items
    SELECT Item = CONVERT(INT, (SUBSTRING(@Delimiter + @List + @Delimiter, w.n + 1,
        CHARINDEX(@Delimiter, @Delimiter + @List + @Delimiter, w.n + 1) - w.n - 1)))
    FROM
    (
        SELECT n = v0.n + v1.n + v2.n + v3.n + v4.n
        FROM v0, v1, v2, v3, v4
    ) AS w
    WHERE w.n = CHARINDEX(@Delimiter, @Delimiter + @List + @Delimiter, w.n)
      AND w.n < LEN(@Delimiter + @List);
    RETURN;
END
Inline 2
Similar to the one above, but this time without the local table variable. You'll see that a lot more numbers had to be hard-coded into the function definition, but this will potentially pay off in terms of reads (stay tuned).
CREATE FUNCTION dbo.SplitInts_Inline_2
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS TABLE
AS
    RETURN
    (
        SELECT Item = CONVERT(INT, (SUBSTRING(@Delimiter + @List + @Delimiter, w.n + 1,
            CHARINDEX(@Delimiter, @Delimiter + @List + @Delimiter, w.n + 1) - w.n - 1)))
        FROM
        (
            SELECT n = v0.n + v1.n + v2.n + v3.n
            FROM
            (
                SELECT n = 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
                UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
                UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11
                UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15
            ) AS v0,
            (
                SELECT n = 0 UNION ALL SELECT 16 UNION ALL SELECT 32 UNION ALL SELECT 48
                UNION ALL SELECT 64 UNION ALL SELECT 80 UNION ALL SELECT 96 UNION ALL SELECT 112
                UNION ALL SELECT 128 UNION ALL SELECT 144 UNION ALL SELECT 160 UNION ALL SELECT 176
                UNION ALL SELECT 192 UNION ALL SELECT 208 UNION ALL SELECT 224 UNION ALL SELECT 240
            ) AS v1,
            (
                SELECT n = 0 UNION ALL SELECT 256 UNION ALL SELECT 512 UNION ALL SELECT 768
                UNION ALL SELECT 1024 UNION ALL SELECT 1280 UNION ALL SELECT 1536 UNION ALL SELECT 1792
                UNION ALL SELECT 2048 UNION ALL SELECT 2304 UNION ALL SELECT 2560 UNION ALL SELECT 2816
                UNION ALL SELECT 3072 UNION ALL SELECT 3328 UNION ALL SELECT 3584 UNION ALL SELECT 3840
            ) AS v2,
            (
                SELECT n = 0 UNION ALL SELECT 4096 UNION ALL SELECT 8192 UNION ALL SELECT 12288
                UNION ALL SELECT 16384 UNION ALL SELECT 20480 UNION ALL SELECT 24576 UNION ALL SELECT 28672
                UNION ALL SELECT 32768 UNION ALL SELECT 36864 UNION ALL SELECT 40960 UNION ALL SELECT 45056
                UNION ALL SELECT 49152 UNION ALL SELECT 53248 UNION ALL SELECT 57344 UNION ALL SELECT 61440
                UNION ALL SELECT 65536 UNION ALL SELECT 69632 UNION ALL SELECT 73728 UNION ALL SELECT 77824
                UNION ALL SELECT 81920 UNION ALL SELECT 86016 UNION ALL SELECT 90112 UNION ALL SELECT 94208
                UNION ALL SELECT 98304 UNION ALL SELECT 102400 UNION ALL SELECT 106496 UNION ALL SELECT 110592
                UNION ALL SELECT 114688 UNION ALL SELECT 118784 UNION ALL SELECT 122880 UNION ALL SELECT 126976
                UNION ALL SELECT 131072 UNION ALL SELECT 135168 UNION ALL SELECT 139264 UNION ALL SELECT 143360
                UNION ALL SELECT 147456
            ) AS v3
        ) AS w
        WHERE w.n = CHARINDEX(@Delimiter, @Delimiter + @List + @Delimiter, w.n)
          AND w.n < LEN(@Delimiter + @List)
    );
CLR
This was an adaptation of Adam Machanic's CLR approach (as mentioned above), with very minor modifications to return a table of integers instead of strings. I won't re-list his CLR function but will gladly share my modifications. You may want to play with Erland's CLR version as well, which is a lot more straightforward and will potentially scale equally well, at least to moderately-sized strings. Here is the wrapper to the function (I created a StringHelper.dll with several user-defined functions, including SplitInts()):
CREATE FUNCTION dbo.SplitInts_CLR
(
    @List      NVARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS TABLE
(
    Item INT
)
AS EXTERNAL NAME [StringHelper].[UserDefinedFunctions].[SplitInts];
GO
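The wrapper above only works once the assembly is registered and CLR integration is enabled on the instance. A minimal sketch of that setup (the file path is a placeholder, not from the original post):

-- Enable CLR integration (instance-wide setting).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Register the assembly; substitute the actual path to StringHelper.dll.
CREATE ASSEMBLY StringHelper
FROM 'C:\assemblies\StringHelper.dll'
WITH PERMISSION_SET = SAFE;
GO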
XML
Again, this solution was thrown into the mix due to some heavy touting and promise indicated by Brad Schulz. I've always been wary of XML solutions for splitting / concatenating because of the overhead of constructing and deconstructing the XML around the data. Soon we'll see if my concerns are justified.
CREATE FUNCTION dbo.SplitInts_XML
(
    @List      VARCHAR(MAX),
    @Delimiter CHAR(1)
)
RETURNS TABLE
AS
    RETURN
    (
        SELECT Item = CONVERT(INT, Item)
        FROM
        (
            SELECT Item = x.i.value('(./text())[1]', 'INT')
            FROM
            (
                SELECT [XML] = CONVERT(XML, '<i>'
                    + REPLACE(@List, @Delimiter, '</i><i>')
                    + '</i>').query('.')
            ) AS a
            CROSS APPLY [XML].nodes('i') AS x(i)
        ) AS y
        WHERE Item IS NOT NULL
    );
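To make the XML overhead concrete, here is what the intermediate document looks like for a small list (my own illustration, not from the original post):

DECLARE @List VARCHAR(MAX) = '1, 2, 3', @Delimiter CHAR(1) = ',';

-- Produces <i>1</i><i> 2</i><i> 3</i>, which .nodes('i') then shreds back
-- into one row per element.
SELECT CONVERT(XML, '<i>' + REPLACE(@List, @Delimiter, '</i><i>') + '</i>');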
I'm leaving out error handling for brevity, and I'm also leaving out any validation that the list consists of only integers -- I'm going to assume that you can perform that kind of checking much more efficiently in your application code, and can add it to the T-SQL side if you see fit. I'm also not worrying at this stage about removing duplicates or about preserving the order that the values were originally defined in the list.
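If duplicates do matter in your scenario, they are easy enough to strip on the consuming side; a minimal sketch (my own, not part of the benchmark), using the Numbers-based function as an example:

-- Remove duplicate values from the split output. Note that none of these
-- functions guarantees that list order is preserved, so ordering would need
-- to be handled separately.
SELECT DISTINCT Item
FROM dbo.SplitInts_Numbers('3, 1, 3, 2', ',');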
Finally, I wanted to ensure that each solution could handle 50,000 entries in the list; since your needs may be a lot more modest than this, you may want to pare down your entries to improve performance (for example, you don't need as many entries in "Inline 2", and your Numbers table can contain fewer rows - for some solutions, you'll need a numbers table covering 14 * <max number of elements>, since an INT can take up to 12 characters, plus the comma and perhaps a space). In this situation, since I knew all of my values would be <= 2 characters and I didn't have any embedded spaces, I didn't really need a Numbers table quite so large, and I didn't really need to define so many permutations in the "Inline 2" function.
The Testing Process
First, since a few of the solutions require a numbers or tally table, I prepared the following table:
SET NOCOUNT ON;

DECLARE @UpperLimit INT;
SET @UpperLimit = 256000;

WITH n AS
(
    SELECT x = 1
    UNION ALL
    SELECT x = x + 1 FROM n WHERE x < @UpperLimit
)
SELECT [Number] = x
INTO dbo.Numbers
FROM n
WHERE x <= 256000
OPTION (MAXRECURSION 0);
GO

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);
Then I started a trace with a filter of TextData LIKE '%dbo.SplitInts%' and with the following columns: TextData, Reads, Writes, CPU, Duration. Once the trace was running, I ran the following code block three times, giving me a good sample of how each solution performs:
DECLARE
    @List      VARCHAR(MAX) = '1,2,3,4,5,6,7,8,9,10,',
    @Delimiter CHAR(1)      = ',';

DECLARE c CURSOR FOR
    SELECT List
    FROM
    (
        SELECT r = 1, List = REPLICATE(@List, 10)   UNION ALL
        SELECT r = 2, List = REPLICATE(@List, 50)   UNION ALL
        SELECT r = 3, List = REPLICATE(@List, 100)  UNION ALL
        SELECT r = 4, List = REPLICATE(@List, 500)  UNION ALL
        SELECT r = 5, List = REPLICATE(@List, 1000) UNION ALL
        SELECT r = 6, List = REPLICATE(@List, 2500) UNION ALL
        SELECT r = 7, List = REPLICATE(@List, 5000)
    ) AS x
    ORDER BY r;

OPEN c;
FETCH NEXT FROM c INTO @List;

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT Item FROM dbo.SplitInts_CLR     (@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_RBAR_1  (@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_RBAR_2  (@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_Numbers (@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_XML     (@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_CTE_1   (@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_CTE_2   (@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_Inline_1(@List, @Delimiter);
    SELECT Item FROM dbo.SplitInts_Inline_2(@List, @Delimiter);

    FETCH NEXT FROM c INTO @List;
END

CLOSE c;
DEALLOCATE c;
I ran the test 10 times, then pulled averages from the trace tables for each function and element count.
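The post doesn't show how those averages were pulled; here is a hedged sketch of that step, assuming the trace was written to a file (the path and column choices are mine):

-- Aggregate the captured events per statement text; the trace file path is
-- a placeholder. Duration from a trace file is reported in microseconds.
SELECT
    Statement   = CONVERT(NVARCHAR(400), TextData),
    AvgReads    = AVG(Reads),
    AvgCPU      = AVG(CPU),
    AvgDuration = AVG(Duration)
FROM sys.fn_trace_gettable('C:\traces\SplitInts.trc', DEFAULT)
WHERE TextData LIKE N'%dbo.SplitInts%'
GROUP BY CONVERT(NVARCHAR(400), TextData);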
The Results
In terms of duration, CLR won every time. Heck, CLR won every time in every category. (This might not come as a big surprise to you. After all, just yesterday, I let the cat out of the bag by declaring that I'd finally given in and am letting the CLR into my life.)
But if you are one of many who are not able to implement CLR functions in your production environments, you may want to pay attention to some of the other methods. In the chart below I've highlighted the winners and losers as follows: longest duration at each interval (red), shortest duration at each interval (light green), and the shortest duration other than CLR (even lighter green). Note that "CTE 1" for 50,000 elements, at over 26,000ms total duration, is way off the chart.
Duration, in milliseconds, for splitting delimited strings of varying numbers of elements
We can see that duration for the CLR method and a few others creep up very slowly, while several of the remaining methods increase at a much more rapid pace. There is also an inexplicable bump for the XML method at 500 and 1,000 elements, where duration is longer than found with 25,000 elements. I thought this may be an anomaly or due to other pressure on the box, but this was repeatable time and time again. We can see that the "CTE 1" method is the absolute worst performer at 5,000 rows and above, and that aside from the CLR method, the numbers table was the winner up to 10,000 elements; beyond that, the "Inline 2" method took over -- but granted, not by much.
As mentioned before, I examined reads, writes, CPU and duration to compare these different splitting techniques. CPU correlated almost identically to duration, so I'm not going to bother repeating a similar chart as above. Writes did not seem to factor in; the only method that performed any writes to speak of was the "CTE 1" function, and based on duration alone we are discarding that as an alternative in any case. Here are the reads (again I've highlighted the winners in green and the losers in red):
Logical reads, in 8K pages, for splitting delimited strings of varying numbers of elements
I didn't graph the reads because, due to the range of values involved, I'd have to do it logarithmically to even present it on your screen, and that would take away some of the "punch" you get with these numbers. :-)