I wrote two blogs about Skewness and Kurtosis, one with T-SQL and one with C# UDA solution. I claimed the CLR solution performs better, because it needs only one pass through the data, and the T-SQL one needs two passes. This is correct, considering my solutions, the CLR one is twice as fast as the T_SQL one. But…

Steve Kass, and extraordinary SQL Server MVP (take a look at his site), has turned my attention on the fact I can use the same technique as I used for the UDA in T-SQL as well. And as always when he makes a remark, he is correct. Bellow you can see all three Skewness solutions together.

 

— Skewness

SELECT Skew = SUM(((Age*1.0m.mean)/m.[StDev])*((Age*1.0m.mean)/m.[StDev])*((Age*1.0m.mean)/m.[StDev])) * MIN(m.corrfact1)

  FROM vTargetMail v CROSS JOIN

       (SELECT mean = AVG(Age*1.0), [StDev] = STDEV(Age),

               corrfact1 = COUNT(*)*1.0 / (COUNT(*)-1) / (COUNT(*)-2)

          FROM vTargetMail v) AS

 

— Steve Kass

SELECT

   (rx3 3*rx2*av + 3*rx*av*av rn*av*av*av)

   / (stdv*stdv*stdv) * rn / (rn1) / (rn2)

FROM (

   SELECT

     SUM(Age) AS rx,

     SUM(Age*Age) AS rx2,

     SUM(Age*Age*Age) AS rx3,

     COUNT(Age) AS rn,

     STDEV(Age) AS stdv,

     AVG(Age) AS av

   FROM (SELECT 1e0*Age AS Age FROM vTargetMail) AS s

     ) AS s

 

— UDA – note it has to be deployed first

SELECT dbo.Skew(Age)

  FROM vTargetMail

GO

 

In SSMS, turn the execution plan on and run in the context of the AdventureWorksDW databases all three solutions in a single batch. just check the first number – query cost relative to the batch for each query. The first query (my first T-SQL solution) cost is 50% of the batch cost; (the second query (Steve’s T-SQL solution) cost is 25% of the batch cost; and the third query (my C# UDA) solution cost is 25% of the batch cost. So you can see you can get the same efectiveness in T-SQL as with UDA in this case. Finally, bellow is similar code for Kurtosis. You can check the performance of the solutions in same way as I showed for Skewness.

 

— Kurtosis

SELECT Kurt = SUM(SQUARE(SQUARE(((Age*1.0m.mean)/m.[StDev])))) * MIN(m.corrfact2) MIN(m.subfact)

  FROM vTargetMail v CROSS JOIN

       (SELECT mean = AVG(Age*1.0), [StDev] = STDEV(Age),

               corrfact2 = COUNT(*)*1.0 * (COUNT(*)+1) / (COUNT(*)-1) / (COUNT(*)-2) / (COUNT(*)-3),

               subfact = 3.0 * SQUARE((COUNT(*)-1)) / (COUNT(*)-2) / (COUNT(*)-3)

          FROM vTargetMail v) AS

 

— Steve Kass

SELECT

   (rx4 4*rx3*av + 6*rx2*av*av 4*rx*av*av*av + rn*av*av*av*av)

   / (stdv*stdv*stdv*stdv) * rn * (rn+1) / (rn1) / (rn2) / (rn3)

   3e0 * (rn1) * (rn1) / (rn2) / (rn3)

FROM (

   SELECT

     SUM(Age) AS rx,

     SUM(Age*Age) AS rx2,

     SUM(Age*Age*Age) AS rx3,

     SUM(Age*Age*Age*Age) AS rx4,

     COUNT(Age) AS rn,

     STDEV(Age) AS stdv,

     AVG(Age) AS av

   FROM (SELECT 1e0*Age AS Age FROM vTargetMail) AS s

     ) AS s

— UDA – note it has to be deployed first

SELECT dbo.Kurt(Age)

  FROM vTargetMail

GO

 

 

 

Dejan Sarka