In just couple of days, I have been asked twice about the same problem ‚Äď how to get the Logistic Regression formula from the mining model content. Terefore, it makes sense to write a blog about itJ

Well, you can use a hard and a simple solution ūüôā

A hard solution would be to go through quite complicated process. You could create a linked server from SQL Server to Analysis Services, and then use T-SQL OPENQUERY function to browse the content of the model with DMX query inside. You would have to browse the model three times – to get output nodes, input nodes, and input attributes distribution. Then you need to calculate Z-scores from input attributes distribution to fit them in the formula. Then you could join all of the results in a final query. Something as this (pseudo code):

 

WITH¬†¬† — we need four CTEs

¬†OutputNodes AS¬†¬†¬†¬†¬† — Output nodes CTE

¬† (SELECT col_list¬†¬†¬†¬† — T-SQL Select

      FROM OPENQUERY(linkedAS,

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†‘SELECT¬†FLATTENED col_list FROM model.CONTENT¬†¬†¬† — DMX Select to get¬†output nodes
WHERE NODE_TYPE=23’)),

¬†InputNodes AS¬†¬†¬†¬†¬†¬†¬† — Input nodes CTE

¬† (SELECT col_list¬†¬†¬†¬† — T-SQL Select

      FROM OPENQUERY(linkedAS,

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†‘SELECT¬†FLATTENED col_list¬†FROM model.CONTENT¬†¬†¬† — DMX Select to get¬†input nodes
WHERE NODE_TYPE=21’)),

¬†AttrDistribution AS¬†¬†¬†— Input attributes distribution

¬† (SELECT col_list¬†¬†¬†¬† — T-SQL Select

      FROM OPENQUERY(linkedAS,

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†‘SELECT FLATTENED NODE_DISTRIBUTION¬†¬†¬†¬†¬†¬†¬† — DMX Select to get input attributes distribution

               FROM model.CONTENT
WHERE NODE_TYPE=24′)),

 AttrZScore AS

¬† (SELECT col_list¬†¬†¬† — This is pure T-SQL on previous CTE – calculate Z-scores here

     FROM AttrDistribution)

SELECT col_list¬†¬†¬†¬†¬†¬† — final T-SQL Select to put everything together

  FROM OutputNodes

       INNER JOIN InputNodes

        ON condition

       INNER JOIN AttrZScores

        ON condition;

 

The easy way would be to use something already preparedJ The MS Data Mining team has already prepared additional AS CLR stored procedure, which does all of this calculations for you. You can freely download it from SqlServerDataMining.com, if you register on the site. Besides the procedure you need, you can download also a prepared SSRS report – scorecard for logistic regression models. The link to the article is this: http://www.sqlserverdatamining.com/ssdm/Default.aspx?tabid=102&Id=41. On the
page, you will find the link to download the source code (search for “The download package is available here and it contains a stored procedure that:”. In the article, you will find detailed instructions how to build and deploy the procedure.

 

Dejan Sarka