Data Mining algorithms accept training data from relational or OLAP data sources. When you mine OLAP cubes, your case table is a dimension. Any measure from any measure group (fact table) connected with the selected dimension can be used as a case-level column. Additionally, you can add other dimensions connected to the same measure group as nested tables.

So everything looks all right. But I run into a problem when I wanted to use the Time Series algorithm to forecast a measure for columns from two dimensions, for example, to forecast sales amount for categories of products in countries of customers. The problem is that you can have only one case series per model, i.e. you can use customers (countries) or products (categories) as case-level table. Also, you need a dimension with time information. You can select only dimensions with time information as nested tables, because you need a Key Time column. So it is not possible, for example, to use customers as the case table and products as a nested table. Some time dimension is the only candidate for a nested table for Time Series algorithm.

I could not find any other workaround than to create a view in SQL Server using Analysis Services as a linked server. Of course, in the view I have to use MDX Select in the Openquery function to get the data from the OLAP cube. The MDX query has to return a two-dimensional cellset, i.e. a rowset that SQL Server can handle. Considering the power of MDX, you can get any data from the cubes in your view, as long as it is flattened to two dimensions only. Now you can mine the view.

Here is an example how you can define a view prepared for the Time Series algorithm from the AdventureWorksDW demo cube (you have to deploy the AS sample project on your local AS first, if you want to check this code):

 

— Multiple dimensions as cases for Time Series

 

— AS linked

USE master

GO

EXEC master.dbo.sp_addlinkedserver @server = N‘ASLOCAL’, @srvproduct=N, @provider=N‘MSOLAP.3’, @datasrc=N‘localhost’, @catalog=N‘Adventure Works DW’

GO

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘collation compatible’, @optvalue=N‘false’

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘data access’, @optvalue=N‘true’

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘rpc’, @optvalue=N‘false’

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘rpc out’, @optvalue=N‘false’

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘connect timeout’, @optvalue=N‘0’

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘collation name’, @optvalue=null

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘query timeout’, @optvalue=N‘0’

EXEC master.dbo.sp_serveroption @server=N‘ASLOCAL’, @optname=N‘use remote collation’, @optvalue=N‘true’

GO

EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname = N‘ASLOCAL’, @locallogin = NULL , @useself = N‘True’

GO

 

— View for forecasting Internet Sales Amount for categories of products sold in countries

USE AdventureWorksDW

GO

 

CREATE VIEW vCountryCategoryTimeSeries AS

SELECT CAST(mdx.“[Customer].[Country].[Country].[MEMBER_CAPTION]” As nvarchar(20)) + ‘ ‘ +

       CAST(mdx.“[Product].[Category].[Category].[MEMBER_CAPTION]” As nvarchar(20)) As CountryCategory,

       CAST(RIGHT(CAST(mdx.“[Date].[Calendar Year].[Calendar Year].[MEMBER_CAPTION]” As nvarchar(7)),4) As int) As CalendarYear,

       CAST(mdx.“[Measures].[Internet Sales Amount]” As numeric(10,2)) As InternetSalesAmount

  FROM

   OPENQUERY(ASLOCAL,

     ‘SELECT NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,

             NON EMPTY { ([Customer].[Country].[Country].ALLMEMBERS *

                          [Product].[Category].[Category].ALLMEMBERS *

                          {[Date].[Calendar Year].&[2003], [Date].[Calendar Year].&[2004]}) } ON ROWS

       FROM [Adventure Works]’)

       AS mdx

GO

 

SELECT *

  FROM vCountryCategoryTimeSeries

ORDER BY 1,2

 

Now you can create a mining model using the AdventureWorksDW database as the relational data source and the vCountryCategoryTimeSeries view as the case table. Select CountryCategory as Key, CalendarYear as Key Time and InternetSalesAmount as input and predictable column for your forecasting model. After you deploy, process and view it you can clean up the AdventureWorksDW database and drop the linked server:

 

— Clean-up

DROP VIEW vCountryCategoryTimeSeries

GO

USE [master]

GO

EXEC master.dbo.sp_dropserver @server = N‘ASLOCAL’

GO

 

Dejan Sarka