This article pursues the goals of the first part, pointing out the ways to create and configure another interesting service application available in SharePoint 2010—the Web Analytics service application. As usual, I’ll introduce this topic with a general and architectural overview of the Web Analytics functionality, once again paying attention to its scalability and effectiveness.
TechNet gives a concise but effective definition of Web Analytics: “…a set of features that you can use to collect, report, and analyze the usage and effectiveness of a SharePoint Server 2010 deployment”. Notice the perspective used in this definition: The usage information is collected to analyze the overall SharePoint deployment, not only the way it is “navigated” by its users.
The main goal of this service application is not the simple production of a number of reports (there are actually a number of reports) but rather the provision of:
- A data warehouse that is usable as a data source for custom reporting
- A specialized Web Part to expose usage information in web pages
- A workflow that enables users to schedule the release of reports by either a time frame or when specific conditions are met
Web Analytics ArchitectureThe diagram in Figure 1 is a respectable representation of the overall architecture of the Web Analytics service application and the way usage data are collected, processed, aggregated, and exposed in reports. The complete workflow can be summarized as follows:
- The web front-end (WFE) servers gather data through an AP-extended Core Usage API.
- Data is saved in .usage files on each WFE server.
- A Windows SharePoint Services (WSS) timer job on each WFE box extracts the relevant data stored in .usage files and pushes the data to the Staging database through the Logging Web Service.
- A Log Batcher component batch processes the data in the Staging database and stores it back in the Staging database.
- The Log Batcher implements spam detection on the data to flag any data that has some values higher than the permissible thresholds.
- A Usage Behavior Analyzer gathers the data, a batch at a time, and writes it to the fact tables in the Staging database.
- The fact tables are read by a Report Consolidator.
- The Report Consolidator generates aggregation tables and stores them in the data warehouse (i.e, the Reporting database).
- Table valued functions (TVFs) represent clients’ queries and are used to query the data in the Web Analytics database.
The diagram (which is based on the one in the TechNet article “Capacity requirements for the Web Analytics Shared Service in SharePoint Server 2010”at http://technet.microsoft.com/en-us/library/gg440601.aspx) clearly shows the involvement of many SharePoint 2010 components operating across the three layers of a farm. There’s a difference between the original diagram and the one in Figure 1: I added the State Service and the related database because they play a role in the Web Analytics architecture. TechNet doesn’t describe in detail the interaction between the Web Analytics service application and the State Service. It only describes the State Service, noting that “The State Service is a shared service that is used by some Microsoft SharePoint Server 2010 components to store temporary data across related HTTP requests in a SQL Server database.” However, the instructions to configure the Web Analytics service application (as you’ll see in the following pages) clearly ask you to verify that the State Service is active and properly configured.
Because of the high amount of information collected on the web servers, one of the main issues when creating and configuring Web Analytics is the need for accurate capacity planning to determine the correct topology for this simple but high-data-consuming service application. Web Analytics provides reports related to three main categories of information:
- ATraffic (numbers and times related to the amount of requests the farm receives from clients)
- Search (numbers and times related to the specific search activities in the farmInventory (numbers related to the overall “composition” of the farm in terms of web applications, site collections, sites, pages, etc.)
For this reason, the capacity plan should be developed according to the following criteria:
- Total expected site traffic (clicks, search queries, and ratings)
- Number of SharePoint components (sites, site collections, and web applications) for each farm
- Topology and general organization of the Search Service
The Web Analytics service application is scalable, both at the application layer and the data layer. In the application layer, it is possible to create and configure as many instances of the service application as needed, depending on the previously described criteria. Figure 2 shows the Microsoft-estimated values about the traffic supported by one, two, and three instances of the Web Analytics service application. Notice that the curve is not exactly arithmetically growing. Each topology supports a certain amount of “records” each day; each click, search query, or rating makes up a record. Microsoft reckons that CPU and memory are less intensive and shouldn’t be considered as critical in the capacity plan.
On the data layer front (see Figure 3), the available topologies can be summarized as follows (consider each SQL Server topology as enforceable for each Web Analytics instance you’ll deploy):
- One instance of SQL Server for both the staging and reporting databases (1S+R
- Two instances of SQL Server, one staging database, and one reporting database (1S1R)
- Three instances of SQL Server, two staging databases, and one reporting database (2S1R)
Notice that the record numbers for the SQL Server instances are significantly less than those for the service application instances. It is plain to see that the Web Analytics service application consumes high amounts of storage and requires an adequate capacity plan. Just as an empiric example, consider that 90 days of data, based on 3 million daily clicks, requires about 250GB of storage.
Last but not least, the retention period is configurable between 1 and 25 months. This is another important factor to consider when designing a capacity plan. Once correctly created and configured, the Web Analytics service application can be enabled or disabled at the site collection level.
The Web Analytics reports provide many metrics at three different levels: web application level (Table 1), site collection level (Table 2), and site level (Table 3). Out-of-the-box, these reports are visible only to Administrators at each level. As for the content and the meaning of each report, I could write pages of explanations and examples, but that is another story!
Before discussing the creation and configuration procedures, I want to add one last note about the databases involved with the Web Analytics service application:
- The staging database is actually the working database. It contains un-aggregated fact data and asset metadata, batched in queue. Its retention is short-term (depending on the configuration of the timer jobs involved in the pre-elaboration of the data).
- The reporting database contains the aggregated standard report tables, the aggregated fact data and asset metadata, and the diagnostics data. Its retention is explicitly configurable between 1 month (the default value) and 25 months.
Creating and Configuring the Web Analytics Service Application Like most SharePoint service applications, the Web Analytics service application can be created and configured by using one of the following tools:
- Farm Configuration Wizard
- Central Administration
- Windows PowerShell
Running the Farm Configuration Wizard is the easiest way to configure any of the SharePoint components, so it will not be described in this article.
As an introductive note to the samples I’ll be using to illustrate the creation and configuration of a Web Analytics service application, here is the description of the environment I adopted for these tasks:
- Domain controller: Windows 2008 R2 assuming just the domain controller role
- SharePoint Server: Windows 2008 R2 SP1 with the binaries of SharePoint Server 2010 SP1 and the June cumulative updates
- SQL Server instance: Windows 2008 R2 with SQL Server code-named “Denali” CTP3
It has been an interesting experiment to verify (indeed successfully) the readiness of SQL Server Denali CTP3 as the data layer for SharePoint 2010. The upgrade at SP1 of the SharePoint binaries demonstrate a satisfying behavior of both the platforms, as you’ll see in the following pages. Using Central Administration to Create and Configure a Web Analytics Service ApplicationCentral Administration provides an interactive user interface to carry out most of the administrative tasks in SharePoint 2010. When you use it to create and configure the Web Analytics service application, however, you’ll find yourself in the rare situation in which the tool is not sufficient enough to achieve the final result, as you’ll soon discover. Anyway, the first step is verifying that the user account performing this procedure is a member of the Farm Administrators SharePoint group, as Figure 4 shows.
Next, I recommend asking the DBA to pre-create the databases needed in the Web Analytics service application, basing the main parameters (names, locations, sizes, file growth, etc.) on the capacity plan previously implemented. I recommend this step in almost every SharePoint service application article I write because it gives the SharePoint administrator and the DBA more control over the storage allocation. Figure 5 shows sample Staging and Reporting databases and the code that created them. The sample I’m showing is for a very small farm, but it’s possible to create more than one staging database. (They must be created on different SQL Server instances.) From the beginning, the databases should be large enough to support the traffic you anticipate in the capacity plan. There are other important considerations about the choice of the hardware, but you can find plenty of details in the previously mentioned TechNet article “Capacity requirements for the Web Analytics Shared Service in SharePoint Server 2010”.
Now it’s time to create the new Web Analytics service application instance. In the Central Administration page, select Manage service applications under the Application Management menu, as shown in Figure 6. There are many ways to get to this page in the Central Administration UI; this is simply one of them.
In the Manage service application page, click the New button and select the Web Analytics Service Application option from the drop-down menu (see Figure 7).
This selection brings the pop-up dialog box Create New Web Analytics Service Application in the foreground. In the higher part of the dialog box (see Figure 8), you should specify the name of the new Web Analytics service application. (It has to be univocal at the farm level.) You can choose between the option of using an existing application pool or creating a new one for the service application. In this sample, I decided to create a new application pool, using a managed account previously registered as the identity.
In the second section of the same dialog box (see Figure 9), you specify the names and location (the SQL Server instance) of the databases assigned to the service application you’re creating. Figure 9 clearly shows the limit of the Central Administration UI: you can assign just ONE Staging database and one Reporting database. It’s not possible to assign more than one Staging database here, so this is not the right way to proceed when the capacity plan suggests using more staging databases.
The last parameter you have to indicate is the Data Retention Period, which is configurable with a value between 1 and 25. Clicking the OK button, starts the process that creates the Web Analytics service application instance named CA Web Analytics. After a couple of minutes, the message shown in Figure 10 is displayed. It kindly alerts you to carry out some steps to complete the setup of the service application. To make things easier, the first two steps suggest how to carry out these elementary operations.
The first step tells you to verify that the Web Analytics Data Processing Service and the Web Analytics Web Service have been started. To do so, follow the instructions given (click System Settings and then click Manage Services on Server). As you can see in Figure 11, if one or both of the mentioned services are not started, you should click on the corresponding Start option.
The second step tells you to verify that all the required events are logged. This is really important because Web Analytics takes advantage of the Logging Service activities. Figure 12 shows the steps to complete this verification.
The third step demonstrates the weak spot of Central Administration when configuring the Web Analytics service application: If the State Service is not provisioned, there’s no way to create the corresponding service application and configure it in Central Administration. It needs to be done through PowerShell. The TechNet article “Manage the State Service (SharePoint Server 2010)” explains that there’s a second tool you can use to configure this service application—the Farm Configuration Wizard—but as I previously mentioned, this is not a tool I want to explore. At this point, I again recommend that you follow best practice of having the DBA pre-create the database used by the State Service service application. It is important that this database be created in the same SQL Server instance hosting the farm’s Configuration database. Figure 13 shows a sample implementation of this step.
Once the database has been created, open the SharePoint 2010 Management Shell (the instance of PowerShell in which the SharePoint snap-in is already registered) or any other PowerShell interface (in this case you should “add” the SharePoint snap-in). In this article, I’m using PowerGUI, but in Part 1 of this series, I used the Windows PowerShell Integrated Scripting Environment (ISE). No matter which PowerShell interface you’re using, you should execute the script in Listing 1.
Each cmdlet used in the script is fully described in TechNet (starting at “Windows PowerShell for SharePoint Server 2010”), but with the help of a couple of comments, it’s not difficult to interpret the meaning of the script. Notice that, having pre-created the database, the script uses the Mount-SPStateServiceDatabase cmdlet to associate the database to the service application, then uses the Initialize-SPStateServiceDatabase cmdlet to install the state database schema in the database. After the script has been successfully executed, you can verify that both the State Service service application and the related proxy have started in the Service Applications page, as illustrated in Figure 14.
There’s one last step to complete the configuration of the Web Analytics service application you just created. You need to associate the new service application to each web application for which you intend to collect usage data to feed Web Analytics reports. As for any service application, you have to assign the proxy to one of the two proxy groups available for each web application. To do so, navigate to the Central Administration > Service Applications > Configure service application associations page (Figure 15) and, for each web application to be associated, select it from the list of available web applications (Figure 16).
For each selected web application, the list of proxies (organized by proxy group) is displayed (Figure 17), and you can verify that the new Web Analytics service application (in this example, CA Web Analytics) is selected. Here it is possible to change the proxy group of the service application. In this example, it is included in the default proxy group, but nothing prevents you from including it in the custom proxy group.
At this point, the Web Analytics service application is completely configured. To display some reports, you usually have to wait at least 24 hours so that the collection of usage data can significantly feed the databases. Just the Administrators at the various levels (web application, site collection, and site) have access to the reports. Figure 18 shows the three steps you need to take to view the Web Analytics reports. Notice that the values in my example are all zeroes. That’s because the data collection and subsequent processes have not ran yet.
Lastly, there is a curiosity to explore. Looking at the properties of the Reporting database, you’ll notice that it is organized based on a number of filegroups. If you navigate to the Storage container in SQL Server Management Studio (SSMS) and expand the Partition Functions and Partition Schemes containers, you’ll find the reason for so many filegroups. As you can see in Figure 19, there is a partition function used by a partition schema defining weekly-based partitioning.
If you examine some of the database tables, you’ll notice that each table containing “aggregate by date” values is partitioned (Figure 20). Each partition will host data aggregations for each week included in the retention period configured during the Web Analytics service application configuration. The partition function and the partition schema will be progressively modified by one of the timer jobs that Web Analytics uses to update its own data.
Using PowerShell to Create and Configure a Web Analytics Service ApplicationIn the previous section, I pointed out that you can take just partial advantage of the Central Administration tool to create and configure the Web Analytics service application because the State Service can be created and configured only through PowerShell scripts. This is one of the two main reasons to adopt PowerShell for the whole process. The second (but not less important) reason is the fact that, with PowerShell, you can completely control the topology you choose for the Web Analytics service application. Indeed, in the architecture section of this article, you surely noticed that in case of high traffic it is strongly suggested that you distribute more than one Staging database on more than one SQL Server instance in the farm. This option is not directly available in Central Administration, but it is well managed via PowerShell scripting.
The first step is to have your DBA pre-create the databases. This time the DBA should create all the needed databases (including the State Service database) at once, as Figure 21 shows. That way, you can execute a script that creates and configures both service applications (Web Analytics and State Service).
Listing 2 shows the script that creates and configures both service applications. There are a lot of elements in this script, but I will point out only a few very important segments:
- In the Get Central Admin server section, the script stores the address of the server hosting the Central Administration site. This information will be used to correctly locate the activation of the Web Analytics services.
- In the Get default SQL Server instance section, the script stores the definition of the SQL Server instance hosting the Configuration database, which will be used to locate the databases associated with the service applications.
- In the Prepare variables for the subsequent job section, the most interesting thing is the insertion of two XML fragments defining the database collections that have to be associated with the Web Analytics service application. The $StagerSubscription and $WarehouseSubscription variables allow the definition of more than one database (each eventually assigned to a different SQL Server instance) to directly define a high-traffic/high-performance topology for the service application.
In this article, I’ve shown you the two main ways to create and configure a Web Analytics service application. If you compare the steps involved, I think it’s evident that the use of PowerShell can significantly simplify the job.