Developing an OLAP Cube

Developing an OLAP Cube

Overview

Using BIDS, after the DSV is developed, the next step is to create dimensions. Dimensions are of two types: Database Dimensions and Cube Dimensions. Database dimensions can be perceived as a master template, and Cube dimensions can be perceived as instances / children of this master template.

We will start our development with the creation of database dimensions. If you consider a dimension as a table, all the fields in this table can be perceived as attributes. Hierarchy in a dimension is a group of attributes logically related to each other with a defined cardinality. Finally we will create a cube using the dimensions we just developed, and fact tables to create dimensions (cube dimensions) and measure groups (from fact tables).

Creating a Dimension

Dimensions are of two types: database dimension and cube dimension. The dimensions that are defined at the solution level can be termed as a database dimension and the ones defined inside the cube are termed as a cube dimension. Dimension Wizard is the primary means of creating a dimension. We will create a dimension using the three dimension tables which we have included in our schema.

Explanation

Right-click the Dimensions folder and select “New Dimension”, this will invoke the Dimension Wizard. The first screen should look like the below screenshot. You have the options of using an existing table, creating a table in the data source and using a template. We already have the dimension table in our schema and we will use this, so select “Use an existing table” and click “Next”.

Select the DSV we created earlier in the DSV selection. We intend to create a dimension from the DimSalesTerritory table, so select the same table. Every dimension table needs to have a key attribute, and in this table SaleTerritoryKey is the primary key column which is guaranteed to identify each record uniquely. It would not make sense to browse this attribute using the Key, instead SalesTerritoryRegion field has unique values. We can also use this field as the key as well as name column. But for the purpose of our exercise, we will use the SaleTerritoryKey field as the key column and SalesTerritoryRegion as the name column. Though it looks inappropriate to use the key field, but when you are starting to develop an understanding of dimensions, this will help to set a rule in your mind that the key field is always required, mostly a surrogate key and you can set a name column to any field to facilitate a convenient browsing mechanism.

In the next screen, you need to make a selection of the attributes that will be present in the dimension. If you uncheck the “Enable Browsing” button, they won’t be visible to client applications when they browse the dimension. Attributes can be of different types and you can specify the type in the Attribute Type field. The Dimension Wizard removes the Name column you set from the key column as that is available due to the key column. So you won’t find that field in this list of available attributes.

Now the next step is to give a name to the dimension, name it “Cube Dim Sales Territory” or anything appropriate. After this step you have completed creating your first dimension.

In a similar manner create Product and Date dimension using the Dimension Wizard.

Creating Star Schema

03:35:00 Add Comment

Creating a Star Schema Using a Data Source View

Overview

A data warehouse or data mart from where we would source our data could contain ten to hundreds of tables. Also one would not have the liberty to change the schema of these tables to suit the requirements of the cube design. The Data Source View is an insulation layer between the actual data source and the solution. We can create and modify the schema we need in this layer and this is used as the data source for the different objects we create in the solution. A Star Schema is a schema structure where different dimension tables are directly connected to the fact table. If you imagine a fact table in the center and different dimensions attached to it, you would find the figure similar to a star and hence the name star schema. It’s the simplest form of the schema and hence we will use this in our exercise.

Explanation

Right-click on the Data Source View and select New Data Source View and a wizard should pop-up with a Welcome screen. Select “Next”, and the next screen should prompt you to select a relational data source. Select the data source we just created and click “Next”, the next screen should prompt you to select tables that we intend to use in our solution. Select the tables as shown in the below screenshot. The below fact and dimension tables are chosen as they are interlinked with each other and also suits the requirements of the exercises to follow.

Select “Next”, name the DSV to something appropriate and this should finally create your Data Source View. After arranging the tables in the DSV, your schema should look similar to the below screenshot.

In the above figure, you can see that both the fact tables are related to all three dimensions in the same manner. This is a typical case of a star schema. You can also browse the data, create calculated fields, assign primary keys and carry out other similar function in this designer to modify the schema without modifying the actual schema in the database.

Create New Project

01:12:00 Add Comment

Creating a SSAS Project

Overview

To start development, we need to create a new SSAS project using Business Intelligence Development Studio. After creating the new project, we need to create a data source that points to the AdventureWorks DW 2008 R2 database.

Explanation

Open Business Intelligence Development Studio (BIDS). Create a new SSAS Project, by selecting New Project from the File menu. Name this project “MyOLAPProject”. As soon as the new project opens up, you should find a list of folders in the explorer tab. Right-click on the data sources folder and select New DataSource. A Data Source wizard will open with a Welcome screen, select Next and you should find a screen to define your connection. We need to define a new connection, so select “New” and a screen should appear as shown below. Point the connection to the AdventureWorksDW2008R2 database and click OK.

After this, you need to specify the impersonation information for the data source. This information is used to specify how the solution will connect to the SSAS instance using the credentials specified. Every time you deploy or process the solution, this connection information will be used. So keep in mind that the account you use should have sufficient privileges. If you are not sure which account to use, it is suggested that you use an account with administrator privileges on your development machine. Please keep in mind that this is not recommended and should not be done in production environments. This is just suggested to quickly get you started with cube design and development.

After specifying this information, click “Next”. This should take you to the final screen where you need to name the data source. Name it something appropriate and click OK, which should create your data source.

Installing Sample Database

01:02:00 Add Comment

Installing AdventureWorks Sample Database

Overview

AdventureWorks is the sample database available from Microsoft for different purposes as well as different SQL Server versions. We need to use the AdventureWorks DW 2008 R2 database for our cube design and development. This database contains dimension and fact tables with prepopulated data. We can use this database as a launchpad to start our SSAS project. Developing a data mart is out of the scope of this tutorial, so we will use this sample database.

Explanation

To install the AdventureWorks database, navigate to the codeplex (http://msftdbprodsamples.codeplex.com/) site and download the MSI for the version of SQL Server you are using. This tutorial expects that the reader is using SQL Server 2008 R2, and all the exercises will be using this version of SQL Server.

After downloading, start the installer and you should get a screen similar to the one below.

AdventureWorks Data Warehouse 2008R2 is the database we need for our exercises. Point the installer to the SQL Server instance that you are using, and install the database. After the database in installed, open SQL Server Management Studio to verify the databases that were installed. You should find something similar to the below screenshot.

Expand the database higlighted above and check out the different Dim and Fact tables in this database. The tables having the prefix Dim are suited to be used as Dimension tables, and tables having prefix Fact are suited to be used as Fact tables.

Getting Started

00:54:00 Add Comment

Creating a Sample SSAS Project and Cube

Overview

Data in Online Transaction Processing (OLTP) systems is suited to support convenient data storage for user-facing applications. The data model in such systems is highly normalized. For data warehousing environments, data is required to be in a schema that supports a dimensional model. Data is therefore transformed from the OLTP storage systems to a data warehouse using ETL, so that data can be aligned in a suitable format to create data marts from the data warehouse.

Two major theories driving the design of a data warehouse and data marts are from Ralph Kimball and Bill Inmon which are mostly practiced in real time environments. Generally data is gathered from OLTP systems and brought to the data warehouse. From the data warehouse, context / requirement specific data marts are created, which can be perceived as a subset of the data warehouse. Cube source data from these data marts, and client applications connect to the cube. The schema for a cube falls into two categories: Star and Snowflake. In simple terms, Star Schema can be considered a more denormalized form of schema compared to Snowflake.

Designing and developing a data warehouse is out scope for this tutorial. For the purpose of development, we will install and use the AdventureWorks DW database. We will then create a SSAS project and create a data source which will connect to this database. Finally we will create a star schema using a Data Source View.

SSAS Tutorial Overview

00:45:00 Add Comment

SQL Server Analysis Services (SSAS)

Overview

SQL Server Analysis Services (SSAS) is the technology from the Microsoft Business Intelligence stack, to develop Online Analytical Processing (OLAP) solutions. In simple terms, you can use SSAS to create cubes using data from data marts / data warehouse for deeper and faster data analysis.

Cubes are multi-dimensional data sources which have dimensions and facts (also known as measures) as its basic constituents. From a relational perspective dimensions can be thought of as master tables and facts can be thought of as measureable details. These details are generally stored in a pre-aggregated proprietary format and users can analyze huge amounts of data and slice this data by dimensions very easily. Multi-dimensional expression (MDX) is the query language used to query a cube, similar to the way T-SQL is used to query a table in SQL Server.

Simple examples of dimensions can be product / geography / time / customer, and similar simple examples of facts can be orders / sales. A typical analysis could be to analyze sales in Asia-pacific geography during the past 5 years. You can think of this data as a pivot table where geography is the column-axis and years is the row axis, and sales can be seen as the values. Geography can also have its own hierarchy like Country->City->State. Time can also have its own hierarchy like Year->Semester->Quarter. Sales could then be analyzed using any of these hierarchies for effective data analysis.

A typical higher level cube development process using SSAS involves the following steps:

1) Reading data from a dimensional model
2) Configuring a schema in BIDS (Business Intelligence Development Studio)
3) Creating dimensions, measures and cubes from this schema
4) Fine tuning the cube as per the requirements
5) Deploying the cube

In this tutorial we will step through a number of topics that you need to understand in order to successfully create a basic cube. Our high level outline is as follows:

Design and develop a star-schema
Create dimensions, hierarchies, and cubes
Process and deploy a cube
Develop calculated measures and named sets using MDX
Browse the cube data using Excel as the client tool

When you start learning SSAS, you should have a reasonable relational database background. But when you start working in a multi-dimensional environment, you need to stop thinking from a two-dimensional (relational database) perspective, which will develop over time.

In this tutorial, we will also try to develop an understanding of OLAP development from the eyes of an OLTP practitioner.