While completing my review of Expert Cube Development with SSAS Multidimensional Models, I came across the section on the Distinct Count measure in SSAS and how it can affect both query performance and processing performance.

 

You can find the reference on page 114.

And here is a link to the actual book that I am reviewing:

http://www.packtpub.com/expert-cube-development-with-ssas-multidimensional-models/book

Instead of using the Distinct Count function that is available in SQL Server Analysis Services (SSAS), which has some performance issues and also needs its own measure group, I wanted to find a way to use the SUM function in an existing measure group and still return the distinct count.

 

I then wanted to be able to slice this distinct count by a dimension.

 

The example below uses the AdventureWorksDW2012 sample data and Analysis Services database.

 

You can get a copy of this here: http://msftdbprodsamples.codeplex.com/releases/view/55330

·         I used the AdventureWorksDW2012 Data File and AdventureWorks Multidimensional Models SQL Server 2012

 

Example reference:

·         We are going to get the distinct number of Customers (CustomerKey) from the [dbo].[FactInternetSales]

o    So this will enable us to get a distinct count based on any of the attributes from our Customer Dimension

·         NOTE: Always create your distinct calculation at your lowest level of granularity.

·         In our example, the lowest level we used in [dbo].[FactInternetSales] was the date (OrderDateKey).

·         To follow how this works, we will be using the order date of 03 March 2008.

o    The distinct number of Customers on 03 March 2008 is 51.

 

Creating the Distinct Number of Customers Calculation

1.       First we create our distinct calculation and insert it into a staging table.

a.        NOTE: You could later use your staging table as part of the load into your Fact Table.

                                                               i.      In our example, however, we are going to update our Fact Table with our calculations.

                                                              ii.      We insert the data into a staging table so that we can use it as part of our Update statement.

2.       Below is the T-SQL that we used to create our distinct number of Customers calculation, with an explanation afterwards:

Select
       convert(float, Count(Distinct(CustomerKey))) / Count(1) as DistinctCustomerKey
      ,OrderDateKey
  FROM [AdventureWorksDW2012].[dbo].[FactInternetSales] with (nolock)
  Group by OrderDateKey
  Order by OrderDateKey

a.        The only part that we are going to concentrate on is the actual calculation.

b.       We will explain it from the inside out, because that is the order in which I built it up.

                                                               i.      The innermost part selects the distinct CustomerKey values:

1.       Distinct(CustomerKey)

                                                              ii.      Next we count the distinct CustomerKey values:

1.       Count(Distinct(CustomerKey))

                                                            iii.      Next we convert the count to a float:

1.       convert(float,Count(Distinct(CustomerKey)))

                                                            iv.      The final part of the calculation divides our distinct count by the number of rows:

1.       / Count(1)

2.       This gives every row for the day an equal share of that day's distinct count.

                                                              v.      We also select OrderDateKey. Because this is our lowest level of granularity, we need to group by it so that the number of rows is counted per day and the calculation works out correctly.

                                                            vi.      NOTE: The reason we convert to a float is so that when we later sum the rows, they add back up to the correct number. Without the conversion, dividing one integer count by another would use integer division and discard the fractional part.

3.       To make this easier to follow, below is the complete T-SQL, where we truncate our staging table in the AdventureWorksDW2012 database and insert our data into it:

Truncate Table dbo.Staging_tb_FactInterNetSales_DistinctCustomerKey

Insert into dbo.Staging_tb_FactInterNetSales_DistinctCustomerKey
Select
       convert(float, Count(Distinct(CustomerKey))) / Count(1) as DistinctCustomerKey
      ,OrderDateKey
  FROM [AdventureWorksDW2012].[dbo].[FactInternetSales] with (nolock)
  Group by OrderDateKey
  Order by OrderDateKey

a.        As you can see above, we group the calculation by OrderDateKey, so a single run inserts the calculation for every day in the Fact Table.
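
The staging table itself is not defined in the steps above. A minimal sketch of what it could look like is shown here; the column types are an assumption (FLOAT for the calculation, INT to match OrderDateKey in the Fact Table), and the column order follows the Select list used in the Insert.

```sql
-- Hypothetical definition: the staging table is not defined anywhere in this post.
-- Column order matches the Select list of the Insert statement.
IF OBJECT_ID(N'dbo.Staging_tb_FactInterNetSales_DistinctCustomerKey', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.Staging_tb_FactInterNetSales_DistinctCustomerKey
    (
        DistinctCustomerKey FLOAT NOT NULL,  -- per-row share of the day's distinct customers
        OrderDateKey        INT   NOT NULL   -- assumed to match FactInternetSales.OrderDateKey
    );
END;
```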

4.       Using the example explained above, the calculation for our Distinct Customers on 03 March 2008 looks like the following:

a.        [Screenshot: result of the calculation for 03 March 2008]
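
To see why the float conversion and the division by the row count matter, here is a small self-contained sketch. It does not use the real Fact Table: @Fact is a made-up table variable with four rows for one day, and the column aliases are illustrative only.

```sql
-- Made-up sample: one day, four fact rows, three distinct customers.
DECLARE @Fact TABLE (OrderDateKey INT, CustomerKey INT);

INSERT INTO @Fact (OrderDateKey, CustomerKey)
VALUES (20080303, 101),
       (20080303, 101),
       (20080303, 102),
       (20080303, 103);

SELECT
     F.OrderDateKey
    ,COUNT(*)          AS FactRows         -- 4
    ,SUM(S.FloatShare) AS SumOfFloatShare  -- 3 (four rows of 0.75 each)
    ,SUM(S.IntShare)   AS SumOfIntShare    -- 0 (3 / 4 is integer division, so each row gets 0)
FROM @Fact AS F
INNER JOIN
(
    -- Same calculation as in the post, with and without the float conversion.
    SELECT
         OrderDateKey
        ,CONVERT(FLOAT, COUNT(DISTINCT CustomerKey)) / COUNT(1) AS FloatShare
        ,COUNT(DISTINCT CustomerKey) / COUNT(1)                 AS IntShare
    FROM @Fact
    GROUP BY OrderDateKey
) AS S
    ON F.OrderDateKey = S.OrderDateKey
GROUP BY F.OrderDateKey;
```

Each of the four rows carries 3 / 4 = 0.75, so summing all rows for the day gives back 3, the distinct number of customers. The sum only comes back to the distinct count when every row for that day is included.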

 

Updating our Fact Table with our Distinct Customer calculation

1.       First you need to ensure that the column exists in your Fact Table.

a.        I manually created the following column in the dbo.FactInternetSales table:

Alter table [dbo].[FactInternetSales]
  Add DistinctCustomers Float NULL

2.       Next is our Update T-SQL statement, where we update the column we created in Step 1 above, with an explanation below:

-- The nolock hint is left off the Fact Table here because it is the table being updated
Update F
Set F.DistinctCustomers = S.DistinctCustomerKey
From [dbo].[FactInternetSales] as F
       Inner join dbo.Staging_tb_FactInterNetSales_DistinctCustomerKey as S with (nolock)
              on F.OrderDateKey = S.OrderDateKey

a.        From the above you can see that we are joining our Fact Table to our staging table, using OrderDateKey in the join.

                                                               i.      NOTE: Once again this is our lowest level of granularity.

3.       In our example, if we now look at our Fact Table for 03 March 2008, we will see the following in our DistinctCustomers column.

a.        [Screenshot: DistinctCustomers values in the Fact Table for 03 March 2008]
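
Before moving on to the cube, it can be worth checking that the new column sums back to the real distinct count for every day. The query below is a sketch of such a check using the table and column names from this post; the tolerance of 0.0001 is an arbitrary assumption to allow for float rounding.

```sql
-- Lists every OrderDateKey where the summed column does not match the real distinct count.
SELECT
     OrderDateKey
    ,COUNT(DISTINCT CustomerKey) AS ExpectedDistinctCustomers
    ,SUM(DistinctCustomers)      AS SummedDistinctCustomers
FROM [dbo].[FactInternetSales]
GROUP BY OrderDateKey
HAVING SUM(DistinctCustomers) IS NULL   -- rows that were never updated
    OR ABS(SUM(DistinctCustomers) - COUNT(DISTINCT CustomerKey)) > 0.0001  -- assumed tolerance
ORDER BY OrderDateKey;
```

If the Update worked for every day, this query should return no rows.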

 

Adding the new Measure to your SSAS Cube

1.       Open your SSAS project in SQL Server Data Tools (SSDT).

2.       Next open the Data Source View and refresh it.

a.        In our example we opened Adventure Works DW.dsv.

b.       Once it was open we clicked Refresh, after which you will see the following:

                                                               i.      [Screenshot: result of refreshing the Data Source View]

c.        Click OK.

3.       Next open the cube to which you want to add your new distinct measure.

a.        In our example we opened the Adventure Works cube.

                                                               i.      [Screenshot: the Adventure Works cube]

4.       Under Measures, click the plus sign to expand the list, then right-click Internet Sales and select New Measure.

a.        [Screenshot: the New Measure menu option]

5.       This will open the New Measure window, which we configured as shown below:

a.        Usage: Sum

b.       Source Table: Internet Sales Facts

c.        Source Column: DistinctCustomers

d.       [Screenshot: the New Measure window with the settings above]

e.       Then click OK.

6.       You will now see your measure under the Internet Sales measure group.

a.        [Screenshot: the new measure under the Internet Sales measure group]

b.       Now we are going to rename the Distinct Customers measure so that we can use the new name in our calculated member, which is explained in the next steps.

                                                               i.      NOTE: The reason we are creating a calculated member is that we need to round the values from our database so that the distinct count is shown as a whole number.

c.        Right-click the Distinct Customers measure and select Rename.

                                                               i.      [Screenshot: the Rename menu option]

d.       We renamed it to Distinct Customers – Calc.

                                                               i.      NOTE: We gave it this name so that we know it is used as part of a calculation.

e.       Finally, right-click Distinct Customers – Calc and select Properties.

                                                               i.      Then change the Visible property to False.

                                                              ii.      [Screenshot: the Visible property set to False]

f.         Then save your cube.

 

Creating a calculation in SSAS and rounding the value so that it will be shown as a whole number

1.       The final step is to create our calculated member so that the numbers look correct when they are displayed in the client tool.

2.       Click on the Calculations tab.

3.       Then click New Calculated Member.

a.        [Screenshot: the New Calculated Member button]

4.       We then configured it as follows:

a.        Name: [Distinct Customers]

b.       Parent Hierarchy: Measures

c.        Expression: Round([Measures].[Distinct Customers – Calc],0)

                                                               i.      NOTE: Here we are using the Round function and telling it not to keep any decimal places by specifying the zero (0).

1.       This also means that a value higher than 0.5 is rounded up to 1.

2.       As a result, the calculated measure is always displayed as a whole number.

d.       Format String: Standard

e.       Visible: True

f.         Non-empty behavior: Distinct Customers – Calc

g.        Associated Measure Group: Internet Sales

h.       [Screenshot: the calculated member with the settings above]

5.       Now finally process your cube.

a.        NOTE: If you are using the Adventure Works Multidimensional project in SSDT, the partitions are set to Query Binding by default, so you will have to add our new column to the partition queries of the following measure groups:

                                                               i.      Part to add:

,[dbo].[FactInternetSales].[DistinctCustomers]

                                                              ii.      Measure group partitions:

1.       [Screenshot: the measure group partitions that need the new column]

                                                            iii.      If you do not add the column name there, the processing will fail.

 

Viewing Distinct Customer Counts

1.       Finally, we can now view the new distinct count measure that we created.

2.       For our example, I am expecting the Distinct Customers for 03 March 2008 to be 51.

a.        [Screenshot: Distinct Customers for 03 March 2008 in the cube browser]

3.       This was the goal of the exercise: to get a distinct count that can be used with our dimensions while using the SUM function.

 

Getting a distinct count for another dimension

If you wanted to get a distinct count for another dimension, you would need to create another column and repeat all of the steps above.

 

1.       For our example above, if we also wanted to slice the distinct count by Currency (CurrencyKey), we would need to modify our query to include CurrencyKey as follows:

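-- The expression still counts distinct CustomerKey, now per OrderDateKey and CurrencyKey.
-- The staging table needs a CurrencyKey column to receive this three-column result.
Truncate Table dbo.Staging_tb_FactInterNetSales_DistinctCustomerKey

Insert into dbo.Staging_tb_FactInterNetSales_DistinctCustomerKey
Select
       convert(float, Count(Distinct(CustomerKey))) / Count(1) as DistinctCurrencyKey
      ,CurrencyKey
      ,OrderDateKey
  FROM [AdventureWorksDW2012].[dbo].[FactInternetSales] with (nolock)
  Group by OrderDateKey, CurrencyKey
  Order by OrderDateKey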

2.       And then we would create all the column names and details in SSAS as detailed above.
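
The reason the extra key has to go into the Group By can be seen with a small self-contained sketch. As before, @Fact is a made-up table variable (one day, two currencies, eight rows), not the real Fact Table, and the aliases are illustrative only. It compares a share worked out per day with a share worked out per day and currency.

```sql
-- Made-up sample: one day, two currencies, six distinct customers in eight rows.
DECLARE @Fact TABLE (OrderDateKey INT, CurrencyKey INT, CustomerKey INT);

INSERT INTO @Fact (OrderDateKey, CurrencyKey, CustomerKey)
VALUES (20080303, 1, 101), (20080303, 1, 101), (20080303, 1, 101), (20080303, 1, 102),
       (20080303, 2, 103), (20080303, 2, 104), (20080303, 2, 105), (20080303, 2, 106);

SELECT
     F.CurrencyKey
    ,COUNT(DISTINCT F.CustomerKey) AS ActualDistinctCustomers  -- 2 for currency 1, 4 for currency 2
    ,SUM(D.DayShare)               AS SumOfDayShare            -- 3 for both currencies
    ,SUM(C.DayCurrencyShare)       AS SumOfDayCurrencyShare    -- 2 for currency 1, 4 for currency 2
FROM @Fact AS F
INNER JOIN
(
    -- Share grouped by day only: 6 distinct customers / 8 rows = 0.75 per row.
    SELECT OrderDateKey
          ,CONVERT(FLOAT, COUNT(DISTINCT CustomerKey)) / COUNT(1) AS DayShare
    FROM @Fact
    GROUP BY OrderDateKey
) AS D
    ON F.OrderDateKey = D.OrderDateKey
INNER JOIN
(
    -- Share grouped by day and currency: 0.5 per row for currency 1, 1.0 per row for currency 2.
    SELECT OrderDateKey
          ,CurrencyKey
          ,CONVERT(FLOAT, COUNT(DISTINCT CustomerKey)) / COUNT(1) AS DayCurrencyShare
    FROM @Fact
    GROUP BY OrderDateKey, CurrencyKey
) AS C
    ON  F.OrderDateKey = C.OrderDateKey
    AND F.CurrencyKey  = C.CurrencyKey
GROUP BY F.CurrencyKey
ORDER BY F.CurrencyKey;
```

In this sample only the share grouped by day and currency sums back to the real distinct count when the data is sliced by currency. The same reasoning applies to totals across several days: the SUM adds the daily distinct counts together, so a customer who orders on two different days would be counted once for each day.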

0 thoughts on “SSAS – Using the SUM Function within a Measure Group to display a Distinct Count with SSAS (SQL Server Analysis Services)”

  1. All this is great, but what if I have a long list of dimension keys in my factSales table, i.e. orderDate is not my granularity level? What should I do? How should I divide customerKey?

  2. I would suggest using whatever the lowest granularity is that you have. So if, for example, in your data it was ProductKey (or a surrogate key) instead of OrderDateKey, then you would change the first query and use the ProductKey (surrogate key) as part of your Group By clause.

    The way I did it was to first run the T-SQL query and make sure that if I summed up the DistinctCustomerKey over the rows that I had for a day, the total would equal the distinct count for that day. That way you can ensure that your distinct calculation is correct.

    Does that help?

  3. Yes, now I got it. This is what your last part “Getting a distinct count for another dimension” is about, I might read it rapidly. Thank you!

