How to configure Incremental Refreshing in Power BI with DateKey or Date (instead of the default DateTime)
I was helping on the Power BI community forum (where I certainly learn a lot) and I came across this blog post where a Microsoft Employee had suggested that a Power Query function could be used as part of the Incremental refresh.
What I mean by this is when implementing Incremental refreshing in Power BI the default is that you must use a DateTime data type on your column. Whilst this is awesome a lot of Fact tables typically will have a DateKey (in the format of “YYYYmmdd” “20200309”) or Date (in the format of “YYYY-mm-dd” “2020-03-09”)
This got me thinking and I was certain that I could figure out a way to use a variation of the function above so that I could pass through the DateKey to my Fact table. This would mean a massive query performance increase because my SQL Server table is partitioned by DateKey and I have an index on the DateKey.
After some testing and playing around with the function, the underlying view and making sure it works I successfully got it working where I am using Incremental refreshing using my DateKey column.
This got my refresh times even quicker as shown below from an actual implementation
Below are the steps I completed to get this working.
Figuring out how to use the DateKey within Power Query
The first thing I had to figure out was how to use the DateKey instead of the DateTime column (which is a requirement for Incremental Refreshing)
The first thing I did was to make sure that I had configured my required parameters are per the requirements to incrementally refresh the data.
- As you can see below, I have my RangeStart and RangeEnd configured as DateTime
I then connected to my existing view where I have got my dataset and the column called DateKey
- Now I wanted to make sure that when I used the RangeStart that I could change the output from the DateTime to DateKey in the format of YYYYmmdd
I did this by creating a Custom Column and put in the following below.
Date.Year(RangeEnd)*10000 + Date.Month(RangeEnd)*100 + Date.Day(RangeEnd)
I then viewed the result and I could see it was correctly showing the RangeEnd in the format of YYYYmmdd
- I then copied the above code, knowing that it works and will put it in the format I require.
Modifying the Incremental Refresh from using the RangeStart/RangeEnd DateTime to using the DateKey
My next steps were to change the incremental refresh from the defaults when using the RangeStart and RangeEnd to use the DateKey, which I explain below.
- I first created the Incremental refreshing as per the requirement.
I went to my Date column which in my example is [Invoice Date Key] and applied the filters as shown below.
- When I looked in the Power Query Editor Step, I could see code created.
- In the steps below is where I figured out how to get it working with the DateKey
Next is where I changed it to use the code I had tested in the previous steps. This would then use the [InvoiceDateKey] column and would modify the RangeStart and RangeEnd to be in the format required for the DateKey
- And below is the code
Table.SelectRows(#"Added Custom1", each [InvoiceDateKey] >= Date.Year(RangeStart)*10000 + Date.Month(RangeStart)*100 + Date.Day(RangeStart) and [InvoiceDateKey] < Date.Year(RangeEnd)*10000 + Date.Month(RangeEnd)*100 + Date.Day(RangeEnd))
I could then see my table filtered with my parameter values from the RangeStart and RangeEnd
Verifying that my Power Query would still fold back to the SQL Server Source
One very important step was to make sure that my Power Query was still folding back to the SQL Server Source.
The first thing that I did to validate this is to right click on the final step in my table and select View Native Query
This then showed me that it was indeed folding back
- NOTE: There are some instances when a you might not see the “View Native Query” option enabled, and it can still fold back.
I wanted to make 100% sure that it did indeed fold back to the SQL Server Source.
- What I did was to change the parameters to be over a longer time range, this would allow me to run a very handy stored proc sp_WhoIsActive which would allow me to see the query running
- As shown below I could then see it was running the Native Query against my SQL Server Source (below is the critical piece of the generated TSQL Code)
The final step was to then configure the Incremental Refresh, which I did as shown below.
Because this was not the entire dataset, I created a table which just had the one date as shown below.
- This would allow me later to validate that the incremental refresh was indeed working.
- I then saved and uploaded my PBIX to the Power BI Service.
Testing the Incremental refresh
The last step was to test and make sure that the incremental refresh was working in the Power BI Service.
- Once uploaded I configured the dataset to use my On-Premise Data Gateway.
I then went to the dataset and clicked on Refresh Now
At the same time the refresh was running I then went to my SQL Server, where I could see the query that was generated which proved to me that it was folding the query back to the SQL Server Source (below is the critical piece of the generated TSQL Code)
When I looked at the report, I could now see data not only for 1 day but for multiple days
To test the Incremental refreshing, I went back to the dataset and clicked on Refresh Now again
I could now see the query had changed to only include the data as per my incremental refresh policy (below is the critical piece of the generated TSQL Code)
I could then see that it completed the refresh in record time (Yes, I know it is marginally quicker but on larger datasets this is a LOT quicker)
Incremental refresh using a Date column
I initially thought that it would be as simple as using a Date column with the data type set to Date and it would work with some code which took the RangeStart/RangeEnd and modified it to be a date.
When testing this out it kept on defaulting back to a DateTime.
My current proposed solution below is to ensure that you can create a DateKey column from your SQL Server Source. Once you have created the DateKey column the above steps will then work.
In summary I have shown how I changed the incremental refreshing from using the defaults of DateTime to use the DateKey. In doing so I was still able to configure and get the incremental refresh working.
If anyone has got any suggestions on how to get this working, please let me know and I will update my blog and obviously give you the credit!
Thanks for reading and if you got any questions or comments please let me know.