Power BI Premium Performance – Part 3 | How does a dataset refresh affect performance?
In the third part of my blog series on Performance, this time I am going to show how a data refresh can affect performance.
Below are the high-level steps as to what happens when you refresh a dataset
- The first step is it creates a blank copy of your existing dataset.
- It then loads all the updated data into this dataset.
- Once it is finished it then very cleverly swops out the current dataset with the updated dataset
This means that in order to a complete dataset refresh I will need twice the amount of memory to complete it successfully.
If this is your first time here, you can read about just first blog post in the series
- Part 1 | How to use the new XMLA End Point to see how much memory your datasets are using
- Part 2 | Optimization of your Power BI Model
- Part 4 | How does Row Level Security affect performance?
- Part 5 | How much memory will the PBIX consume?
NOTE: In order to get more granular details, I have done all my metrics using Azure Analysis Services, which allowed me to use the Log Analytics to capture the events I require.
What happens to the memory during a dataset refresh?
In the chart below I am going to show you what happens when I do a complete refresh of my dataset
As shown above before I completed any dataset refresh my dataset was using 436.91 MB
Next when I completed a complete dataset refresh it then spiked up to 981.39 MB which is just more than double the current memory that it was consuming.
As far as I am aware it has been suggested that it will consume typically at least the current memory of your dataset.
It is intersesting to see that after the refresh it consumed more memory than before.
And a subsequent refresh then also used just over double the amount of memory for the dataset refresh
What does this mean for me and the performance in Power BI Premium?
That the above demonstrates is that when you are doing a complete dataset refresh you will need to ensure that you have at the very least double the amount of memory available in your premium capacity for the dataset refresh to complete successfully.
For me to monitor this in Power BI Premium I can then use the Power BI Premium Capacity metasas App.
I would also like to refererence the whitepaper Deploying and Managing Power BI Premium Capacities, in the section “Keeping Datasets Up To Date” where it goes into more detail on how to view this.
How can I ensure that my dataset refresh uses the least amount of memory possible?
Fortunately, there is an alternative to using less memory.
And it is using Incremental Refresh which is available in Power BI Premium already.
I am not going to cover the details on how to configure the incremental refreshing, rather here is a link on how to achieve this Incremental refresh in Power BI Premium
Below I want to display how using incremental refreshing can affect the amount of memory used for a dataset refresh.
I did modify my data model so that the memory usage was different, this was to highlight the memory difference when using incremental refreshing.
As shown above the following memory was allocated for differences when refreshing
UPDATE: 02 May 2019
I was fortunate enough to run this past Christian Wade who answered some of my questions, which allowed me to update the details below.
Full Dataset refresh
This consumed the most amount of memory for the longest period.
As to be expected because it refreshed the entire dataset.
Entire Fact Table refresh
This consumed a fair amount of memory, this is because it is by far the largest table in my dataset. And it appears to consume a lot of memory due to the dictionaries that are used in the processing
But the duration of how long it consumed the memory was shorter.
Historical Partition refresh
I then refreshed the historical partition which would be data that has not changed over time.
As you can see, this consumed quite a lot of memory almost as much as the entire dataset refresh.
UPDATE: This could be due to the historical partition having to duplicate the dictionaries while the new data is being fetched.
Current Partition refresh
The final refresh was where I only refreshed the current data in my dataset.
Once again, the amount of memory required for this refresh was by far the lowest and once again it was very quick.
As I have shown how refreshing your Power BI dataset can affect the amount of memory required for this to complete successfully.
I also demonstrated that by using incremental processing I significantly reduced the amount of memory required for the dataset refresh. One thing to remember is that your mileage will always vary based on the structure of the tables, as well as how large your data model is, relationships and calculated columns.
This then allowed me to have be able to consume the extra memory for other requirements in Power BI Premium.
I hope that you found this blog post insightful and you learnt something new.
As always thanks for reading and if you got any comments or questions please leave them in the section below.
In my next blog post I am going to look at how Row Level Security affects the performance.