Data Loading – Part 3 | Migrating AAS to PPU
Welcome to the third instalment of my series on migrating AAS to PPU.
In this blog post I am going to cover the difference between how data is loaded into AAS and how it is loaded into PPU, along with the associated performance.
Here are the previous 2 parts I have completed in the series.
Loading Data into AAS
Currently, data is loaded into AAS via a Power BI Gateway, which is configured and set up via the Azure Portal.
How the data is transferred into AAS depends on the source of the data.
As far as I understand, all data has to traverse the Power BI Gateway to be ingested into AAS if it comes from either an on-premises data source or a combination of cloud and on-premises data sources.
If it is only a cloud data source, then the gateway will NOT be required.
Loading Data into PPU
When data is loaded for PPU, how it gets into the dataset depends on the data source.
This is because the dataset resides in the Power BI Service, so it uses the same mechanisms as I would use for a Power BI Pro dataset.
What this means is that if my data source is, for example, an Azure SQL database, I will not require a Power BI Gateway Server to get the data into my PPU dataset.
On the other hand, if my data came from both an ODBC connection and Azure Blob storage, I would require a Power BI Gateway Server, because that is the configuration the dataset currently needs in order to refresh. If the data source was ONLY Azure Blob storage, it would not require a Power BI Gateway.
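The rule of thumb above can be written down as a small check. This is only a sketch of how I reason about it, not anything the Power BI Service exposes: the DataSource class, the via_gateway flag and the gateway_required function are names I made up for illustration, and whether a given source needs the gateway is something you have to decide yourself for each source.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of one source used by a dataset."""
    name: str
    # Assumption: True when the source can only be reached through a gateway
    # (for example an on-premises source, or the ODBC connection in my example).
    via_gateway: bool


def gateway_required(sources: Iterable[DataSource]) -> bool:
    """Return True if at least one source has to go through a gateway.

    Mirrors the rule of thumb in the text: cloud-only datasets do not need
    a gateway, while a mix of cloud and gateway-bound sources does.
    """
    return any(source.via_gateway for source in sources)


if __name__ == "__main__":
    cloud_only = [DataSource("Azure SQL database", via_gateway=False)]
    mixed = [
        DataSource("ODBC connection", via_gateway=True),
        DataSource("Azure Blob storage", via_gateway=False),
    ]
    print(gateway_required(cloud_only))
    print(gateway_required(mixed))
```

The point of the sketch is that a single gateway-bound source is enough to require the gateway for the dataset refresh, even if every other source is in the cloud.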
Data Loading Performance
One important aspect when loading datasets is how fast I can get the data loaded into the dataset.
I have found that when working with large datasets, where the dataset is over 30GB of compressed data, I need to make sure I can get the data into the dataset as fast as possible.
Currently, when loading data into AAS, I am dependent on the Power BI Gateway, which involves the hardware of the Gateway Server, its bandwidth, and its location.
- The better the hardware, the quicker it can process the data in the mashup engine.
- The faster the bandwidth, the faster it can send the data to AAS.
- Finally, the tricky part is the location. Do I put the Power BI Gateway Server as close as possible to the data source, which means getting the data quickly into the mashup engine? Or do I put it as close as possible to AAS, which will get the data into AAS faster?
What I have found is that I prefer having it closer to the data source, because bandwidth is typically the constraint. Having the Power BI Gateway as close as possible to the data source means I can query the data and get it processed by the mashup engine as fast as possible.
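To make the bandwidth point a little more concrete, here is a back-of-the-envelope estimate of how long it takes to move a given amount of data over a link. It is only a rough sketch: the transfer_seconds function is my own helper, the numbers in the example are hypothetical and not measurements from my customers, and it ignores compression, latency and protocol overhead.

```python
def transfer_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Estimate the time to send size_gb gigabytes over a bandwidth_mbps link.

    Assumptions: decimal units (1 GB = 1000**3 bytes, 1 Mbps = 1,000,000 bits
    per second) and a link that is fully available to this transfer.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be greater than zero")
    total_bits = size_gb * 8 * 1000**3
    return total_bits / (bandwidth_mbps * 1_000_000)


if __name__ == "__main__":
    # Hypothetical example: the same 5 GB load over a slow and a fast link.
    for mbps in (100, 1000):
        minutes = transfer_seconds(5, mbps) / 60
        print(f"{mbps} Mbps: about {minutes:.1f} minutes")
```

Running an estimate like this for the link between the data source and the gateway, and again for the link between the gateway and AAS, is one way to see which hop is the slower one before deciding where to place the gateway.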
Now, depending on how the data is being loaded for PPU, if it is using the Power BI Gateway it will have the same considerations as above.
But if it is loading data directly, then it should potentially be quicker because it does not have to go via the gateway.
If the data is located in the same tenant, it will load a lot quicker than via the Power BI Gateway, whereas if it is located in a different region there will be network latency, which will affect the data loading speed.
Real World Data Loading
I have been working with a customer where I have got data in AAS and in PPU for the same dataset.
What I have found is that the time it takes to load the data is very similar between the two.
With one of my customers, as an example, the data was being curated in Asia, whilst the business was being run from Australia. Hosting AAS/PPU where the data was curated meant that the data loading was significantly faster. Yes, the reports have to access the data across the ocean, but this only sends the results, so the performance of the reports was and still is blazingly fast!
I will caveat this by saying that in this case both AAS and PPU must use the Power BI On-Prem Gateway for the data sources, so this could explain why they are similar in terms of data loading performance.
It is also worth noting that I am loading a fair amount of data each day (a few million rows) with incremental refresh enabled, to ensure that I do not have to reload the entire dataset.
In summary, when comparing the data loading performance between AAS and PPU, the two are comparable, so data loading will not be negatively impacted when migrating from AAS to PPU.
I hope that you found this useful, and if you have got any comments, suggestions or queries, please let me know.
More information on the supported data sources can be found here: Power BI data sources – Power BI | Microsoft Docs
Thanks for reading!