After my previous blog post on the different semantic model options and at the same time working with a Fabric customer, it got me thinking which is faster and which consumes less capacity when ingesting data into Power BI either via the SQL Endpoint to a Lakehouse or a query from the Warehouse.

Below you will find the information which I found very interesting indeed.

For both the Lakehouse and Warehouse source CSV’s there was a total of 237,245,585 rows.

NOTE: As always, your results may vary so always test on your own data, but this is a good indication below.

Loading data from CSV

The first step was to load the data from a CSV file into the Lakehouse and Warehouse

Here is the code that I used to load the data into Lakehouse

Shown below is how long it took along with the CUs

Next, I was to load the same data into the Warehouse, below is the code that I used

Shown below is how long this took

A quick comparison

Item Type Duration CUs
Lakehouse 588 14,464
Warehouse 88 1,923

As shown above the Warehouse ran 500 seconds faster or over 6.6x (660% quicker).

When looking at the Capacity Units comparison the Warehouse used 12,541 less capacity units or consumed 7.5x (750% less capacity units)

Transforming from Staging into Aggregated table

Next, what I did was to use the staging table to create an aggregated table (something typically done when moving from Bronze to Silver)

Below is the Spark SQL Code that I ran

Shown below is how long it took along with the CUs

I then ran the same query on the Warehouse inserting into a Warehouse table.

Shown below is how long it took along with the CUs

A quick comparison for loading the aggregated data.

Item Type Duration CUs
Lakehouse 255 2,818
Warehouse 55 1,727

As shown above the Warehouse ran 200 seconds faster or over 4.6x (460% quicker).

When looking at the Capacity Units comparison the Warehouse used 1,901 less capacity units or consumed 1.6x (160% less capacity units)

Query Comparison of LH vs WH and WH with Clustering

The final comparison I wanted to compare was to load from Lakehouse, Warehouse and Warehouse Clustered tables into my Semantic Model using Import Mode.

It did take some time to run, because it was importing 100 million rows.

All the details below used the default navigation of finding the table using Power Query with the SQL Server connection.

Lakehouse query below of CUs

Warehouse Query

Warehouse Query on Clustered table

A quick comparison for loading the aggregated data.

Item Type Duration CUs
Lakehouse 2,826 126,697
Warehouse 1,434 48,098
Warehouse Clustered Table 1,519 52,875

As shown above the Warehouse ran 1,329 seconds faster or over 1.97x (197% quicker).

When looking at the Capacity Units comparison the Warehouse used 78,599 less capacity units or consumed 2.6x (260% less capacity units)

While the Warehouse Clustered table ran faster than the Lakehouse the Warehouse was still the quickest.

Summary

When comparing the ingestion of the CSV file, creating an aggregated table and then importing the table into the semantic model the clear winner is the Warehouse as shown with the overall numbers below.

Process Lakehouse Duration Warehouse Duration % Diff (LH vs WH)
Loading CSV 588 88 668%
Table Aggregation 255 55 464%
Query into Semantic Model 2,826 1,434 197%
TOTALS 3,669 1,577 233%
Process Lakehouse CUs Warehouse CUs % Diff (LH vs WH)
Loading CSV 14,464 1,923 752%
Table Aggregation 2,818 1,727 163%
Query into Semantic Model 126,697 48,098 263%
TOTALS 143,979 51,748 278%

Thanks for reading I hope you found this insightful!