After my previous blog post on the different semantic model options and at the same time working with a Fabric customer, it got me thinking which is faster and which consumes less capacity when ingesting data into Power BI either via the SQL Endpoint to a Lakehouse or a query from the Warehouse.

Below you will find the information which I found very interesting indeed.

For both the Lakehouse and Warehouse source CSV’s there was a total of 237,245,585 rows.

NOTE: As always, your results may vary so always test on your own data, but this is a good indication below.

Loading data from CSV

The first step was to load the data from a CSV file into the Lakehouse and Warehouse

Here is the code that I used to load the data into Lakehouse

Shown below is how long it took along with the CUs

Next, I was to load the same data into the Warehouse, below is the code that I used

Shown below is how long this took

A quick comparison

Item Type	Duration	CUs
Lakehouse	588	14,464
Warehouse	88	1,923

As shown above the Warehouse ran 500 seconds faster or over 6.6x (660% quicker).

When looking at the Capacity Units comparison the Warehouse used 12,541 less capacity units or consumed 7.5x (750% less capacity units)

Transforming from Staging into Aggregated table

Next, what I did was to use the staging table to create an aggregated table (something typically done when moving from Bronze to Silver)

Below is the Spark SQL Code that I ran

Shown below is how long it took along with the CUs

I then ran the same query on the Warehouse inserting into a Warehouse table.

Shown below is how long it took along with the CUs

A quick comparison for loading the aggregated data.

Item Type	Duration	CUs
Lakehouse	255	2,818
Warehouse	55	1,727

As shown above the Warehouse ran 200 seconds faster or over 4.6x (460% quicker).

When looking at the Capacity Units comparison the Warehouse used 1,901 less capacity units or consumed 1.6x (160% less capacity units)

Query Comparison of LH vs WH and WH with Clustering

The final comparison I wanted to compare was to load from Lakehouse, Warehouse and Warehouse Clustered tables into my Semantic Model using Import Mode.

It did take some time to run, because it was importing 100 million rows.

All the details below used the default navigation of finding the table using Power Query with the SQL Server connection.

Lakehouse query below of CUs

Warehouse Query

Warehouse Query on Clustered table

A quick comparison for loading the aggregated data.

Item Type	Duration	CUs
Lakehouse	2,826	126,697
Warehouse	1,434	48,098
Warehouse Clustered Table	1,519	52,875

As shown above the Warehouse ran 1,329 seconds faster or over 1.97x (197% quicker).

When looking at the Capacity Units comparison the Warehouse used 78,599 less capacity units or consumed 2.6x (260% less capacity units)

While the Warehouse Clustered table ran faster than the Lakehouse the Warehouse was still the quickest.

Summary

When comparing the ingestion of the CSV file, creating an aggregated table and then importing the table into the semantic model the clear winner is the Warehouse as shown with the overall numbers below.

Process	Lakehouse Duration	Warehouse Duration	% Diff (LH vs WH)
Loading CSV	588	88	668%
Table Aggregation	255	55	464%
Query into Semantic Model	2,826	1,434	197%
TOTALS	3,669	1,577	233%

Process	Lakehouse CUs	Warehouse CUs	% Diff (LH vs WH)
Loading CSV	14,464	1,923	752%
Table Aggregation	2,818	1,727	163%
Query into Semantic Model	126,697	48,098	263%
TOTALS	143,979	51,748	278%

Thanks for reading I hope you found this insightful!

Blog

Microsoft Fabric: Why Warehouse Beats Lakehouse by 233% in Speed and 278% in Capacity Savings

Loading data from CSV

Transforming from Staging into Aggregated table

Query Comparison of LH vs WH and WH with Clustering

Summary

Related

Tags In

Leave a Reply Cancel reply

Recent Posts

Blog

Microsoft Fabric: Why Warehouse Beats Lakehouse by 233% in Speed and 278% in Capacity Savings

Loading data from CSV

Transforming from Staging into Aggregated table

Query Comparison of LH vs WH and WH with Clustering

Summary

Share this:

Related

Tags In

Related Posts

Running and viewing Automated Load Testing Results – Part 3

Comparing Fabric Capacity Consumption – Notebook vs Warehouse SQL

How to get Sempy (Semantic-link) to run when being triggered from a data pipeline which runs a Notebook in Fabric

Leave a Reply Cancel reply

Recent Posts

Tags