Microsoft Fabric: Why Warehouse Beats Lakehouse by 233% in Speed and 278% in Capacity Savings
After my previous blog post on the different semantic model options, and while working with a Fabric customer at the same time, I got to thinking: which is faster, and which consumes less capacity, when ingesting data into Power BI, via the SQL endpoint of a Lakehouse or via a query against the Warehouse?
Below you will find the results, which I found very interesting indeed.
Here is a link to the data I used from the SQL BI Data Generator: Ready-to-use data · sql-bi/Contoso-Data-Generator-V2-Data (https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data)
For both the Lakehouse and Warehouse loads, the source CSVs contained a total of 237,245,585 rows.
NOTE: As always, your results may vary, so always test on your own data; the numbers below are a good indication, though.
Loading data from CSV
The first step was to load the data from the CSV files into both the Lakehouse and the Warehouse.
Here is the code pattern I used to load the data into the Lakehouse.
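A minimal Spark SQL sketch of the load, assuming the CSVs were uploaded to the Lakehouse Files area (the path, view name, and table name are assumptions, not the exact code from the run):

```sql
-- Spark SQL in a Fabric notebook: expose the CSV files as a view,
-- then materialise them into a Delta staging table in the Lakehouse.
-- The path and names are assumptions, not the exact code from the run.
CREATE OR REPLACE TEMPORARY VIEW sales_csv
USING CSV
OPTIONS (
    path 'Files/contoso/sales/*.csv',
    header 'true',
    inferSchema 'true'
);

CREATE TABLE sales_staging
USING DELTA
AS SELECT * FROM sales_csv;
```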

I captured how long it took along with the CUs; the numbers appear in the comparison table below.
Next, I loaded the same data into the Warehouse; below is the code pattern I used.
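A minimal T-SQL sketch of the Warehouse load using the COPY statement (the storage URL is a placeholder, and it assumes dbo.sales_staging has already been created with matching columns):

```sql
-- T-SQL in the Fabric Warehouse: bulk load the CSV files with COPY INTO.
-- Assumes dbo.sales_staging already exists with matching columns;
-- the storage URL is a placeholder.
COPY INTO dbo.sales_staging
FROM 'https://<storageaccount>.blob.core.windows.net/<container>/sales/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2  -- skip the header row
);
```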

Again, I captured how long this took along with the CUs.
A quick comparison
| Item Type | Duration (s) | CUs |
| --- | --- | --- |
| Lakehouse | 588 | 14,464 |
| Warehouse | 88 | 1,923 |
As shown above, the Warehouse ran 500 seconds faster, or roughly 6.6x (660%) quicker.
Looking at the Capacity Units, the Warehouse used 12,541 fewer CUs, consuming roughly 7.5x (750%) less.
Transforming from Staging into Aggregated table
Next, I used the staging table to create an aggregated table (something typically done when moving from Bronze to Silver).
Below is the Spark SQL code pattern that I ran.
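A sketch of what that Spark SQL looks like (the grouping and measure columns are assumptions based on the Contoso sales table):

```sql
-- Spark SQL in a Fabric notebook: aggregate the staging table into a
-- smaller Delta table. Column names are assumptions for the Contoso data.
CREATE TABLE sales_aggregated
USING DELTA
AS
SELECT
    OrderDate,
    StoreKey,
    ProductKey,
    SUM(Quantity)            AS TotalQuantity,
    SUM(Quantity * NetPrice) AS TotalSalesAmount
FROM sales_staging
GROUP BY OrderDate, StoreKey, ProductKey;
```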

Once more I captured the duration along with the CUs.
I then ran the same query on the Warehouse, inserting into a Warehouse table.
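A T-SQL sketch of the same aggregation on the Warehouse side (it assumes dbo.sales_aggregated was created beforehand, with the same assumed column names as above):

```sql
-- T-SQL in the Fabric Warehouse: the same aggregation, inserted into a
-- pre-created Warehouse table. Column names are assumptions.
INSERT INTO dbo.sales_aggregated
    (OrderDate, StoreKey, ProductKey, TotalQuantity, TotalSalesAmount)
SELECT
    OrderDate,
    StoreKey,
    ProductKey,
    SUM(Quantity),
    SUM(Quantity * NetPrice)
FROM dbo.sales_staging
GROUP BY OrderDate, StoreKey, ProductKey;
```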

And again I captured the duration along with the CUs for the Warehouse run.
A quick comparison for loading the aggregated data.
| Item Type | Duration (s) | CUs |
| --- | --- | --- |
| Lakehouse | 255 | 2,818 |
| Warehouse | 55 | 1,727 |
As shown above, the Warehouse ran 200 seconds faster, or roughly 4.6x (460%) quicker.
Looking at the Capacity Units, the Warehouse used 1,091 fewer CUs, consuming roughly 1.6x (160%) less.
Query Comparison of LH vs WH and WH with Clustering
The final comparison was loading data from the Lakehouse, the Warehouse, and a Warehouse clustered table into my semantic model using Import mode.
It did take some time to run, because it was importing 100 million rows.
All the details below used the default navigation to find the table in Power Query, using the SQL Server connector.
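For reference, that default navigation step effectively issues a plain select over the endpoint; a sketch of the equivalent query (the schema and table name are assumptions):

```sql
-- Roughly the query the default Power Query navigation sends to the
-- SQL endpoint / Warehouse (schema and table name are assumptions)
SELECT *
FROM dbo.sales_aggregated;
```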
I ran the import three times, as a Lakehouse query, a Warehouse query, and a Warehouse query against the clustered table, capturing the duration and CUs for each; the results are summarized below.
A quick comparison for loading the data into the semantic model.
| Item Type | Duration (s) | CUs |
| --- | --- | --- |
| Lakehouse | 2,826 | 126,697 |
| Warehouse | 1,434 | 48,098 |
| Warehouse Clustered Table | 1,519 | 52,875 |
As shown above, the Warehouse ran 1,392 seconds faster, or roughly 1.97x (197%) quicker.
Looking at the Capacity Units, the Warehouse used 78,599 fewer CUs, consuming roughly 2.6x (260%) less.
While the Warehouse clustered table ran faster than the Lakehouse, the plain Warehouse table was still the quickest.
Summary
When comparing the ingestion of the CSV file, the creation of an aggregated table, and the import of that table into the semantic model, the clear winner is the Warehouse, as shown by the overall numbers below.
| Process | Lakehouse Duration (s) | Warehouse Duration (s) | % Diff (LH vs WH) |
| --- | --- | --- | --- |
| Loading CSV | 588 | 88 | 668% |
| Table Aggregation | 255 | 55 | 464% |
| Query into Semantic Model | 2,826 | 1,434 | 197% |
| TOTALS | 3,669 | 1,577 | 233% |
| Process | Lakehouse CUs | Warehouse CUs | % Diff (LH vs WH) |
| --- | --- | --- | --- |
| Loading CSV | 14,464 | 1,923 | 752% |
| Table Aggregation | 2,818 | 1,727 | 163% |
| Query into Semantic Model | 126,697 | 48,098 | 263% |
| TOTALS | 143,979 | 51,748 | 278% |

(In both tables, % Diff is the Lakehouse value divided by the Warehouse value, expressed as a percentage.)
Thanks for reading, I hope you found this insightful!
Comments

Hi Gilbert,
The article is really interesting. We need a Data Lakehouse and not a Warehouse; it is Data Lake + Data Warehouse. I already published a blog on this as well.
Yeah, I always say it depends on the use case and what the requirements are!
Nice post
Thank you!
Do you think this applies to all types of ingests? We are ingesting data from an on-premises SQL database via a pipeline into a lakehouse, and then moving the data to silver in a warehouse.
If what I think is the case then we should ingest into a warehouse, and go from there?
Hi Kobe,
This is certainly something that I would explore to see if it is faster and uses less capacity units, especially since you’re actually landing up in a warehouse. Anyway, if you need any help or suggestions, please let me know.
Are you able to share the files you used for your benchmark?
Hi Richard, I have updated the blog post, you can find a link to the files here: https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Thank you!
Hi,
which capacity did you use for testing?
I used an F64 Capacity