Comparing Microsoft Direct Lake vs Import – Which Semantic Model performs best?
I was recently part of a discussion (one I have heard many times) about which semantic model to use in Microsoft Fabric.
That discussion was the inspiration for this blog post, where I am going to compare a Microsoft Direct Lake (DL) semantic model to an Import semantic model. The goal of this post is to explain how I set up and configured the comparison.
In the next blog post I will show the tests and the outcomes of my testing.
Why do the comparison
There are now multiple configurations that can be used for Direct Lake or Import models, which makes it difficult to decide which one to use.
To complete the comparison, I need a repeatable pattern in which I can monitor the following metrics for an accurate comparison.
- Time taken for the query to complete
- CPU duration (how much time the CPU spends on the query)
- Capacity Unit consumption (how much capacity the semantic model uses)
There are also other factors to consider, such as refresh overhead, cached queries, and optimizations to the semantic models.
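To make the first two metrics concrete, below is a minimal sketch (assuming a Fabric notebook with semantic link available) of how a single DAX query could be timed against one of the semantic models. The model name comes from later in this post; the query text and the use of evaluate_dax are illustrative, not the actual load test code.

```python
# Minimal sketch: time one DAX query against a semantic model using
# semantic link (sempy) in a Fabric notebook. Illustrative only; the
# actual load test used in this post is covered later on.
import time
import sempy.fabric as fabric

# An intentionally expensive query: a distinct count over the
# 100 million row Sales fact table
dax_query = """
EVALUATE
ROW("Total Customers", DISTINCTCOUNT(Sales[CustomerKey]))
"""

start = time.perf_counter()
result = fabric.evaluate_dax("ImportAll", dax_query)  # model name from this post
print(f"Query completed in {time.perf_counter() - start:.2f} seconds")
print(result)
```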
Setup for Load Testing
Below is how I set up the environment to load test the different semantic models, so that I can repeat the tests across the semantic models and automate the process to ensure the load test metrics are accurate.
Fabric Setup
I created a new App Workspace, in which I created a Lakehouse (with schemas enabled) and a Warehouse, which I will be using for the Direct Lake semantic models.

The App Workspace is using an F64 Fabric Capacity.
Data source
First, I wanted a data source big enough to put sufficient load on the capacity to allow a meaningful comparison.
I went to SQL BI and downloaded the CSV source data from here: Contoso Data Generator – SQLBI
I downloaded the 100M sample.

The reason for using the CSV source is that it allows me to load the data into a Lakehouse or Warehouse in each of the configurations I wanted to test.
This was then extracted and uploaded to OneLake.
Creation of the Initial Semantic Model
I first loaded the CSV data into the Lakehouse using a Python Notebook, reading the CSV files and writing them to the default schema in my Lakehouse.
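As a rough illustration, here is a minimal PySpark sketch of that kind of load; the file path and table name are placeholders, not the exact code from my notebook.

```python
# Minimal sketch: read the extracted Contoso CSV files and write them
# to the default (dbo) schema of the Lakehouse as a Delta table.
# The path and table name are placeholders; the `spark` session is
# pre-created in a Fabric notebook.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/contoso-100m/sales/*.csv")  # hypothetical OneLake path
)

df.write.format("delta").mode("overwrite").saveAsTable("dbo.Sales")
```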

From here I used Power BI Desktop to import the data and configure the semantic model, creating the items below, which were then replicated to each semantic model.
The tables and relationships are shown below.

Here is a list of the measures created.

- A special note on [Total Customers], which does a distinct count on CustomerKey in the Sales fact table, which has 100 million rows of data.
How to test the semantic models consistently
To test the semantic models consistently, I used the Performance Analyzer to get a list of interactions from a Power BI report.
This is critical to ensure that I could automate the testing and run the same queries, where the only difference is the semantic model.
This is based on my previous blog post, where I explained how to create the performance load test JSON file: Automating Power BI Load Testing with Fabric Notebooks – Part 1: Capturing Real Queries – FourMoo | Microsoft Fabric | Power BI
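As a quick sanity check before running any tests, a sketch like the one below can confirm the captured queries load correctly. The file path and JSON shape shown here are assumptions for illustration; the real structure is described in the Part 1 post.

```python
# Sketch: load the captured load test queries and preview them.
# The path and JSON keys are assumptions; see the Part 1 post for
# the actual file structure.
import json

with open("/lakehouse/default/Files/loadtest/queries.json") as f:  # hypothetical path
    captured = json.load(f)

for query in captured["queries"]:   # assumed key
    print(query["dax"][:80])        # assumed key holding the DAX text
```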
Below is what the report pages used to generate the queries looked like. As you can see, I did not make them look very pleasing, as I only wanted to generate queries from the visuals.


Semantic Models that were tested
Below is a list of all the semantic models that were tested. I am certain there are more options that could be tested, but setting up and testing those would have taken even more time than this already did.
Import Semantic Models
Below are the import semantic models I created using Power BI Desktop, with details for each; a refresh sketch follows the list.
- Import11Years1Month
  - Uses incremental refresh, configured with 11 years of history and refreshing the last month.
- ImportAll
  - Imports all the data with NO incremental refresh.
- ImportDaily
  - Uses incremental refresh, configured at Day granularity for about 3,400 days and refreshing the last 7 days.
- ImportOptimized
  - Uses incremental refresh at Day granularity for about 3,400 days and refreshing the last 7 days.
  - I also removed unused columns and set “Available In MDX” to false where applicable.
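Because refresh overhead differs between these configurations, it helps to be able to refresh them all from a notebook. Below is a hedged sketch using semantic link's refresh_dataset; treat the exact call and arguments as illustrative rather than the code I ran.

```python
# Sketch: request a refresh of each import semantic model so the
# refresh overhead per configuration can be observed. Illustrative;
# the exact refresh_dataset arguments may differ in your environment.
import sempy.fabric as fabric

import_models = [
    "Import11Years1Month",
    "ImportAll",
    "ImportDaily",
    "ImportOptimized",
]

for model in import_models:
    fabric.refresh_dataset(dataset=model)
    print(f"Refresh requested for {model}")
```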

Direct Lake Semantic Models from the Lakehouse
Below are the Direct Lake semantic models I created using the online experience.
I first had to load the data in the different formats using Python Notebooks.
Here is a summary of the resource profiles I used to create the different Lakehouse tables: Configure Resource Profile Configurations in Microsoft Fabric – Microsoft Fabric | Microsoft Learn
As shown below, I created a different schema for each of the resource profiles in the Lakehouse; a loading sketch follows the list.
- DL_adaptiveFileSize
  - Uses the new adaptive file size feature when loading the data into the Lakehouse table.
- DL_default
  - Loads the data into the Lakehouse table with the default configuration.
- DL_readHeavyForPBI
  - Loads the data into the Lakehouse table optimized for read transactions from Power BI.
- DL_readHeavyForSpark
  - Loads the data into the Lakehouse table optimized for read transactions from Spark.
- DL_vorder
  - Loads the data into the Lakehouse table using the V-Order algorithm for Power BI queries.
- DL_writeHeavy
  - Loads the data into the Lakehouse table using the write-heavy profile for Spark.
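Below is a minimal sketch of how the per-profile loads could look in a Python Notebook (showing a subset of the schemas). The spark.fabric.resourceProfile configuration key follows the Microsoft Learn article linked above, but treat the exact key, profile values, and table names as assumptions.

```python
# Sketch: write the Sales table once per resource profile schema.
# The configuration key and profile values are assumptions based on
# the Microsoft Learn article linked above.
profiles = {
    "DL_default": None,  # leave the session default in place
    "DL_readHeavyForPBI": "readHeavyForPBI",
    "DL_readHeavyForSpark": "readHeavyForSpark",
    "DL_writeHeavy": "writeHeavy",
}

source_df = spark.read.table("dbo.Sales")  # table loaded earlier

for schema, profile in profiles.items():
    if profile is not None:
        spark.conf.set("spark.fabric.resourceProfile", profile)  # assumed key
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {schema}")
    source_df.write.format("delta").mode("overwrite").saveAsTable(f"{schema}.Sales")
```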

Direct Lake Semantic Models from the Warehouse
Below are the Direct Lake semantic models I created using the online experience for the Warehouse.
To do this I used a copy job to copy the data from the Lakehouse to the Warehouse.
I then created the semantic models and made sure that the Direct Lake behaviour was set to DirectQuery only.

- Warehouse_DirectQuery
  - Uses the default loading of data into the Warehouse.
- WH_Clustering
  - Uses the new data clustering feature in the Warehouse.
  - Here is the link: Data Clustering in Fabric Data Warehouse – Microsoft Fabric | Microsoft Learn
Testing the Semantic Models
The final step is to test the Semantic Models.
To do this I used the load test created by Phil Seamark, which you can find the details of here: Automating Load Testing Setting Up Your Fabric Lakehouse and Notebooks – Part 2 – FourMoo | Microsoft Fabric | Power BI
I modified the Notebook to accept a parameter for the semantic model name, and updated the underlying Python code to use this parameter when running the load test.
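In outline, the change looks like the sketch below. The parameter line is the part that a pipeline overrides; the queries file path and keys are the same assumptions as in the earlier sketch, standing in for Phil Seamark's load test code, which I am not reproducing here.

```python
# Sketch of the parameterised load test. The first assignment lives
# in a cell tagged "parameters" so a pipeline can override it per run.
# The queries file path and JSON keys are assumptions carried over
# from the earlier sketch.
import json
import time
import sempy.fabric as fabric

semantic_model_name = "ImportAll"  # pipeline parameter, overridden per run

with open("/lakehouse/default/Files/loadtest/queries.json") as f:  # hypothetical path
    captured = json.load(f)

for query in captured["queries"]:  # assumed key
    start = time.perf_counter()
    fabric.evaluate_dax(semantic_model_name, query["dax"])  # assumed key
    print(f"{semantic_model_name}: {time.perf_counter() - start:.2f}s")
```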

I then created a pipeline which loops through all the semantic model names and runs the performance load test for each one.
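For reference, the same orchestration could also be driven from a notebook rather than a pipeline. Below is a hedged sketch using notebookutils with a placeholder notebook name; this is an alternative to the pipeline I actually used, not a copy of it.

```python
# Sketch: run the parameterised load test notebook once per semantic
# model from a driver notebook. notebookutils is built into Fabric
# notebooks; the notebook name below is a placeholder.
models = [
    "Import11Years1Month", "ImportAll", "ImportDaily", "ImportOptimized",
    "DL_adaptiveFileSize", "DL_default", "DL_readHeavyForPBI",
    "DL_readHeavyForSpark", "DL_vorder", "DL_writeHeavy",
    "Warehouse_DirectQuery", "WH_Clustering",
]

for model in models:
    notebookutils.notebook.run(
        "LoadTestNotebook",                   # placeholder notebook name
        3600,                                 # timeout in seconds
        {"semantic_model_name": model},
    )
```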

Summary
Wow, this post turned out a lot longer than I planned.
In this blog post I have shown you how I set up and configured the load testing in a way that lets me automate it.
In the next blog post I will compare the different semantic models, looking at the differences in performance and capacity consumption.
Thanks for reading, comments are most welcome!