In this blog post I am going to show you how I completed the automated testing, and then share the results comparing Direct Lake, Import and DirectQuery to see which one appears to be the best.

As always, your testing may vary or differ from my tests below.

I would highly recommend using the method I have described and applying this testing technique to your own data, to understand and validate which semantic model would be best for your capacity.

Testing

As mentioned in my previous blog post, I created identical semantic models and ran the automated testing with a Notebook.

I completed the tests multiple times to ensure that the results I got were consistent.

The testing criteria were as follows:

  • 5 concurrent users
  • Loop through the DAX queries twice
    • NOTE: I included Distinct Count measures, as in my experience these are very often what leads to slower performance overall. This is also relevant for many of the customers I work with, who frequently need distinct counts.
  • It was run when the capacity was not under load.
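To make that load pattern concrete, here is a minimal Python sketch of it. Note that `run_dax_query`, the DAX query strings and the `Sales` column names are all hypothetical stand-ins for the real Notebook query runner:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for executing a DAX query against the semantic
# model; in the real setup this is done from the Fabric Notebook.
def run_dax_query(query: str) -> float:
    start = time.perf_counter()
    time.sleep(0.01)  # simulate query execution
    return time.perf_counter() - start

# Example queries only; the Distinct Count measure is the kind of
# query that tends to be slowest.
DAX_QUERIES = [
    'EVALUATE ROW("DistinctCustomers", DISTINCTCOUNT(Sales[CustomerKey]))',
    'EVALUATE ROW("TotalSales", SUM(Sales[SalesAmount]))',
]

USERS = 5   # 5 concurrent users
LOOPS = 2   # loop through the DAX queries twice

def user_session(user_id: int) -> list[float]:
    durations = []
    for _ in range(LOOPS):
        for query in DAX_QUERIES:
            durations.append(run_dax_query(query))
    return durations

# Run the simulated users concurrently and collect every query duration.
with ThreadPoolExecutor(max_workers=USERS) as pool:
    all_durations = [d for session in pool.map(user_session, range(USERS))
                     for d in session]

print(f"{len(all_durations)} queries executed")
```

With 5 users each looping twice over 2 queries, this fires 20 queries in total, which is the shape of load each semantic model received.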

Multiple Tests Were Run


To be honest, it took me a while to get everything in place so that it would run successfully, but I did get it right, as shown above.

It would typically take about 5 hours to run.

NOTE: The reason for the 5-minute wait between tests was so that I could easily identify when each test was taking place.
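As a rough sketch of that pacing, here is how the tests can be spaced out so each one shows up as a distinct window in the capacity metrics. The model names and `run_test` are hypothetical placeholders:

```python
import time

# Hypothetical names standing in for the semantic models under test.
MODELS = ["DL_Default", "DL_Vorder", "ImportAll", "Warehouse_DirectQuery"]

completed = []

def run_test(model_name: str) -> None:
    # Placeholder for kicking off the automated test against one model.
    completed.append(model_name)

def run_all_tests(models, wait_seconds=5 * 60):
    """Run each test, pausing between them so each test window
    shows up as a distinct block in the capacity metrics."""
    for i, model in enumerate(models):
        run_test(model)
        if i < len(models) - 1:
            time.sleep(wait_seconds)

# wait_seconds=0 here only so the sketch runs instantly; use the
# full 5 minutes (300 seconds) for real testing.
run_all_tests(MODELS, wait_seconds=0)
```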

Testing Order

I ran the tests in the order below:

  • DL_Default
    • ENGINE: Direct Lake
    • INGESTION: Spark with Default settings
  • DL_Vorder
    • ENGINE: Direct Lake
    • INGESTION: Spark V-Order applied
  • DL_adaptiveFileSize
    • ENGINE: Direct Lake
    • INGESTION: Spark Adaptive File Size
  • DL_readHeavyForPBI
    • ENGINE: Direct Lake
    • INGESTION: Spark with Read Heavy for Power BI
  • DL_readHeavyForSpark
    • ENGINE: Direct Lake
    • INGESTION: Spark with Read Heavy for Spark
  • DL_writeHeavy
    • ENGINE: Direct Lake
    • INGESTION: Spark with Write Heavy for Spark
  • Import11Years1Month
    • ENGINE: Power BI Analysis Services Engine
    • INGESTION: Import mode with Incremental Refresh
  • ImportAll
    • ENGINE: Power BI Analysis Services Engine
    • INGESTION: Import all tables as is
  • ImportDaily
    • ENGINE: Power BI Analysis Services Engine
    • INGESTION: Import mode with Daily Partitions
  • ImportOptimized
    • ENGINE: Power BI Analysis Services Engine
    • INGESTION: Import mode with Daily partitions, optimised by removing unwanted columns and setting IsAvailableInMDX to false where applicable.
  • WH_Clustering
    • ENGINE: Direct Query on Warehouse
    • INGESTION: Used Data Clustering on Date column
  • Warehouse_DirectQuery
    • ENGINE: Direct Query on Warehouse
    • INGESTION: Default settings

All-important Results

Overview – Query Duration

In the chart below, I am first showing how long each semantic model took.

This is an overall view, showing the total Duration that the queries took to complete.

This was interesting because I could see which semantic model was the most performant.

Or another way to look at this is Direct Lake / Import / Direct Query

OBSERVATIONS

  • It appears that while there are some differences, it is easy to see which has the lowest query duration: the Import models.
    • Import took about 174,000
  • Followed by Direct Lake.
    • Direct Lake took about 247,000
  • Finally, Direct Query.
    • Direct Query took about 724,000

What I wanted to show below is where the time is spent on the different Semantic Models.

As you can see, for Direct Query a significant amount of the Total Duration is spent in the Warehouse, because that is where all the heavy lifting is done.

NOTE: The Power BI Analysis Services engine can execute queries in parallel, which is why the CPU Time might not add up to the Total Duration.

Overview – Interactive Capacity Consumption

Below is what was happening from a Capacity Consumption perspective.

The reason I have put in the time buckets is that they represent what was happening on the Capacity.

Or another way to look at this is Direct Lake / Import / Direct Query for Capacity Consumption.

OBSERVATIONS

  • All the Direct Lake semantic models were in the range of 80% or more.
  • The Import semantic models were between 30% and 100%.
  • Direct Query is measured as a Background process, so as expected it consumed less than 1% of the interactive capacity.

Just to be complete in my overview, below is what the Capacity Consumption was like when comparing both Interactive and Background Capacity usage.

Final Result

Thanks for following along so far. Here is the final result, which I assume you will already have gathered: which semantic model was the fastest and used the least amount of capacity units.

I had to combine the Interactive Capacity and the query duration using the 5-minute intervals to be able to overlay the two measures.
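A minimal pandas sketch of that bucketing, using made-up sample data in place of the real capacity metrics and query log:

```python
import pandas as pd

# Hypothetical sample data standing in for the capacity metrics and
# the query log captured during the tests.
capacity = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:01", "2024-01-01 10:04", "2024-01-01 10:07"]),
    "interactive_pct": [85.0, 90.0, 40.0],
})
queries = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:02", "2024-01-01 10:03", "2024-01-01 10:06"]),
    "duration_ms": [1200, 800, 4000],
})

# Floor both series to 5-minute buckets so the two measures line up.
capacity["bucket"] = capacity["timestamp"].dt.floor("5min")
queries["bucket"] = queries["timestamp"].dt.floor("5min")

# One row per bucket: average interactive capacity, total query duration.
combined = (
    capacity.groupby("bucket")["interactive_pct"].mean().to_frame()
    .join(queries.groupby("bucket")["duration_ms"].sum(), how="outer")
)
print(combined)
```

Once both measures share the same 5-minute index, they can be plotted on the same axis and overlaid in a chart.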

OBSERVATIONS

  • Overall, the lowest combination was Import mode.
  • Of the Direct Lake models, V-Order was the best overall.
  • And even though Direct Query did not consume much capacity, it had by far the longest query duration.
    • It does get a little confusing on this chart, as the query duration peaks at a similar point to the Interactive Capacity.

Summary

As shown above, across all my testing Import mode was the fastest, closely followed by Direct Lake.

It did take me a significant amount of time to get it all in place, and then to get the testing completed and repeatable.

I am also thinking about including the time to ingest the data for Direct Lake and the refresh time for Import, along with the CUs these consume, as they also factor into overall capacity usage.

If you have any suggestions for other semantic model combinations to test, please let me know. One option I am considering is a combination of Direct Lake for the fact table and Import for the dimension tables.

If you have any comments or suggestions on what I should create and test next, please let me know in the comments.

Finally, if you would like all the pieces I put together for this testing, let me know, as I think it would be useful for others to be able to complete load testing on their own data.