Microsoft Fabric – Notebook session usage explained (and how to save CUs or billed time)
I was working on a blog post to determine which option consumed fewer Fabric Capacity Units (CUs), and during my initial testing I was getting some unexpected results.
In a future blog post I will compare a Dataflow Gen2 and a Notebook to see which one consumes fewer CUs.
In this blog post I am going to explain the lessons I learned when working with Fabric notebook sessions.
I will show and explain how I initially configured and used the notebook, which consumed a lot more CUs than I expected, and then how I updated my notebook session to use significantly fewer CUs.
A special mention and a big thanks to the Microsoft Fabric team for helping me understand how notebook sessions consume CUs, and then for explaining how to significantly reduce the CUs consumed.
Thanks to Mim for advice on how to run a notebook so that the notebook session stops automatically once it has finished running.
To do this, I created a new data pipeline.
As you can see below, all I had to do was add the Notebook activity and select the notebook I had created.
I then clicked on Schedule and created the schedule for when I wanted it to run.
Once it ran, I validated the details in the Fabric Metrics App, and I could see it only consumed CUs for the time it took the notebook to run, as shown below.
In this blog post I have demonstrated a way to save significant costs when using a Fabric notebook.
To do this, I used a data pipeline to run the notebook for me, which automatically starts and then stops the notebook session.
I hope that you found this blog post useful. If you have any questions or comments, please leave them in the section below, and here's to an awesome 2024.
PREVIOUS UNDERSTANDING BELOW
Running a notebook with default settings
What I mean by running a notebook as-is is that I was running the notebook with all the default settings.
When running the notebook, all I did was open it and click Run all on my code.
My assumption was that the notebook would run the code, report that it had completed successfully once it finished executing, and then stop.
What I then did was go into the Fabric Metrics App to see how many Capacity Units my notebook had consumed, and as you can see below, it was consuming a lot more Capacity Units than I had expected.
As you can see above, this consumed 1,320 seconds. Yet when I look at the notebook execution, including starting up the Spark session and running the code, it took less than three minutes, as shown below.
This had me rather confused, because what I was seeing in the Fabric Metrics App did not align with the notebook.
When I looked at the duration for the Dataflow Gen2, it aligned in both the dataflow refresh history and the Fabric Metrics App. So this got me thinking that I must be doing something wrong in the notebook.
Below is the refresh history in the Dataflow Gen2 and the Fabric Metrics App.
Dataflow Gen2 refresh history.
Changes that I made to the notebook to save on CUs
After getting more details from the Microsoft Fabric team, I made some changes to my notebook that allowed me to save a significant amount of CU consumption.
What I learnt was that when I start running a notebook by clicking the "Run All" button, it starts the Spark session.
Once the Spark session starts, I can see this in the notifications window. (I am running an F2 Fabric capacity, which is why it takes a while to start; this will be fixed in the future so that it starts quicker.)
Now, what I assumed happened was that when the notebook completed successfully, the Spark session also stopped.
THIS IS NOT THE CASE.
As explained in the link below, unless I click the "Stop Session" button shown below, the Spark session keeps running and will run for 20 minutes before it is automatically stopped.
When I was running my notebook with the default settings, I WAS NOT stopping the session, which is why it had a duration of 1,320 seconds.
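To sanity-check the numbers, here is a small sketch (my own arithmetic, assuming a roughly two-minute run and the 20-minute idle timeout mentioned above) showing how the billed duration adds up:

```python
# Rough arithmetic: a short notebook run plus the 20-minute idle
# timeout accounts for the 1,320 billed seconds.
run_seconds = 120                # assumed: execution took about two minutes
idle_timeout_seconds = 20 * 60   # session keeps running for 20 minutes afterwards

billed_seconds = run_seconds + idle_timeout_seconds
print(billed_seconds)  # 1320 -- matches what the Fabric Metrics App reported
```

In other words, almost all of the billed time was the idle session, not the actual work.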
As shown below, what I then did was run my notebook and, as soon as it had completed, click "Stop Session".
It now consumed ONLY 67 seconds. That is roughly 95% fewer CUs consumed!
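For anyone who wants to check the saving themselves, this is how the 95% figure falls out of the two billed durations above:

```python
# Derive the percentage saving from the two billed durations.
default_seconds = 1320  # session left to time out on its own
stopped_seconds = 67    # session stopped manually right after the run

saving = 1 - stopped_seconds / default_seconds
print(f"{saving:.0%}")  # 95%
```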
At the time of writing this blog post in Jan 2024, there is no way to programmatically stop the Spark session from within the notebook. I tried a few things, but none of them worked.
I am certain that in the future this will be possible.