In this post I will show how I used GitHub Copilot CLI / Agent mode in VS Code to create a Microsoft Fabric pipeline that checks Lakehouse table health and only optimizes tables that require maintenance. I’ll also show the prompts I used, the issues I ran into, and how Copilot helped me resolve them.

Recently Microsoft announced Lakehouse table health. The post showed how you can check the health of your Lakehouse tables in a very simple way.

Lakehouse table health is useful because Delta tables can become less efficient over time, especially after frequent writes or ingestion processes. Maintenance can help by compacting small files, applying read optimizations such as V-Order, and cleaning up old unreferenced files with VACUUM.
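To make those maintenance operations concrete, here is a minimal sketch of the Spark SQL statements a maintenance notebook might issue. The helper names and the "sales" table are my own illustration and are not what Copilot generated. In a Fabric notebook each string would typically be passed to spark.sql(...), which is left out here so the sketch runs anywhere.

```python
def optimize_statement(table_name: str, v_order: bool = True) -> str:
    """Build an OPTIMIZE statement that compacts small files.

    When v_order is True the VORDER keyword is appended so the
    rewritten files are also V-Order optimized for reads.
    """
    statement = f"OPTIMIZE {table_name}"
    if v_order:
        statement += " VORDER"
    return statement


def vacuum_statement(table_name: str, retain_hours: int = 168) -> str:
    """Build a VACUUM statement that removes old unreferenced files.

    168 hours (7 days) is used as the default retention window.
    """
    return f"VACUUM {table_name} RETAIN {retain_hours} HOURS"


if __name__ == "__main__":
    # "sales" is a hypothetical table name used only for illustration.
    for statement in (optimize_statement("sales"), vacuum_statement("sales")):
        print(statement)
```

Building the statements as strings first makes it easy to log exactly what will run against each table before anything is executed.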

The announcement also highlighted that you can limit maintenance to only the tables that need to be optimized.

That got me thinking, “Could I get this done using GitHub Copilot?”

The answer is YES, you can, and below I will show you how I did it.

This also reminded me of when I was first getting into Business Intelligence (yes, that is what it was called many, many years ago 😊), when I often learned by taking a completed project and working backwards to re-create it. Fortunately, I know how to build this myself, but GitHub Copilot lets someone learn in the same way: view the output and then work backwards.

For this post I used the Copilot CLI agent experience inside VS Code. This allows Copilot to work through a task, update files, and iterate on fixes from within the editor.

Prerequisites

Before starting, I had the following configured:

Visual Studio Code
GitHub Copilot enabled
Copilot CLI / Agent experience available in VS Code
Access to a Microsoft Fabric workspace
A Lakehouse with existing Delta tables
Sufficient permissions to create and run Fabric pipelines and notebooks

NOTE: I already have GitHub Copilot and Copilot CLI installed in Visual Studio Code (VS Code).

When I opened VS Code I saw a new option to “Try out the new Agents window”, so I thought why not give it a go and see what it does.

This opened a new Agents window, and on the bottom left-hand side I could see what I had access to.

Next, I wanted to keep it really simple to see how good it is, so I entered the prompt below.

It took a few minutes, but once it had completed I could see what it had done, as shown below.

This prompt used the credits shown below. (NOTE: I am using the Auto model option, which includes a 10% discount.)

The first version created the pipeline, but only for a single table. Even so, I was already impressed with what it had created.

I then wanted to loop through all the tables in my Lakehouse and run OPTIMIZE only where it was needed, which I did with the prompt below.

This prompt used the credits shown below.

This ran for a few minutes and updated the pipeline as shown below.
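Separately from what Copilot generated, the idea of looping over every table and only optimizing the ones that need it can be sketched in plain Python. This is a minimal sketch under my own assumptions: the TableStats shape, the 128 MB "small file" cut-off, and the limit of 10 small files are hypothetical values for illustration, not the criteria Lakehouse table health actually uses.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for illustration only; tune them for your workloads.
SMALL_FILE_BYTES = 128 * 1024 * 1024  # files below this size count as "small"
MAX_SMALL_FILES = 10                  # more small files than this triggers OPTIMIZE


@dataclass
class TableStats:
    """Assumed shape of per-table statistics; not a Fabric API object."""
    name: str
    file_sizes_bytes: List[int]


def needs_optimize(stats: TableStats) -> bool:
    """Return True when the table has more small files than the allowed limit."""
    small_files = sum(1 for size in stats.file_sizes_bytes if size < SMALL_FILE_BYTES)
    return small_files > MAX_SMALL_FILES


def tables_to_optimize(all_stats: List[TableStats]) -> List[str]:
    """Keep only the names of tables that need maintenance, preserving input order."""
    return [stats.name for stats in all_stats if needs_optimize(stats)]
```

In a pipeline, a list like the one returned by tables_to_optimize is the kind of thing a loop would iterate over, passing one table name at a time to the maintenance notebook.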

What Copilot Got Wrong

This was also a good reminder that Copilot can get you most of the way there, but you still need to validate the generated pipeline and notebook logic. In my case, I had to fix the table-name handling, adjust for my older Lakehouse structure, and ensure the notebook received the correct table parameter.

I then ran the pipeline and, to be honest, it did have a few errors, which I fixed with follow-up prompts:

It had read the table names incorrectly from the list of tables, which it had to fix.
I am using an older version of Lakehouse that has no schemas, so the table references had to be adjusted next.
The notebook it created had not been updated to receive the valid table name.
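The table-name and schema issues above come down to deciding what string the notebook should receive. Here is a minimal sketch of that kind of normalization. The function name and the input shapes it accepts ("sales", "dbo.sales", "Tables/dbo/sales") are my assumptions for illustration and are not the fix Copilot applied.

```python
def normalize_table_name(raw_name: str, schema_enabled: bool) -> str:
    """Turn a listed table entry into the name passed to the notebook.

    Assumed input shapes: "sales", "dbo.sales", or a path-like
    "Tables/dbo/sales". Older Lakehouses without schemas get the bare
    table name; schema-enabled ones get "schema.table".
    """
    # Treat path separators like dots and drop empty segments.
    parts = [part for part in raw_name.replace("/", ".").split(".") if part]

    # Drop a leading "Tables" folder segment when a table name follows it.
    if len(parts) > 1 and parts[0].lower() == "tables":
        parts = parts[1:]

    if not parts:
        raise ValueError(f"Cannot derive a table name from {raw_name!r}")

    table = parts[-1]
    if not schema_enabled:
        return table

    # Assume the default "dbo" schema when none is present in the entry.
    schema = parts[-2] if len(parts) > 1 else "dbo"
    return f"{schema}.{table}"
```

For example, normalize_table_name("Tables/dbo/sales", schema_enabled=False) gives the bare name an older Lakehouse expects, while the same call with schema_enabled=True keeps the schema prefix.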

My final prompt was to make the notebook easy to read. When it was initially created, all the Python code was on a single line.

The notebook initially looked like this.

As shown below, I then simply asked for it to be made easily readable.

It then updated the notebook code as shown below.

It used the following credits below.

The final step was to run the pipeline, which ran successfully and updated the required tables.

Summary

In this blog post I have shown you how to use Copilot CLI within VS Code to take a brand-new feature in Microsoft Fabric and build a working pipeline around it.

Things to keep in mind:

Test the generated pipeline in a development workspace first.
Review all generated code before running it against production data.
Be careful with VACUUM retention settings because aggressive cleanup can affect time travel and rollback scenarios.
V-Order is more useful for read-heavy analytical tables than write-heavy staging tables.
If your Lakehouse uses schemas, table naming may differ from older Lakehouses.
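One way to act on the VACUUM point above is to validate the retention value before it ever reaches a VACUUM statement. This is a minimal sketch: the function name is hypothetical, and the 7-day floor reflects Delta Lake's default retention window rather than anything specific to the pipeline in this post.

```python
# Delta Lake's default VACUUM retention window is 7 days.
DEFAULT_RETENTION_HOURS = 7 * 24


def check_vacuum_retention(retain_hours: int,
                           minimum_hours: int = DEFAULT_RETENTION_HOURS) -> int:
    """Return retain_hours unchanged, or raise if it is below the minimum.

    A shorter window removes files that time travel and rollback
    may still need, so this guard fails fast instead of running.
    """
    if retain_hours < minimum_hours:
        raise ValueError(
            f"Retention of {retain_hours} hours is below the "
            f"minimum of {minimum_hours} hours"
        )
    return retain_hours
```

Calling check_vacuum_retention(24) raises an error with the default minimum, which makes an overly aggressive setting visible in a development run rather than after files have been deleted.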

Thanks for reading. I hope you found this useful; comments or questions are always most welcome.

Using GitHub Copilot CLI to Build a Microsoft Fabric Lakehouse Table Health Pipeline
