In this blog post I am going to explain how to query multiple Lakehouse tables and return the result as a single data frame.

The example I am going to use is loading new data into your staging tables, where you first need to know the max date from your previous data load.

I am going to build on my previous blog post where I got data from an API into the Lakehouse: How to use SQL Code to write to Delta Tables in a Python Notebook – FourMoo | Microsoft Fabric | Power BI

If you are interested in more details on why I am using a Python Notebook please read this blog post: What are the advantages of using Python Notebook vs a Spark Notebook in Microsoft Fabric – FourMoo | Microsoft Fabric | Power BI

In my previous blog post I explained how to use SQL code to read from Delta tables in Microsoft Fabric which can be found here: How to use SQL Code to read from Delta Tables in a Python Notebook – FourMoo | Microsoft Fabric | Power BI

This blog post details how to use Semantic Link Labs to use a DAX Query to write to a Lakehouse table: Using a Python Notebook using Semantic Link Labs to write a DAX Query output to a Lakehouse Table – FourMoo | Microsoft Fabric | Power BI

At the end of this blog post I will have a link to the Python notebook if you want to try this code out in your own environment.

How to query multiple tables

In my working example I want to get the last time I loaded weather data and join it to my table called “datehour” so that I can get the Date column.

Below is an explanation of each step, with a code sketch after the list.

  1. I am creating a variable for my first table, called “datehour”.
    1. NOTE: I created a variable with the same name as my table name. This makes it easier to troubleshoot.
    2. I could also create the SQL query using the Lakehouse SQL endpoint.
  2. I am creating a variable for my second table, called “weather_London_dt”.
  3. I register the first table with DuckDB, called “datehour”.
    1. You will notice in the CREATE TABLE statement, I am once again using the same name as the table name.
    2. NOTE: This CREATE TABLE statement is where you create the name to be used with DuckDB.
  4. I register the second table with DuckDB, called “weather_London_dt”.
  5. This is where I am creating the query that I want to run.
    1. In this step I am using my table called “datehour”, which I created in step 3.
  6. I am then joining to my table called “weather_London_dt”, which I created in step 4. The sketch below puts all six steps together.
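Here is a minimal sketch of what those six steps could look like in a Python notebook. The Lakehouse paths, the DateHourKey join column and the Date column are assumptions for illustration, so substitute the names from your own Lakehouse.

```python
import duckdb

# Steps 1 and 2: variables named after the tables they point to,
# which makes troubleshooting easier. These Lakehouse paths are
# illustrative, so substitute the paths from your own Lakehouse.
datehour = "/lakehouse/default/Tables/datehour"
weather_London_dt = "/lakehouse/default/Tables/weather_London_dt"

con = duckdb.connect()
# The DuckDB delta extension provides delta_scan() for reading Delta tables
con.execute("INSTALL delta")
con.execute("LOAD delta")

# Steps 3 and 4: the name in each CREATE TABLE statement is the name
# DuckDB will know the table by, kept identical to the Lakehouse table name
con.execute(f"CREATE TABLE datehour AS SELECT * FROM delta_scan('{datehour}')")
con.execute(f"CREATE TABLE weather_London_dt AS SELECT * FROM delta_scan('{weather_London_dt}')")

# Steps 5 and 6: join the two tables so the weather rows pick up the
# Date column, then take the max. DateHourKey is an assumed column name.
query = """
    SELECT MAX(d.Date) AS MaxWeatherDate
    FROM weather_London_dt AS w
    INNER JOIN datehour AS d
        ON w.DateHourKey = d.DateHourKey
"""
df_max_weather = con.execute(query).df()
df_max_weather
```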

Validating the Query

I then validate the query to make sure that the data is returned.

I could also double-check what the date is by using the code below.
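As a sketch, and carrying on with the names from the code above, that check could be as simple as pulling the single value out of the result data frame:

```python
# Pull the single max-date value out of the validation data frame
max_weather_date = df_max_weather["MaxWeatherDate"].iloc[0]
print(max_weather_date)
```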

Using the Max Weather Date to get the updated weather

Next, I am using the Max Weather Date from my previous query to call the weather API and get the updated information.
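As a rough sketch, the call could look something like the code below. The endpoint and parameters are placeholders for whichever weather API you are using; my earlier blog post on getting data from an API into the Lakehouse covers the real call.

```python
import pandas as pd
import requests

# Placeholder endpoint and parameters. Substitute the weather API
# and query string from your own data load.
url = "https://api.example-weather.com/v1/history"
params = {
    "location": "London",
    # Only request data newer than the last load
    "start_date": str(max_weather_date),
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Flatten the JSON response into a data frame
df_updated_weather = pd.json_normalize(response.json())
```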

I can now see the weather for my Max Weather Date.
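For example, a quick look at the first few rows of the refreshed data frame (df_updated_weather from the sketch above):

```python
# Quick visual check of the refreshed weather rows
df_updated_weather.head()
```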

The final step is to then take this data frame and load it into my Lakehouse table, which I have explained in my previous blog post How to use SQL Code to write to Delta Tables in a Python Notebook – FourMoo | Microsoft Fabric | Power BI
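For completeness, here is one minimal way that load could look, using the deltalake package in append mode; the linked post walks through the SQL-based pattern in detail, and the table path here is again an assumption.

```python
from deltalake import write_deltalake

# Append the refreshed rows to the existing Lakehouse table.
# The path is illustrative, so use your own Lakehouse table path.
write_deltalake(
    "/lakehouse/default/Tables/weather_London_dt",
    df_updated_weather,
    mode="append",
)
```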

Summary

In this blog post I have shown you an example of querying multiple tables from the Lakehouse and using the output to query an API.

In the next blog post I will show you how to loop through data, combine the results of the loop, and write to your Lakehouse table only once.

You can find a link to the notebook I used in this blog post here: Fabric/Blog – Python – DuckDB – Querying Multiple Tables.ipynb at main · GilbertQue/Fabric

Thanks for reading. If you have any comments or suggestions, please let me know!