How to add current DateTime to existing PySpark data frame in a Fabric Notebook

In this blog post, I am going to describe how to add the current date and time to your existing Spark data frame.

This is really useful when I am inserting data into a Fabric Lakehouse table and I want to know when the data was inserted.

Here is my PySpark data frame with some data loaded.

[Screenshot: the PySpark data frame with some data loaded]

I then added the following to my notebook to create the additional column called “CurrentDateTime” as shown in line 15 below.

[Screenshot: the notebook cell that adds the “CurrentDateTime” column]

This is what the output looks like when I run the cell, with the new column highlighted below.

NOTE: You will also see that the column data type is DateTime, so it will have this same data type when I query it with the SQL analytics endpoint.

If you would like to get a copy of the code you can find it here: Fabric/Adding Date Time Column to Pyspark data frame.ipynb at main · GilbertQue/Fabric (github.com)

Summary

In this blog post, I have demonstrated how to add the current date and time to your existing Spark data frame. I hope that you found this useful and can use this snippet in your notebooks.

Thanks for reading.