I really enjoy a good challenge, and with my customer they have all their data stored in AWS S3. Whilst there is no native connector, I thought there must be a way for me to get the data from AWS S3 into Power BI.

I did a bit of Googling and could not find any suitable solution. I also found and learnt that I could use AWS Athena to query the data living in S3. (I am definitely NOT an expert of have a lot of knowledge in the AWS space. I am fortunate that I have other people who know AWS and were able to setup, configure and give me the details to connect to S3 via AWS Athena)

Below are the steps on how I got this working.

Connection Details required

Before I could complete the steps below, I did require the following information from the people who had setup AWS Athena

  • AWS Region
  • Schema
  • Workgroup
  • S3 Output Location
  • Encryption Options
  • User (Access Key ID)
  • Password (Secret Access Key)

Installation and configuration of Athena ODBC Driver

The first step is to download and install the driver from the link below.

https://s3.amazonaws.com/athena-downloads/drivers/ODBC/SimbaAthenaODBC_1.0.5/Windows/Simba+Athena+1.0+64-bit.msi

NOTE: You can find the latest version of the driver here: Connecting to Amazon Athena with ODBC

Configured 64bit ODBC using the Simba Athena Driver

  • Opened the ODBC Data Source Administrator (64-bit)
  • Clicked on Simba Athen and then configure
    • I then put in the following details
      • AWS Region
      • Schema
      • Workgroup
      • S3 Output Location
      • Encryption Options
  • Next, I needed to update the Authentication Options for the authentication
    • The user (in my working example was called the “Access Key ID”
    • The password (in my working example was called “Secret Access Key”)
  • The final step in order to Power BI to query the data efficiently was to then click on Advanced Options
    • I changed the Rows to Fetch Per Block to be 1000
    • I then removed the tick from “Use Resultset Streaming”
  • Once I had put in all the details, I then clicked on Test . . . to test and make sure that the connection was successful.
    • And it was successful

Connecting to Athena in Power BI Desktop

Below are the steps to connect to Athena in Power BI Desktop

  • I opened Power BI Desktop, then clicked on Get Data and selected ODBC
  • On the From ODBC Source I clicked on the Data source name (DSN) and selected Simba Athena
    • I then clicked Ok
  • On the ODBC Driver window, I clicked on Database and put in the following
    • The user (in my working example was called the “Access Key ID”
    • The password (in my working example was called “Secret Access Key”)
    • I then clicked Connect
  • I was able to then see the Navigator and select the area of the data that I wanted to connect to
  • I could then load my data either into the Power Query Editor to transform it or load it into the data model.
  • Once I was happy, I then uploaded the PBIX to Power BI

Configuration the Power BI Service Gateway to connect to Athena

Below are the steps to configure the Gateway in the Power BI Service to connect to Athena

  • I logged into the Power BI Service and clicked on Managed Gateways
  • Then clicked on Add New Data Source
  • I then gave it a Data Source Name, and changed the Data Source Type to be ODBC
  • Then I put in the following into the Connection string
    • “driver=Simba Athena ODBC Driver;dsn=Simba Athena; data source=AwsDataCatalog”
    • From the above I am using the details from the ODBC window as shown previously.
    • I also made sure to put in my data source name, which once again I had selected previously when connecting to Athena
  • Then for the Username and password I put in the following:
    • The user (in my working example was called the “Access Key ID”
    • The password (in my working example was called “Secret Access Key”)
  • And this is what it looked like
    • This is the ConnectionString Used

      “driver=Simba Athena ODBC Driver;dsn=Simba Athena; data source=AwsDataCatalog;RowsToFetchPerBlock=1000”

  • I then clicked on Add
  • I could then see that the connection was successful
  • The final step was for me to map the dataset to the Gateway.
    • I went into the dataset settings and then under Gateway connection I selected the Gateway server and mapped it to “Athena” (The data source I created previously)
    • I then clicked Apply
    • I could then see the it was successfully updated.
  • I then manually refreshed the dataset to make sure it is working as expected, which I could see that the refresh was successful.