Import tables from Google Sheets in Databricks (no authentication)

Nicolas Meseth
4 min readJul 12, 2020

Google Sheets are a simple tool to manage small amounts of data. I often use it for mapping tables to enrich another data set. In this article, I provide you with a template to import any table from Google Sheets into Databricks and query it with SQL. No authentication required.

I’ll use Twitter as an example for this short tutorial. Suppose you have a huge data set with tweets that you want to analyze with Spark and SQL using Databricks. Databricks is a cloud service for data science and offers a ready to use setup of Apache Spark with all its APIs, such as Spark SQL to query data with SQL. Databricks has a free version called Databricks Community Edition that allows you to create a single node cluster (is that even a cluster?) with the complete Spark stack installed. So, you’re ready to create a notebook with Scala, Python or R and analyze your data in less than 5 minutes.

Set up your table as a Google Sheet

In data analysis, we often want to enrich our data with additional information. An example for tweets is a table with extra information about the different Twitter users in the data set. For that, I create a new spreadsheet in Google Sheets and add columns for the extra information. In the example, it is the screen name, which is the key for the mapping, and two more columns, which provide additional information:

  • A column called category, to describe the type of user. For example, I could use this to distinguish the users into politicians, scientists, corporate accounts etc.
  • A column called party, which in this example I could use to further specify the party a politician belongs to, if that is important for my analysis.

Publish your Google Sheet and import my notebook template

You can imagine creating your table and adding all the columns you need. Once you have done that, you are only 3 small steps away from running SQL statements on that data:

  1. Publish your Google Sheet to the web as a .CSV file and copy the link Google provides.
  2. Import my notebook template that contains the script to import the table in Databricks.
  3. Paste your publish link, specify your table name, and run the script in the notebook template. After the code ran, you’ll have the new table in Databricks.

Publish the Google Sheet

To publish a Google Sheet to web, click on File → Publish to the web. Select Entire Document in the left dropdown and Comma separated values (.CSV) in the second. Click on Publish and copy the generated link.

How to publish a Google Spreadsheet to web in CSV format.

Import the notebook template

To make it as easy a possible, I created and published a Databricks notebook that you can import into your Databricks account. Just open the link below to open the notebook as an HTML file and click on the Import Notebook button in the upper-right corner:

The button opens a dialog with the URL to the notebook. Copy the URL, go to your Databricks account, and click on Import and choose URL when asked for the source of the notebook. Paste the URL of the notebook, click Import, and after 2–3 seconds you should have a copy of the notebook in your account.

Copy the sheet’s link and run the code snippet

The second code block in the notebook contains a Scala script to import the Google Sheet as a table. You first need to choose a name for your table by changing the value for the variable tableName. Then, paste the URL from your published Google Sheet as the value for the variable url. Run the code snippet. That’s it — you now have a new table containing the data from your spreadsheet.

Replace the yellow highlighted values with your own.

Update your data and reload the table

What I like about this approach is that you can quickly change the data in the spreadsheet and reload the table by calling the script again. This is very useful in an exploratory analysis, where you constantly add data to your mapping tables.

If you liked this tutorial and found it helpful I appreciate your referral and claps very much! Let me know if you have any questions by reaching out on Twitter at @nicolasmeseth.

--

--

Nicolas Meseth

Father of two boys. Professor in Computer Science Education.