How to sync your customers' HubSpot data into Snowflake

Cloud data warehouses are popular analytical databases capable of storing and analyzing a vast amount of data. Many B2B SaaS companies today centralize their customers' data in data warehouses for analytics and machine learning use cases – which in turn power customer-facing features like data enrichment, lead scoring, customer segmentation, forecasting, and more.

Because the customer data often lives in the customer's CRM, it's important to sync CRM data to the cloud data warehouse continuously.

In this tutorial, we'll use Supaglue to sync your customers' HubSpot data into an AWS S3 bucket (using a service deployed on Render) and then load the data into Snowflake.

What is HubSpot?

HubSpot is a cloud-based customer relationship management (CRM) platform that helps organizations manage marketing, sales, and operations. At a 15,000-foot level, HubSpot helps your business to attract, convert, and close your customers.

What is Snowflake?

Snowflake is a cloud data warehouse (CDW) where you can store and analyze all your data records in one place. It can automatically scale up/down its compute resources to load, integrate, and analyze data.

What is Render?

Render is a fully-managed cloud platform that makes deploying your application as easy as pushing your code to source control.

Prerequisites:

1. A HubSpot App.

2. A HubSpot account to sync its customer data from.

3. A Supaglue cloud account, self-hosted, or local instance using docker-compose.

Overview

In this tutorial we will deploy a service (typescript-syncer) that listens for a webhook event from your Supaglue instance, when Supaglue finishes syncing data from your customer's CRM. Then it will upload the data to your AWS S3 bucket. Finally, Snowflake uses the S3 bucket as an external stage to load the data into the warehouse.

Step 1: Deploy typescript-syncer

typescript-syncer is a simple open-source Typescript application that uses Node.js Express as a web server and axios as a http client to paginate and sync all objects from Supaglue to S3. We'll deploy it using Render.

Step 1a: Use the link below to use a Render Blueprint to 1-click deploy it to the cloud:

Deploy to Render

Step 1b: Give the Blueprint a name, enter the environment variables below, and finally click on "Apply":

  • API_HOST: the URL of your Supaglue instance.
  • API_KEY: the API key you generated in your Supaglue Management Portal.
  • PROVIDER_NAME: hubspot
  • AWS_REGION: the AWS region where your S3 bucket lives.
  • AWS_S3_BUCKET: the name of your S3 bucket without any prefixes
  • AWS_ACCESS_KEY_ID: your AWS IAM user's access key ID.
  • AWS_SECRET_ACCESS_KEY: your AWS IAM user's secret access key.

Step 1c: Wait for the deployment to succeed. This will take a few minutes.

If you want to follow along with the deployment process, you can click on the "Create web service api" link, then navigate to "Events", and click on the "Deploying" event to monitor the deployment. When you see the "Example app listening on port 10000", the deploy will have completed.

Step 1d: Take note of your onrender.com URL in the upper-left corner of the Render website. You will need this for Step 2 and (Optional) Step 5 below.

Step 2: Sync data from Supaglue to S3

For this step, we're assuming you have a Supaglue instance that has synced over your customers' HubSpot data to its Postgres cache.

Step 2a:  We will call typescript-syncer's /supaglue_sync_webhook endpoint which will trigger a sync from Supaglue's Postgres cache to your AWS S3 bucket.:

curl -XPOST https://{YOUR_RENDER_URL}/supaglue_sync_webhook -H 'content-type: application/json' -d '{"type":"SYNC_SUCCESS", "customer_id": "{SUPAGLUE_CUSTOMER_ID}"}'

Replace the following variables:

  • {YOUR_RENDER_URL}: the Render URL you were provisioned in Step 1. It looks like: https://{subdomain}.onrender.com.
  • {SUPAGLUE_CUSTOMER_ID}: the Supaglue customer ID you want to initiate a sync for.

Step 2b: In your Render app, navigate to the "Logs" tab to monitor the progress of the sync.

You should see all the common model objects written to S3: contacts, accounts, opportunities, users.

Step 2c: Navigate to your AWS S3 console to view synced records.

typescript-syncer will write each sync into a snapshot folder based on the start time of the sync as an ISO string, followed by a child namespace for each object type, followed by files stored in ndjson format.

Step 3: Load data from AWS S3 to Snowflake

Let's load HubSpot contacts into Snowflake using a snapshot from your S3 bucket that typescript-syncer synced. Let's create a database and variant table in Snowflake, then use the "copy into" command to load json records from S3 into the variant table:

CREATE DATABASE supaglue;

USE DATABASE supaglue;

CREATE TABLE crm_contacts (blob VARIANT);

COPY INTO crm_contacts FROM  's3://{AWS_S3_BUCKET}/{SNAPSHOT_ISO_TIMESTAMP}/contacts' CREDENTIALS=(AWS_KEY_ID='{AWS_ACCESS_KEY_ID}' AWS_SECRET_KEY='{AWS_SECRET_ACCESS_KEY}') FILE_FORMAT=(TYPE='json'));

Replace:

  • {AWS_S3_BUCKET}: your S3 bucket from Step 1.
  • {SNAPSHOT_ISO_TIMESTAMP}: a valid snapshot ISO date string in your S3 bucket.
  • {AWS_ACCESS_KEY_ID}: your IAM user's access key ID from Step 1.
  • {AWS_SECRET_ACCESS_KEY}: your IAM user's secret access key from Step1.

If everything worked, at this point you should have your customers' CRM contacts in your instance of Snowflake. As an exercise left to the reader, you can load in the remaining HubSpot objects and also leverage Snowflake Tasks or Snowpipe to continuously load in data from S3.

(Optional) Step 4: Register a webhook in Supaglue to auto-sync to S3

In Step 2 we manually curled the /supaglue_sync_webhook endpoint which synced data from Supaglue to AWS S3. To have that happen automatically every time Supaglue finishes a sync from HubSpot to its Postgres cache we can register the /supaglue_sync_webhook endpoint in Supaglue.

Use the Supaglue Management Portal to register the webhook endpoint.

Under the "Webhook URL" field, enter the Render URL you copied during Step 1d. It looks something like the following:

https://{YOUR_RENDER_URL}/supaglue_sync_webhook

If everything was successful, at this point you should have automatic snapshots synced from your customers' HubSpot instances to your AWS S3 bucket which is then loaded into Snowflake so you can run queries against it along with all your other Snowflake data.

Have questions? Reach out to us on Slack!

Have feature requests or bugs? Log a GitHub Issue.

Accelerate your integrations roadmap with Supaglue

Supaglue is joining Stripe! Read more.