Cloud data warehouses are popular analytical databases capable of storing and analyzing a vast amount of data. Many B2B SaaS companies today centralize their customers' data in data warehouses for analytics and machine learning use cases – which in turn power customer-facing features like data enrichment, lead scoring, customer segmentation, forecasting, and more.
Because the customer data often lives in the customer's CRM, it's important to sync CRM data to the cloud data warehouse continuously.
In this tutorial, we'll use Supaglue to sync your customers' HubSpot data into an AWS S3 bucket (using a service deployed on Render) and then load the data into Snowflake.
HubSpot is a cloud-based customer relationship management (CRM) platform that helps organizations manage marketing, sales, and operations. At a 15,000-foot level, HubSpot helps your business to attract, convert, and close your customers.
Snowflake is a cloud data warehouse (CDW) where you can store and analyze all your data records in one place. It can automatically scale up/down its compute resources to load, integrate, and analyze data.
Render is a fully-managed cloud platform that makes deploying your application as easy as pushing your code to source control.
1. A HubSpot App.
2. A HubSpot account to sync its customer data from.
3. A Supaglue cloud account, self-hosted, or local instance using docker-compose.
In this tutorial we will deploy a service (typescript-syncer) that listens for a webhook event from your Supaglue instance, when Supaglue finishes syncing data from your customer's CRM. Then it will upload the data to your AWS S3 bucket. Finally, Snowflake uses the S3 bucket as an external stage to load the data into the warehouse.
typescript-syncer is a simple open-source Typescript application that uses Node.js Express as a web server and axios as a http client to paginate and sync all objects from Supaglue to S3. We'll deploy it using Render.
Step 1a: Use the link below to use a Render Blueprint to 1-click deploy it to the cloud:
Step 1b: Give the Blueprint a name, enter the environment variables below, and finally click on "Apply":
Step 1c: Wait for the deployment to succeed. This will take a few minutes.
If you want to follow along with the deployment process, you can click on the "Create web service api" link, then navigate to "Events", and click on the "Deploying" event to monitor the deployment. When you see the "Example app listening on port 10000", the deploy will have completed.
Step 1d: Take note of your onrender.com URL in the upper-left corner of the Render website. You will need this for Step 2 and (Optional) Step 5 below.
For this step, we're assuming you have a Supaglue instance that has synced over your customers' HubSpot data to its Postgres cache.
Step 2a: We will call typescript-syncer's /supaglue_sync_webhook endpoint which will trigger a sync from Supaglue's Postgres cache to your AWS S3 bucket.:
curl -XPOST https://{YOUR_RENDER_URL}/supaglue_sync_webhook -H 'content-type: application/json' -d '{"type":"SYNC_SUCCESS", "customer_id": "{SUPAGLUE_CUSTOMER_ID}"}'
Replace the following variables:
Step 2b: In your Render app, navigate to the "Logs" tab to monitor the progress of the sync.
You should see all the common model objects written to S3: contacts, accounts, opportunities, users.
Step 2c: Navigate to your AWS S3 console to view synced records.
typescript-syncer will write each sync into a snapshot folder based on the start time of the sync as an ISO string, followed by a child namespace for each object type, followed by files stored in ndjson format.
Let's load HubSpot contacts into Snowflake using a snapshot from your S3 bucket that typescript-syncer synced. Let's create a database and variant table in Snowflake, then use the "copy into" command to load json records from S3 into the variant table:
CREATE DATABASE supaglue;
USE DATABASE supaglue;
CREATE TABLE crm_contacts (blob VARIANT);
COPY INTO crm_contacts FROM 's3://{AWS_S3_BUCKET}/{SNAPSHOT_ISO_TIMESTAMP}/contacts' CREDENTIALS=(AWS_KEY_ID='{AWS_ACCESS_KEY_ID}' AWS_SECRET_KEY='{AWS_SECRET_ACCESS_KEY}') FILE_FORMAT=(TYPE='json'));
Replace:
If everything worked, at this point you should have your customers' CRM contacts in your instance of Snowflake. As an exercise left to the reader, you can load in the remaining HubSpot objects and also leverage Snowflake Tasks or Snowpipe to continuously load in data from S3.
In Step 2 we manually curled the /supaglue_sync_webhook endpoint which synced data from Supaglue to AWS S3. To have that happen automatically every time Supaglue finishes a sync from HubSpot to its Postgres cache we can register the /supaglue_sync_webhook endpoint in Supaglue.
Use the Supaglue Management Portal to register the webhook endpoint.
Under the "Webhook URL" field, enter the Render URL you copied during Step 1d. It looks something like the following:
https://{YOUR_RENDER_URL}/supaglue_sync_webhook
If everything was successful, at this point you should have automatic snapshots synced from your customers' HubSpot instances to your AWS S3 bucket which is then loaded into Snowflake so you can run queries against it along with all your other Snowflake data.
Have questions? Reach out to us on Slack!
Have feature requests or bugs? Log a GitHub Issue.