As an embedded integrations platform, Supaglue interfaces not only with our direct customers but also our customers’ customers. So, when there’s a bug or an outage, not only does it tarnish our credibility, it also reflects poorly on our customers. Therefore, we take reliability and consistency very seriously.
In this blog post, we’ll dive into one example of keeping our system consistent. We’ll explain the problem, explore a few solutions, and walk you through our thought process in choosing one.
One of Supaglue’s core features is periodically syncing data from providers like Salesforce and HubSpot. To do this, we use Temporal, a powerful and flexible workflow engine, to poll these providers for new data and update our database.
We also allow our customers to configure the frequency at which data is synced. This frequency is persisted in our system in two places:
1. Our system has an Integration object, which refers to a provider like Salesforce or HubSpot. It looks something like this:
The periodMs represents the amount of time between successive data sync runs. The Integration object settings are shown in Supaglue’s Management Portal UI.
2. We use Temporal Schedules, which provide a nice abstraction for triggering workflow runs on a schedule. Whenever a customer connects a provider, we create a Temporal Schedule with a config specifying that it should spawn a new Workflow to sync data every periodMs milliseconds. A single Integration can have many Schedules associated with it, one for each customer. Temporal offers an API for creating or updating schedules with a periodMs.
Our customers can update the periodMs parameter by modifying and saving the Integration object in the Supaglue Management Portal UI.
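To make the two places concrete, here is a minimal sketch of how an Integration might fan out into one schedule config per customer. Everything except `periodMs` is an assumption made for illustration: the `Integration` and `ScheduleConfig` shapes, the `buildScheduleConfig` helper, and the ID format are hypothetical, and the schedule config is a plain object rather than Temporal's actual Schedule API.

```typescript
// Hypothetical shapes: only `periodMs` is named in this post.
type Integration = {
  id: string;
  providerName: "salesforce" | "hubspot";
  periodMs: number; // time between successive sync runs
};

// One schedule per (integration, customer) pair.
type ScheduleConfig = {
  scheduleId: string;
  integrationId: string;
  customerId: string;
  periodMs: number; // copy of Integration.periodMs, so it can drift
};

// Build the config that would be handed to the workflow engine
// when a customer connects a provider.
function buildScheduleConfig(integration: Integration, customerId: string): ScheduleConfig {
  return {
    scheduleId: `sync-${integration.id}-${customerId}`,
    integrationId: integration.id,
    customerId,
    periodMs: integration.periodMs,
  };
}

const salesforce: Integration = {
  id: "int-1",
  providerName: "salesforce",
  periodMs: 60 * 60 * 1000,
};
const schedules = ["cust-a", "cust-b"].map((customerId) =>
  buildScheduleConfig(salesforce, customerId),
);
console.log(schedules.map((s) => `${s.scheduleId}: every ${s.periodMs} ms`));
```

The point of the sketch is that `periodMs` is duplicated: once on the Integration and once on every schedule derived from it.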
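When the UI issues a request to update the frequency (periodMs), we need to do both of the following:
1. Update the periodMs on the Integration object in our DB.
2. Update the periodMs on every Temporal Schedule associated with that Integration.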
If we do not perform these two actions atomically, we will have consistency issues. For example, suppose we first updated the periodMs value in the DB and then updated the periodMs in Temporal. If the DB write succeeded but the Temporal write failed, we would end up in an inconsistent state. Our customers would see periodMs updated to, say, 30 minutes in the UI, but our system would keep syncing data every 60 minutes. Even worse, this discrepancy would never be resolved unless there were another UI update, so our system would not even be eventually consistent.
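The failure mode can be reproduced with a small in-memory sketch. The `db` and `temporal` objects and both functions are hypothetical stand-ins, not Supaglue's code or Temporal's client; the Temporal write is simply forced to fail.

```typescript
// In-memory stand-ins (hypothetical): one DB row and one schedule.
const db = { periodMs: 60 * 60 * 1000 };
const temporal = { isDown: true, schedulePeriodMs: 60 * 60 * 1000 };

function updateScheduleInTemporal(periodMs: number): void {
  if (temporal.isDown) throw new Error("Temporal unavailable");
  temporal.schedulePeriodMs = periodMs;
}

// Naive dual write: two independent writes with nothing tying them together.
function naiveUpdate(periodMs: number): void {
  db.periodMs = periodMs; // write 1 succeeds
  updateScheduleInTemporal(periodMs); // write 2 throws; write 1 is not rolled back
}

let updateFailed = false;
try {
  naiveUpdate(30 * 60 * 1000);
} catch {
  updateFailed = true;
}

// The UI (backed by the DB) now shows 30 minutes,
// while the schedule keeps firing every 60 minutes.
console.log({ updateFailed, db: db.periodMs, schedule: temporal.schedulePeriodMs });
```

Nothing in this flow ever revisits the schedule, which is why the mismatch persists until someone saves the Integration again.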
To tackle this issue, we considered a few different approaches:
Upon an update to the Integration object in the Management Portal, instead of the naive solution of:
we considered using Temporal to orchestrate the two-phase update process instead:
There is a major benefit to using this approach: using Temporal’s TypeScript SDK, we can write almost identical code to what we would have written in the naive solution, and Temporal would durably execute the two actions (steps 3 and 4 in the diagram).
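As a rough illustration of what durable execution buys us, the sketch below runs the two writes in order and retries each one until it succeeds. This is deliberately not Temporal's SDK: `runDurably`, `Step`, and the failing schedule update are hypothetical stand-ins, and a real workflow engine would also persist progress so that it survives process restarts.

```typescript
// Stand-in for durable execution: run steps in order, retrying each one
// until it succeeds or the attempt budget is exhausted.
type Step = { label: string; run: () => void };

function runDurably(steps: Step[], maxAttempts = 5): string[] {
  const attemptLog: string[] = [];
  for (const step of steps) {
    let done = false;
    for (let attempt = 1; attempt <= maxAttempts && !done; attempt++) {
      try {
        step.run();
        done = true;
        attemptLog.push(`${step.label}: ok on attempt ${attempt}`);
      } catch {
        attemptLog.push(`${step.label}: failed attempt ${attempt}`);
      }
    }
    if (!done) throw new Error(`${step.label} exhausted retries`);
  }
  return attemptLog;
}

// Hypothetical state, plus a schedule update that fails twice before succeeding.
const current = { dbPeriodMs: 60 * 60 * 1000, schedulePeriodMs: 60 * 60 * 1000 };
const newPeriodMs = 30 * 60 * 1000;
let remainingFailures = 2;

const attemptLog = runDurably([
  {
    label: "updateDb",
    run: () => {
      current.dbPeriodMs = newPeriodMs;
    },
  },
  {
    label: "updateSchedules",
    run: () => {
      if (remainingFailures-- > 0) throw new Error("transient failure");
      current.schedulePeriodMs = newPeriodMs;
    },
  },
]);
console.log(attemptLog, current);
```

The body passed to `runDurably` reads much like the naive two-step update, which is the appeal of this approach: the retry and recovery logic lives in the engine rather than in our code.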
However, there is also one drawback: as Temporal is now an additional dependency in the critical path of updating the Integration object, if Temporal were to have an outage, we would be unable to update the Integration object, which is not ideal.
One reason we have a consistency problem is that the “source of truth” for periodMs is in the DB, but the schedules in Temporal must have matching periodMs. What if we moved the source of truth to Temporal? That is, we would split up the Integration object on the UI so that all the other settings would be saved to the DB, while the periodMs would be in a separate panel, perhaps, with its own Save/Update button.
There are some problems with this approach too:
Upon an Integration object update from the UI:
Some pros of this approach:
However, a major con of this approach is the amount of additional work and complexity required; we need to create a new DB table, make an additional write, and write reconciliation code.
There are other systems with similar patterns. In Kubernetes, users apply configs, and operators reconcile the latest config with the actual system state (technically by subscribing to events rather than running periodically).
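A minimal sketch of this reconciliation pattern, under stated assumptions: the `store` and `engine` objects are in-memory stand-ins for our DB and Temporal, the `pendingChanges` array plays the role of the new DB table, and `saveIntegration` and `reconcile` are hypothetical names. In a real database, the two writes in `saveIntegration` would share a single transaction.

```typescript
// Hypothetical in-memory stand-ins for the DB tables and Temporal.
type PendingChange = { integrationId: string; periodMs: number };

const store = {
  integrations: new Map<string, { periodMs: number }>([["int-1", { periodMs: 60 * 60 * 1000 }]]),
  pendingChanges: [] as PendingChange[], // stands in for the new DB table
};
const engine = {
  isDown: true,
  schedulePeriodMs: new Map<string, number>([["int-1", 60 * 60 * 1000]]),
};

// UI update path: touches only the DB, so a Temporal outage cannot block it.
function saveIntegration(integrationId: string, periodMs: number): void {
  store.integrations.set(integrationId, { periodMs });
  store.pendingChanges.push({ integrationId, periodMs });
}

// Background reconciler: apply pending changes and keep the ones that fail,
// so they are retried on the next run. Returns how many are still pending.
function reconcile(): number {
  const stillPending: PendingChange[] = [];
  for (const change of store.pendingChanges) {
    try {
      if (engine.isDown) throw new Error("Temporal unavailable");
      engine.schedulePeriodMs.set(change.integrationId, change.periodMs);
    } catch {
      stillPending.push(change);
    }
  }
  store.pendingChanges = stillPending;
  return stillPending.length;
}

saveIntegration("int-1", 30 * 60 * 1000);
const pendingDuringOutage = reconcile(); // Temporal down: the change stays queued
engine.isDown = false;
const pendingAfterRecovery = reconcile(); // Temporal back: the schedule catches up
console.log({ pendingDuringOutage, pendingAfterRecovery });
```

Because the reconciler keeps retrying whatever is still pending, the DB and the schedules converge once Temporal is reachable again, without any further action from the UI.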
After evaluating the pros and cons of each approach, we decided to go with Approach 3. It’s hard to say that any one of these approaches is clearly the best, but we liked that Approach 3 accommodates additional config fields on Integration in the future and that it does not introduce Temporal as a dependency in the Integration config update path.