Synchronizing config changes with system behavior

As an embedded integrations platform, Supaglue interfaces not only with our direct customers but also our customers’ customers. So, when there’s a bug or an outage, not only does it tarnish our credibility, it also reflects poorly on our customers. Therefore, we take reliability and consistency very seriously.

In this blog post, we’ll dive into one example of keeping our system consistent. We’ll explain the problem, explore a few solutions, and walk you through our thought process in choosing one.

The Problem

One of Supaglue’s core features is to periodically sync data from providers like Salesforce and HubSpot. To do this, we use Temporal, a powerful and flexible workflow engine, to periodically poll these providers for new data and update our database.

We also allow our customers to configure the frequency at which data is synced. This frequency is persisted in our system in two places:

1. Our system has an Integration object, which refers to a provider like Salesforce or HubSpot. It looks something like this:

type Integration = {
    id: string;
    type: 'salesforce' | 'hubspot' | ...;
    periodMs: number; // number of milliseconds between each data sync
    // more config settings...
};

The periodMs represents the amount of time between successive data sync runs. The Integration object settings are shown in Supaglue’s Management Portal UI.

2. We use Temporal Schedules, which provide a nice abstraction for triggering workflow runs on a schedule. Whenever a customer connects a provider, we create a Temporal Schedule with a config specifying that it should spawn a new Workflow to sync data every periodMs milliseconds. A single Integration can have many associated Schedules, one for each customer. Temporal offers an API for creating or updating schedules with a periodMs.

Our customers can update the periodMs parameter by modifying and saving the Integration object in the Supaglue Management Portal UI.

When the UI issues a request to update the frequency (periodMs), we need to do both of the following:

  • Write periodMs into the Integration table in the DB.
  • Send a request to Temporal to update periodMs for all associated Schedules.

If we do not perform these two actions atomically, we will have consistency issues. For example, suppose we first updated the periodMs value in the DB and then updated the periodMs in Temporal. If the DB write succeeded but the Temporal write failed, we would end up in an inconsistent state. Our customers would see periodMs updated to, say, 30 minutes in the UI, but our system would keep syncing data every 60 minutes. Even worse, this discrepancy would never be resolved unless there were another UI update, so our system would not even be eventually consistent.
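The failure mode above can be reproduced with a small sketch. Everything here is a hypothetical in-memory stand-in made up for illustration (`integrationsTable`, `temporalSchedules`, `temporalIsDown`, `naiveUpdatePeriod`); it is not Supaglue's code and does not call the Temporal API.

```typescript
// Hypothetical in-memory stand-ins for the Integration table and Temporal Schedules.
type Schedule = { scheduleId: string; integrationId: string; periodMs: number };

const HOUR_MS = 60 * 60 * 1000;

const integrationsTable: Record<string, { periodMs: number }> = {
  "int-1": { periodMs: HOUR_MS },
};
const temporalSchedules: Schedule[] = [
  { scheduleId: "sched-a", integrationId: "int-1", periodMs: HOUR_MS },
];

// Test knob that simulates a Temporal outage; not part of any real API.
let temporalIsDown = false;

function updateSchedulesInTemporal(integrationId: string, periodMs: number): void {
  if (temporalIsDown) {
    throw new Error("Temporal unavailable");
  }
  temporalSchedules
    .filter((s) => s.integrationId === integrationId)
    .forEach((s) => {
      s.periodMs = periodMs;
    });
}

// Naive handler: two independent writes with nothing tying them together.
function naiveUpdatePeriod(integrationId: string, periodMs: number): void {
  const row = integrationsTable[integrationId];
  if (!row) {
    throw new Error("unknown integration: " + integrationId);
  }
  row.periodMs = periodMs; // write 1: the DB now holds the new value
  updateSchedulesInTemporal(integrationId, periodMs); // write 2: may throw
}

temporalIsDown = true;
let requestFailed = false;
try {
  naiveUpdatePeriod("int-1", HOUR_MS / 2);
} catch (err) {
  requestFailed = true;
}

// The DB and the schedule now disagree, and nothing will retry write 2.
const dbPeriodMs = integrationsTable["int-1"].periodMs;
const schedulePeriodMs = temporalSchedules[0].periodMs;
console.log({ requestFailed, dbPeriodMs, schedulePeriodMs });
```

After the failed request, the stand-in DB row holds 30 minutes while the schedule still holds 60 minutes, which is exactly the discrepancy the UI would show.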

Exploring Solutions

To tackle this issue, we considered a few different approaches:

Approach 1: Spawning a Temporal Workflow for Each Update

Upon an update to the Integration object in the Management Portal, instead of the naive solution of:

  1. UI issues request to backend
  2. backend saves to DB
  3. backend saves to Temporal

we considered using Temporal to orchestrate the two-phase update process instead:

  1. UI issues request to backend
  2. backend spins up Temporal workflow for that request
  3. Temporal workflow saves to DB and then updates all associated syncs

There is a major benefit to using this approach: using Temporal’s TypeScript SDK, we can write code almost identical to what we would have written in the naive solution, and Temporal would durably execute the two actions (steps 3 and 4 in the diagram).

However, there is also one drawback: as Temporal is now an additional dependency in the critical path of updating the Integration object, if Temporal were to have an outage, we would be unable to update the Integration object, which is not ideal.
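To make the ordering and retry behavior of Approach 1 concrete, here is a minimal sketch. It does not use the Temporal SDK: `runStepsWithRetry` is a hypothetical plain retry loop standing in for the durable execution Temporal would provide, and `dbRow`, `schedulePeriods`, and `failuresLeft` are in-memory stand-ins invented for illustration.

```typescript
// Runs each step in order, retrying a failed step up to maxAttempts times.
// A plain loop standing in for durable execution (this is not the Temporal SDK).
function runStepsWithRetry(steps: Array<() => void>, maxAttempts: number): void {
  steps.forEach((step, index) => {
    for (let attempt = 1; ; attempt++) {
      try {
        step();
        return; // this step succeeded, so move on to the next one
      } catch (err) {
        if (attempt >= maxAttempts) {
          throw new Error("step " + index + " failed after " + maxAttempts + " attempts");
        }
      }
    }
  });
}

// Hypothetical stand-ins: one DB row and one schedule period per customer.
const dbRow = { periodMs: 60 * 60 * 1000 };
const schedulePeriods = [60 * 60 * 1000, 60 * 60 * 1000];

// Pretend the schedule update fails twice before it goes through.
let failuresLeft = 2;

function updateIntegration(periodMs: number): void {
  runStepsWithRetry(
    [
      () => {
        dbRow.periodMs = periodMs; // save to DB
      },
      () => {
        if (failuresLeft > 0) {
          failuresLeft--;
          throw new Error("transient failure");
        }
        // update all associated syncs
        for (let i = 0; i < schedulePeriods.length; i++) {
          schedulePeriods[i] = periodMs;
        }
      },
    ],
    5
  );
}

updateIntegration(30 * 60 * 1000);
```

Unlike this loop, a real workflow engine would also survive a process crash between the two steps; the sketch only shows why retrying the second step closes the gap left by the naive version.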

Approach 2: Moving the Source of Truth

One reason we have a consistency problem is that the “source of truth” for periodMs is in the DB, but the schedules in Temporal must have matching periodMs. What if we moved the source of truth to Temporal? That is, we would split up the Integration object on the UI so that all the other settings would be saved to the DB, while the periodMs would be in a separate panel, perhaps, with its own Save/Update button.

There are some problems with this approach too:

  • Our periodMs is on the Integration object in the DB, while periodMs is on each Temporal Schedule, of which there can be many belonging to a single Integration. We’d still need to have some way to atomically update the periodMs on multiple Temporal Schedules at once, which is not supported by the Temporal API.
  • If we were to add more config options to the Integration object, we would need to do the same exercise on the UI for each option with a consistency problem. Perhaps we could make each field auto-saving and not have a single Save/Update button, but we didn’t want to do that.

Approach 3: Periodically Synchronize Settings with Temporal

Upon an Integration object update from the UI:

  1. Transactionally write the updated config to the integrations table and a change event to another table called integration_changes.
  2. Use Temporal to periodically run a workflow (using the Schedules API mentioned earlier) that polls the integration_changes table, updates the corresponding Temporal Schedules, and deletes the processed change events.
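The two steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not Supaglue's implementation: the table shapes, `saveIntegration`, `processChanges`, `updateSchedule`, and the `temporalIsDown` knob are all hypothetical, and the sketch assumes a change event stores only the integration id, so the poller reads the current periodMs from the integrations table.

```typescript
// All names below are hypothetical in-memory stand-ins, not Supaglue's schema.
type IntegrationRow = { id: string; periodMs: number };
type IntegrationChange = { eventId: number; integrationId: string };
type Schedule = { scheduleId: string; integrationId: string; periodMs: number };

const HOUR_MS = 60 * 60 * 1000;
const integrations: Record<string, IntegrationRow> = {
  "int-1": { id: "int-1", periodMs: HOUR_MS },
};
let integrationChanges: IntegrationChange[] = [];
let nextEventId = 1;
const schedules: Schedule[] = [
  { scheduleId: "cust-a", integrationId: "int-1", periodMs: HOUR_MS },
  { scheduleId: "cust-b", integrationId: "int-1", periodMs: HOUR_MS },
];
let temporalIsDown = false; // test knob simulating a Temporal outage

// Step 1: in a real DB both writes would share one transaction.
// Note that this path never talks to Temporal.
function saveIntegration(id: string, periodMs: number): void {
  integrations[id] = { id, periodMs };
  integrationChanges.push({ eventId: nextEventId++, integrationId: id });
}

// Stand-in for the Temporal request that updates one Schedule.
function updateSchedule(schedule: Schedule, periodMs: number): void {
  if (temporalIsDown) {
    throw new Error("Temporal unavailable");
  }
  schedule.periodMs = periodMs;
}

// Step 2: one polling pass. Returns how many change events were processed.
function processChanges(): number {
  let processed = 0;
  for (const change of integrationChanges.slice()) {
    const row = integrations[change.integrationId];
    try {
      if (row) {
        schedules
          .filter((s) => s.integrationId === row.id)
          .forEach((s) => updateSchedule(s, row.periodMs));
      }
    } catch (err) {
      break; // keep this event and any later ones for the next pass
    }
    // Delete the event only after every associated schedule was updated.
    integrationChanges = integrationChanges.filter((c) => c.eventId !== change.eventId);
    processed++;
  }
  return processed;
}

temporalIsDown = true;
saveIntegration("int-1", HOUR_MS / 2); // succeeds even though Temporal is down
const processedWhileDown = processChanges();
const pendingWhileDown = integrationChanges.length;

temporalIsDown = false;
const processedAfterRecovery = processChanges();
const finalPeriods = schedules.map((s) => s.periodMs);
```

Because the event is deleted only after all of its schedules are updated, a pass that fails partway simply leaves the event in place, and the next pass reapplies the same periodMs, which is harmless since the update is idempotent.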

Some pros of this approach:

  • The config update doesn’t depend on Temporal, making the system more reliable.
  • We can even extend the background polling job to do manual or automatic reconciliations of the DB state with Temporal; if someone were to accidentally modify a Temporal schedule outside of the normal update process, we could trigger the workflow to apply the “source of truth” config from the DB to Temporal.
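A reconciliation pass of the kind described in the second bullet might look like the sketch below. The `reconcile` function and the data shapes are hypothetical and in-memory only; a real version would read schedules from Temporal and write corrections back through its API.

```typescript
// Hypothetical shapes for illustration; not Supaglue's schema or Temporal's API.
type IntegrationRow = { id: string; periodMs: number };
type Schedule = { scheduleId: string; integrationId: string; periodMs: number };

// Compares every schedule against its Integration row and overwrites any drift
// with the DB value. Returns the ids of the schedules that were corrected.
function reconcile(rows: IntegrationRow[], schedules: Schedule[]): string[] {
  const periodById: Record<string, number> = {};
  rows.forEach((row) => {
    periodById[row.id] = row.periodMs;
  });

  const corrected: string[] = [];
  schedules.forEach((schedule) => {
    const expected = periodById[schedule.integrationId];
    if (expected !== undefined && schedule.periodMs !== expected) {
      schedule.periodMs = expected; // the DB is the source of truth
      corrected.push(schedule.scheduleId);
    }
  });
  return corrected;
}

const rows: IntegrationRow[] = [{ id: "int-1", periodMs: 30 * 60 * 1000 }];
const liveSchedules: Schedule[] = [
  { scheduleId: "cust-a", integrationId: "int-1", periodMs: 30 * 60 * 1000 },
  // Someone edited this schedule by hand, outside the normal update path.
  { scheduleId: "cust-b", integrationId: "int-1", periodMs: 5 * 60 * 1000 },
];

const fixedIds = reconcile(rows, liveSchedules);
const fixedAgain = reconcile(rows, liveSchedules); // second pass finds nothing to do
```

Running the pass twice is deliberate: once the drifted schedule is corrected, a second pass has nothing left to fix, so the job is safe to trigger manually or on a timer.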

However, a major con of this approach is the amount of additional work and complexity required; we need to create a new DB table, make an additional write, and write reconciliation code.

Other systems follow a similar pattern: in Kubernetes, users apply configs, and operators reconcile the latest config with the actual system state (technically, they subscribe to events rather than poll on a timer).

Conclusion

After evaluating the pros and cons of each approach, we decided to go with Approach 3. It’s hard to say that any one of these approaches is clearly the best, but we liked that Approach 3 accommodates additional config fields on Integration in the future and that it does not introduce Temporal as a dependency in the Integration config update path.
