
Sagas

When building distributed systems, it is crucial to ensure that the system remains consistent even in the presence of failures. One way to achieve this is by using the Saga pattern.

Sagas are a way to manage transactions that span multiple services. When a multi-step operation fails partway through, a saga runs compensations that undo the steps already completed, so the system ends up in a consistent state.

Implementing Sagas in Restate

Let’s assume we want to build a travel booking application. The core of our application is a workflow that first tries to book the flight, then rents a car, and finally processes the customer’s payment before confirming the flight and car rental. When the payment fails, we want to undo the flight booking and car rental.

Restate lets us implement this purely in user code:

  • Wrap your business logic in a try-block, and throw a terminal error for cases where you want to compensate and finish.
  • For each step you do in your try-block, add a compensation to a list.
  • In the catch block, in case of a terminal error, you run the compensations in reverse order, and rethrow the error.

Restate guarantees that the handler runs to completion, so if a terminal error is thrown, all compensations will run:

import * as restate from "@restatedev/restate-sdk";

// FlightsService, CarRentalService, paymentClient, BookingRequest and the
// bound service definitions are assumed to be defined elsewhere in the application.
const bookingWorkflow = restate.workflow({
  name: "BookingWorkflow",
  handlers: {
    run: async (ctx: restate.WorkflowContext, req: BookingRequest) => {
      const { flights, car, paymentInfo } = req;

      // Create a list of undo actions
      const compensations = [];
      try {
        // Reserve the flights and let Restate remember the reservation ID
        // This sends an HTTP request via Restate to the Restate flights service
        const flightBookingId = await ctx.serviceClient(FlightsService).reserve(flights);
        // Use the flightBookingId to register the undo action for the flight reservation,
        // or later confirm the reservation.
        compensations.push(() => ctx.serviceClient(FlightsService).cancel({ flightBookingId }));

        // Reserve the car and let Restate remember the reservation ID
        const carBookingId = await ctx.serviceClient(CarRentalService).reserve(car);
        // Register the undo action for the car rental.
        compensations.push(() => ctx.serviceClient(CarRentalService).cancel({ carBookingId }));

        // Generate an idempotency key for the payment; stable on retries
        const paymentId = ctx.rand.uuidv4();
        // Register the refund as a compensation, using the idempotency key
        compensations.push(() => ctx.run(() => paymentClient.refund({ paymentId })));
        // Do the payment using the idempotency key (sometimes throws terminal errors)
        await ctx.run(() => paymentClient.charge({ paymentInfo, paymentId }));

        // Confirm the flight and car reservations
        await ctx.serviceClient(FlightsService).confirm({ flightBookingId });
        await ctx.serviceClient(CarRentalService).confirm({ carBookingId });
      } catch (e) {
        // Terminal errors tell Restate not to retry, but to compensate and fail the workflow
        if (e instanceof restate.TerminalError) {
          // Undo all the steps up to this point by running the compensations in reverse order
          // Restate guarantees that all compensations are executed
          for (const compensation of compensations.reverse()) {
            await compensation();
          }
        }
        // Rethrow error to fail this workflow
        throw e;
      }
    },
  },
});

restate.endpoint()
  .bind(bookingWorkflow)
  .bind(carRentalService)
  .bind(flightsService)
  .listen(9080);

When to use Sagas

Restate runs invocations to completion, with infinite retries and recovery of partial progress. In that sense, you do not need to run compensations between retries: Restate resumes each retry attempt from the point where the invocation failed.

However, there can still be cases in your business logic where you want to stop a handler from executing any further and run compensations for the work done so far.

You will also need sagas to end up in a consistent state when you cancel an invocation (via the CLI or programmatically). For example, if an invocation gets stuck because an external system is not responding, you might want to stop executing the invocation while keeping the overall system state consistent.
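
The difference between a failure that is simply retried and a failure that should end the handler and trigger compensations can be sketched without the SDK. Everything here is a stand-in chosen for illustration: the local `TerminalError` class mimics `restate.TerminalError`, and `runWithRetries` is a hypothetical, simplified loop that is not how Restate is implemented (it caps the attempts only to keep the sketch finite).

```typescript
// Local stand-in for restate.TerminalError so the sketch runs without the SDK.
class TerminalError extends Error {}

// Hypothetical, simplified retry loop: transient errors are retried,
// terminal errors are surfaced to the caller immediately.
function runWithRetries<T>(step: () => T, maxAttempts = 5): T {
  for (let attempt = 1; ; attempt++) {
    try {
      return step();
    } catch (e) {
      if (e instanceof TerminalError || attempt >= maxAttempts) throw e;
      // Transient failure: fall through and try again.
    }
  }
}

let attempts = 0;
// Fails twice with a transient error, then succeeds.
const flaky = (): string => {
  attempts++;
  if (attempts < 3) throw new Error("connection reset");
  return "reserved";
};

let declinedAttempts = 0;
// A business-level rejection: retrying will never help.
const declined = (): string => {
  declinedAttempts++;
  throw new TerminalError("card declined");
};

// The transient failure needs no compensation; the step is just retried.
const flakyResult = runWithRetries(flaky);

let compensated = false;
try {
  runWithRetries(declined);
} catch (e) {
  // This is the point where a saga would run its compensations.
  if (e instanceof TerminalError) compensated = true;
}
```

The flaky step never reaches the catch block of the caller, which mirrors the point above that retries alone do not call for compensations. Only the terminal error does.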

Registering compensations

Because this is all implemented in pure user code, there are no restrictions on what you can do in compensations, as long as they are idempotent. For example, you can reset the state of the service, call other services to undo previously executed calls, or run ctx.run actions to delete previously inserted rows in a database.
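
To illustrate what idempotent means for a compensation, here is a minimal sketch with a hypothetical in-memory reservation store (the `reserve` and `cancel` functions and the seat counter are assumptions for illustration, not part of any Restate API). A compensation may end up being attempted more than once, for instance on a retry, so repeating it should leave the state unchanged.

```typescript
// Hypothetical in-memory store, standing in for a real service or database.
const reservations = new Map<string, "reserved" | "cancelled">();
let seatsAvailable = 10;

function reserve(id: string): void {
  if (reservations.has(id)) return; // repeating a reserve with the same ID is a no-op
  reservations.set(id, "reserved");
  seatsAvailable -= 1;
}

// Idempotent compensation: a seat is released at most once per reservation.
function cancel(id: string): void {
  if (reservations.get(id) !== "reserved") return; // unknown or already cancelled
  reservations.set(id, "cancelled");
  seatsAvailable += 1;
}

reserve("booking-1");
cancel("booking-1");
cancel("booking-1"); // a repeated compensation must not free a second seat
cancel("never-reserved"); // compensating something that never happened is also safe
```

A non-idempotent version that incremented the counter unconditionally would leave twelve seats available after these calls instead of ten.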

Adding compensations

Depending on the characteristics of the API, adding the compensation might look different:

  1. The flight and car rental APIs require you to reserve first, and then use the ID you get back to confirm or cancel. In this case, we add the compensation after creating the reservation (because we need the ID).

  2. The payment API in this example requires you to generate an idempotency key yourself, and executes in one shot. Here, we add the compensation before performing the action, using the same UUID. This way, even if the charge throws a terminal error, the refund still runs and we can be sure the payment did not go through.
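
The two registration orders above can be seen side by side in a small sketch. It is synchronous and uses hypothetical in-memory stand-ins (`reserveFlight`, `cancelFlight`, `charge`, `refund`, and the fixed IDs are all assumptions); in a Restate handler each step would be an awaited `ctx` call and the payment ID would come from `ctx.rand.uuidv4()`. For brevity the sketch treats every error as terminal and returns `false` instead of rethrowing.

```typescript
type Compensation = () => void;
const log: string[] = [];

// Style 1: two-phase API. The booking ID only exists after the call returns.
function reserveFlight(): string {
  log.push("reserve flight");
  return "flight-1";
}
function cancelFlight(id: string): void {
  log.push(`cancel ${id}`);
}

// Style 2: one-shot API with a caller-supplied idempotency key.
function charge(paymentId: string): void {
  log.push(`charge ${paymentId}`);
  throw new Error("payment declined"); // simulate a failing payment
}
function refund(paymentId: string): void {
  // Assumed to be safe to call even if the charge never went through.
  log.push(`refund ${paymentId}`);
}

function book(): boolean {
  const compensations: Compensation[] = [];
  try {
    const flightId = reserveFlight();
    // Registered AFTER the action: the compensation needs the returned ID.
    compensations.push(() => cancelFlight(flightId));

    const paymentId = "payment-1";
    // Registered BEFORE the action: the key is known up front, so the refund
    // runs even when the outcome of the charge is a failure.
    compensations.push(() => refund(paymentId));
    charge(paymentId);
    return true;
  } catch (e) {
    // Undo in reverse order of registration.
    for (const compensation of compensations.reverse()) compensation();
    return false;
  }
}

const booked = book();
```

Registering the refund first only works if the refund tolerates a payment ID that was never charged, which is the idempotency requirement described in the previous section.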