How We Use Both Process Orchestration and Process Choreography

Chad Musick 2024-08-08

Message passing in software

At their most atomic, all software processes rely on message passing. Various architectures are constructed or (for those of us who like to anthropomorphize our work) arise naturally around the messengers and the messages to control their flow. In hardware, these messages may be stored in memory registers. In software, these messages are handled in a variety of ways according to the language and its operating model.

In online systems, distributed processes are typical. In an ideal example, you open a browser and come to Bellroy’s website. We serve you enticing images of our products and show you their features. You pick one or more and order. We accept your order, take payment, and instruct one of our warehouses to send your new carry goods to you.

There are lots of obvious message partners here:

- User to/from their browser
- The user’s browser to/from Bellroy’s website
- Bellroy’s website to/from Bellroy’s internal applications
- Bellroy’s internal applications to/from the payment service
- Bellroy’s internal applications to/from the accounting system
- Bellroy’s internal applications to/from the warehouses

Three basic models of message passing predominate in distributed systems:

- Synchronous: The sender sends a message and waits for the recipient to fully respond.
- Asynchronous to queue: The sender sends a message, which enters a queue and is (maybe) processed later.
- Asynchronous to subscribers: The publisher sends a message to an intermediary, which then forwards it to a set of zero or more subscriber processes.
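
As a rough illustration, the three models can be sketched inside a single Python process. Every name here (`check_stock`, `order_queue`, `subscribers`, `publish`) is hypothetical and stands in for a real network call, queue service, or broker; none of it reflects Bellroy's actual code.

```python
from collections import deque

# 1. Synchronous: the caller blocks until the recipient returns a full response.
def check_stock(sku):
    return sku in {"wallet", "backpack"}  # hypothetical stock list

sync_reply = check_stock("wallet")

# 2. Asynchronous to queue: the sender enqueues and moves on. A worker may
#    process the message later (or never, if no worker runs).
order_queue = deque()
order_queue.append("order-1")

def drain(queue):
    """Stand-in for a worker that processes everything currently queued."""
    processed = []
    while queue:
        processed.append(queue.popleft())
    return processed

# 3. Asynchronous to subscribers: the publisher hands the message to an
#    intermediary, which forwards it to zero or more subscribers.
subscribers = []  # list of callables taking one message
received = []

def publish(message):
    for handler in subscribers:
        handler(message)
    # Returned only so the sketch can be inspected; a real publisher
    # usually does not learn this number.
    return len(subscribers)
```

Publishing with an empty `subscribers` list is not an error in this sketch: the message is simply delivered to nobody, which is the "zero or more" case.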

In practice, these are often combined. A browser might send a synchronous request to a website, which might synchronously submit an event to an event-stream platform like Apache Kafka, which might acknowledge that message and then publish it to a number of subscribers. It gets complicated. Without some kind of coordination, that complication makes systems hard to reason about and hard to build correctly.

Many different methods can be used to harness the complications of this distributed process without losing the power of the message-passing paradigm. Most such methods can be classified as orchestration or choreography (or both). Orchestration is well-suited to centrally directing the actions of systems known to the orchestrator, such as in workflows, and ill-suited to handling reactions from systems unknown to the orchestrator, such as auditing and analytics systems. In contrast, choreography is well-suited to handling reactions from arbitrary systems but is typically slower than orchestration.

Orchestration: Centralised control

Orchestration acts in a way similar to historical software practices. There may be many different services (or functions), and these may be spread across multiple computers—and for orchestration, distributed systems are typically involved—but a centralised controller calls out to these services as needed to accomplish a specific task, end-to-end. For most purposes, an orchestrated system can be treated as a black box: you put your choices in, you get results or effects out.

There are different theoretical models for orchestration. Among these, the concept of a finite state machine is the most widely implemented. In a finite state machine, only a finite number of “states” are possible, and the machine transitions from one state to another as it runs. Additional information can be carried through the machine, but this information is not typically considered to be part of the state. A variety of open source and proprietary orchestration engines use this basic model for their operation. At Bellroy, we use AWS Step Functions for orchestration of some AWS services and Apache NiFi for much of our data orchestration.
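
A minimal sketch of the finite-state-machine idea follows, assuming a made-up set of order states and events. It is not how AWS Step Functions or Apache NiFi are implemented or configured; it only shows a fixed transition table with extra data carried alongside the state.

```python
# Hypothetical states and events, for illustration only.
TRANSITIONS = {
    ("received", "stock_ok"): "awaiting_payment",
    ("received", "out_of_stock"): "rejected",
    ("awaiting_payment", "paid"): "shipping",
    ("awaiting_payment", "declined"): "rejected",
    ("shipping", "shipped"): "done",
}

class Machine:
    def __init__(self, state="received"):
        self.state = state
        # Data carried through the machine. It is not part of the state:
        # it never changes which transitions are allowed.
        self.context = {}

    def fire(self, event, **data):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {self.state!r} on {event!r}")
        self.context.update(data)
        self.state = TRANSITIONS[key]
        return self.state
```

Because the table is finite and explicit, every path the machine can take is visible in one place, which is part of what makes an orchestrated process easy to audit.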

Choreography: Independent actors deciding how to perform

In contrast with orchestration, choreography does not rely on a centralised understanding of the order in which things will happen. Instead, each component process listens for signals and decides how to respond to those signals; this can include ignoring them.
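
One way to picture this is a set of actors that each receive every signal and return whatever follow-up signals they choose to emit. The actor names and signal names below are invented for the sketch. The `run` loop is only a delivery mechanism: it has no knowledge of what any actor does or of the order in which effects should happen.

```python
from collections import deque

log = []

# Each actor takes a signal and returns the follow-up signals it emits.
# Returning an empty list means the actor ignored the signal.
def inventory(signal):
    return ["stock_changed"] if signal == "order_placed" else []

def analytics(signal):
    log.append(signal)  # records everything, emits nothing
    return []

def forecasting(signal):
    return ["reorder_suggested"] if signal == "stock_changed" else []

actors = [inventory, analytics, forecasting]

def run(initial):
    """Deliver signals to every actor until no new signals are emitted."""
    pending = deque([initial])
    seen = []
    while pending:
        signal = pending.popleft()
        seen.append(signal)
        for actor in actors:
            pending.extend(actor(signal))
    return seen
```

Nothing in this sketch states that an order leads to a reorder suggestion; that chain only emerges from the independent decisions of `inventory` and `forecasting`.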

The granularity of choreography can differ greatly, from pixels in a diagram to the behavior of entire companies. At the most granular level of choreography are cellular automata, which behave by consistent rules that consider only their current state and the state of their neighbors. Conway’s Game of Life is a standard example of this type of system. Despite the simplicity of Conway’s rules, complex systems can arise from non-local effects of local interactions. In business, choreography is the typical way for companies to interact with one another; companies make requests but don’t typically dictate how those requests are accomplished or concern themselves with the internal details of their vendors.
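
For the cellular-automaton end of that range, a single generation of Conway’s Game of Life can be sketched in a few lines. The representation (a set of live `(x, y)` coordinates on an unbounded grid) is one common choice, picked here for brevity.

```python
from collections import Counter

def step(live):
    """Advance one generation; `live` is a set of (x, y) live cells."""
    # Count, for every cell adjacent to a live cell, how many live
    # neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next generation if it has exactly 3 live neighbours,
    # or if it is alive now and has exactly 2.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }
```

Each cell's fate depends only on itself and its eight neighbours; there is no controller deciding what pattern the grid as a whole should form.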

Where is the boundary?

Orchestration and choreography can look similar in their implementation details for a particular system, but there are important distinctions between them. In a purely orchestrated system, the orchestrator is concerned with all elements of execution, so replacing any element requires the system to be updated. In a purely choreographed system where all reacting systems are also purely choreographed, no work is performed because every actor is simply signalling others.

Nearly all choreography will be started by some kind of orchestration, and some choreographed responses will themselves be orchestrated. The boundary between the two types of control emerges primarily from the degree of control exerted over the exact process.

In the software context, publish–subscribe (pub/sub) systems are often used to define the boundaries between orchestration (direct invocation) and choreography (message publication). The publisher of a message does not know how many systems (if any) will respond to the message. The subscriber who receives a message does not know how many other subscribers (if any) are also responding to the message.
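
The boundary can be sketched as a function that directly invokes its own steps and then publishes an event. The `Broker` class, the step functions, and the `order_placed` topic are all hypothetical; a real system would use an actual message broker rather than an in-memory dictionary.

```python
class Broker:
    def __init__(self):
        self._subscribers = {}  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        # Returns nothing: the publisher learns nothing about who listened.
        for handler in self._subscribers.get(topic, []):
            handler(payload)

def reserve_stock(order):
    return {**order, "reserved": True}

def take_payment(order):
    return {**order, "paid": True}

def place_order(order, broker):
    # Orchestrated part: direct invocation in a fixed order.
    order = reserve_stock(order)
    order = take_payment(order)
    # Boundary: from here on, any reactions are choreographed.
    broker.publish("order_placed", order)
    return order
```

`place_order` behaves the same whether zero, one, or many handlers are subscribed, and each handler is called without any reference to the others.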

In the business context, there is often a defined set of steps to follow for a process. This is an orchestrated process. However, the business may pay for an outside vendor to achieve some outcome without caring how the other business does so. From the client perspective, this is a choreographed step.

How Bellroy uses both orchestration and choreography

Bellroy needs to accomplish several distinct but related tasks to effectively serve our current and future customers. Let’s look at how Bellroy does this at a high level.

Serve our website: Primarily choreography

Bellroy’s website relies on a variety of different elements being loaded, such as images, text, scripts, and design specifications. The order in which these are loaded is not important. Bellroy’s servers respond to these requests independently. The browser requests elements as it learns about them, such as reading image URLs from HTML tags, and renders the page when it has enough information to do so.

Process orders: Primarily orchestration

Our order fulfilment process includes the following steps:
- Check that stock is available for the shipping address.
- Calculate the order total and verify that it matches what was displayed.
- Collect payment information in the front-end and relay it to our payment processor; then, record the results (which may take a very long time to come if manual verification is necessary).
- Send notices to our warehouses to ship the products.
- Record the transaction in our accounting system.

Perform internal processes: Primarily choreography

- Gather orders from other electronic and physical marketplaces and ship those that require shipping.
- Keep track of inventory levels.
- Analyze how changes to Bellroy’s website affect sales.
- Predict how much inventory to order from suppliers.

When the order of execution is unimportant, choreography allows for design efficiency at the cost of execution complexity; the same initial signal can result in many different routes for execution, including routes not known at the time of building the thing that generates the signal.

In many ways, orchestration is the micromanagement of processes. These are instructed to execute in a particular order, and the response status and result of each process are tracked with the intent of using the information to inform the next process. This is necessary when the order of execution matters (we don’t want to collect payment if we can’t fulfil an order, for example). Beyond this, it can be very efficient since it does not inherently require asynchronous message passing.
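
A minimal sketch of that kind of tracking, assuming three simplified and entirely hypothetical steps, might look like the following. Each step's result is recorded and decides whether the next step runs at all.

```python
def check_stock(order):
    return order["sku"] in {"wallet", "backpack"}  # hypothetical catalogue

def collect_payment(order):
    return order["card_ok"]  # stand-in for a call to a payment processor

def notify_warehouse(order):
    return True  # stand-in for sending a shipping notice

# The orchestrator owns the order of execution.
STEPS = [
    ("check_stock", check_stock),
    ("collect_payment", collect_payment),
    ("notify_warehouse", notify_warehouse),
]

def fulfil(order):
    """Run steps in order; stop at the first failure and report what ran."""
    history = []
    for name, step in STEPS:
        ok = step(order)
        history.append((name, ok))
        if not ok:
            break
    return history
```

Because `check_stock` comes first in `STEPS`, an order for an unavailable product never reaches `collect_payment`, and the returned history doubles as a simple audit trail.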

Used together, orchestration and choreography can provide fast, flexible, reliable service that is easy to audit and expand without refactoring older systems. This requires conscious effort and an understanding of the tradeoffs involved in the choices of system types and boundaries between them.