The Problem Modern Systems Struggle With
Distributed systems fail constantly.
Services crash. Containers restart. Networks drop packets. APIs time out. Humans delay approvals. Deployments interrupt execution mid-process.
Yet we still need to run business processes that:
- span minutes, hours, or even days
- have multiple steps across multiple services
- depend on external systems that may be temporarily unavailable
- must not repeat work or lose progress when something goes wrong
Most teams try to solve this using a combination of tools:
- Kafka or RabbitMQ for async communication
- cron jobs for scheduled tasks
- retry loops and backoff strategies
- database state machines to track progress
- custom recovery scripts for when things go sideways
It works – until it doesn’t. The state machine grows. The recovery scripts pile up. The on-call engineer is paged at 3 AM because a payment was charged twice after a worker restarted at exactly the wrong moment.
This is the problem Temporal exists to solve.
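The 3 AM double charge described above is easy to reproduce in miniature. The sketch below is a hypothetical hand-rolled worker: the StateTable, PaymentProvider, and runWorker names are invented for illustration. It charges a card first and only afterwards writes its progress to a state table. If the process dies between those two writes, the restarted worker still sees the order as pending and charges it again.

```typescript
// Hypothetical in-memory stand-ins for a state table and a payment API.
type OrderState = "PENDING" | "CHARGED";

class StateTable {
  private rows = new Map<string, OrderState>();
  get(orderId: string): OrderState {
    return this.rows.get(orderId) ?? "PENDING";
  }
  set(orderId: string, state: OrderState): void {
    this.rows.set(orderId, state);
  }
}

class PaymentProvider {
  readonly charges: string[] = [];
  charge(orderId: string): void {
    this.charges.push(orderId);
  }
}

// A naive worker: performs the side effect first, records progress second.
// Returning early stands in for the process dying between those two writes.
function runWorker(db: StateTable, payments: PaymentProvider, orderId: string, dieAfterCharge: boolean): void {
  if (db.get(orderId) === "CHARGED") return; // already done, nothing to do
  payments.charge(orderId);
  if (dieAfterCharge) return; // crash: the state row is never updated
  db.set(orderId, "CHARGED");
}

const db = new StateTable();
const payments = new PaymentProvider();

runWorker(db, payments, "order-42", true);  // first run dies at exactly the wrong moment
runWorker(db, payments, "order-42", false); // restarted worker still sees PENDING

console.log(payments.charges.length); // 2: the customer was charged twice
```

Swapping the two writes does not fix the problem. It only trades a double charge for a missed one, because the charge and the state update live in different systems and cannot be committed atomically.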
Is Temporal a New Concept?
No. The ideas behind Temporal have existed for decades.
What is new is how cleanly and safely Temporal implements them, and how pleasant it is for developers to use. Workflows are written in pure code, without XML, DSLs, or brittle infrastructure glue.
Concepts That Existed Before Temporal
1. Workflow Engines
Enterprise systems have had workflow engines since the 1990s, from IBM MQ Workflow to later BPEL and BPMN engines such as jBPM, Activiti, and Camunda. These tools could model multi-step processes. However, they relied heavily on XML or graphical DSLs and heavyweight infrastructure, and they offered limited developer ergonomics. They were hard to test, painful to debug, and difficult to evolve as business requirements changed.
Temporal takes the same concept – durable, stateful, multi-step process execution – and brings it into the world of plain code.
2. The Saga Pattern
The Saga pattern, popularised in the context of microservices, breaks a long-running distributed transaction into a sequence of steps. Each step has a corresponding compensation action that can be triggered if a later step fails – essentially a rollback across service boundaries.
Many teams implement sagas manually using queues, state tracking tables, and custom orchestrator services. The result is often hundreds of lines of boilerplate just to handle failure cases correctly.
Temporal can act as a durable Saga engine. You write the steps and their compensations in real code, and Temporal handles persistence and recovery. A half-finished saga therefore resumes, or unwinds, correctly after a crash instead of being lost.
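In code, the core of the pattern is small. Run the steps in order, remember which ones succeeded, and on failure run their compensations in reverse. The sketch below shows only that control flow, in plain TypeScript with no Temporal APIs; the step names and the runSaga helper are hypothetical. Inside a Temporal workflow, each action and compensation would typically be an activity. The list of completed steps would then survive a crash, because it is rebuilt from the workflow’s history rather than held only in memory.

```typescript
// One saga step: a forward action and the compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

interface SagaResult {
  ok: boolean;
  log: string[];
}

// Runs steps in order; if one throws, compensates the completed steps in reverse order.
function runSaga(steps: SagaStep[]): SagaResult {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
    } catch {
      log.push(`failed:${step.name}`);
      for (let i = completed.length - 1; i >= 0; i--) {
        completed[i].compensate();
        log.push(`compensated:${completed[i].name}`);
      }
      return { ok: false, log };
    }
    completed.push(step);
    log.push(`done:${step.name}`);
  }
  return { ok: true, log };
}

// Example: shipment creation fails, so the charge is refunded and the stock released.
const effects = new Set<string>();
const result = runSaga([
  { name: "reserveStock", action: () => { effects.add("stock"); }, compensate: () => { effects.delete("stock"); } },
  { name: "chargeCard", action: () => { effects.add("charge"); }, compensate: () => { effects.delete("charge"); } },
  { name: "createShipment", action: () => { throw new Error("carrier API down"); }, compensate: () => {} },
]);

console.log(result.log.join(" → "));
// done:reserveStock → done:chargeCard → failed:createShipment → compensated:chargeCard → compensated:reserveStock
console.log(effects.size); // 0
```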
3. Amazon Simple Workflow Service (SWF)
Temporal is a direct descendant of Amazon SWF, and the lineage is not incidental – the founders of Temporal actually built SWF.
The Origin Story: From SWF to Cadence to Temporal
Understanding where Temporal came from is essential to appreciating why it works the way it does. It was not designed in a vacuum. It carries hard-won lessons from over 20 years of building mission-critical distributed systems.
Maxim Fateev and Samar Abbas
The two co-founders of Temporal, Maxim Fateev and Samar Abbas, met at Amazon around 2009.
Maxim (CTO and co-founder) joined Amazon in the early 2000s when the company was transitioning from a single monolithic binary to a service-oriented architecture. He served as tech lead for the Amazon messaging infrastructure that eventually became Amazon SQS – the first platform service launched by AWS. He then led the architecture and development of Amazon Simple Workflow Service (SWF), one of the earliest cloud-native workflow engines.
Samar (CEO and co-founder) also worked on SWF at Amazon. After Amazon, Samar joined Microsoft, where he led development on Azure Service Bus. As a side project, he created the Azure Durable Task Framework – an orchestration library for managing long-running, stateful workflows – which became so popular it was adopted by the Azure Functions team and is still available today as Azure Durable Functions.
The two reunited at Uber in 2015, where they co-created Cadence – a fully open-source workflow engine built on the lessons of SWF, designed to handle Uber’s massive internal orchestration needs. Within three years, over 100 use cases at Uber were running on Cadence.
In 2019, Maxim and Samar left Uber and co-founded Temporal Technologies, creating Temporal as the direct successor to Cadence – with an entirely rewritten server, better multi-tenancy support, improved APIs, and official multi-language SDKs (Go, Java, TypeScript, Python, and more).
Key insight: Temporal isn’t a startup idea – it’s the product of two engineers who spent 20 years repeatedly solving the same hard problem across Amazon, Microsoft, and Uber, each time refining their approach.
timeline
    title The Road to Temporal
    2004 : Maxim leads Amazon messaging infra → becomes SQS
    2009 : Maxim & Samar work together on Amazon SWF
    2014 : Samar joins Microsoft, builds Azure Durable Task Framework
    2015 : Maxim & Samar reunite at Uber, co-create Cadence (open source)
    2019 : Maxim & Samar found Temporal Technologies
         : Temporal launched as successor to Cadence
Temporal vs Kafka / RabbitMQ
Temporal is frequently confused with message queues and event streaming platforms. This confusion is understandable – all three deal with asynchronous processing – but the comparison breaks down quickly.
What Kafka and RabbitMQ Actually Do
Apache Kafka is a distributed event streaming platform, originally developed at LinkedIn. It is designed for high-throughput, persistent, replayable event streams. You publish events to topics; consumers read them at their own pace. Kafka is excellent for building real-time data pipelines, analytics platforms, and event-driven architectures where multiple services react to a single stream of events.
RabbitMQ is a general-purpose message broker. It supports multiple protocols (AMQP, MQTT, STOMP) and flexible routing via exchanges. It is well suited to task distribution, microservice communication, and low-latency messaging where each message should be processed by a consumer and then acknowledged.
Both are fundamentally tools for moving data from one place to another. They don’t know what a “business process” is. They don’t know what step you’re on, whether that step succeeded three days ago, or what should happen next after a human approves a request.
What Temporal Actually Does
Temporal is concerned with orchestrating the execution of a process over time – not moving messages. It knows:
- Which steps have been completed
- What the result of each step was
- What should happen next
- How many times a step has been retried
- That a 72-hour timer started on Tuesday should fire on Friday, even if every server in the cluster was replaced in the meantime
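One way to picture how all of these facts can be known at once is a single append-only list of events per workflow, from which everything else is derived. The sketch below is a simplified, hypothetical model. Its event names loosely echo Temporal’s (ActivityTaskCompleted, TimerStarted), but the types and helper functions are invented for illustration and are not Temporal’s actual history format.

```typescript
type HistoryEvent =
  | { type: "ActivityTaskScheduled"; activity: string }
  | { type: "ActivityTaskFailed"; activity: string; attempt: number }
  | { type: "ActivityTaskCompleted"; activity: string; result: string }
  | { type: "TimerStarted"; timerId: string; startedAtMs: number; durationMs: number };

// Which steps have completed, and what each returned.
function completedSteps(events: HistoryEvent[]): Map<string, string> {
  const done = new Map<string, string>();
  for (const e of events) {
    if (e.type === "ActivityTaskCompleted") done.set(e.activity, e.result);
  }
  return done;
}

// How many attempts of one activity have failed so far.
function failedAttempts(events: HistoryEvent[], activity: string): number {
  return events.filter((e) => e.type === "ActivityTaskFailed" && e.activity === activity).length;
}

// A timer's deadline comes from recorded data, not from any server's memory.
function timerDeadline(events: HistoryEvent[], timerId: string): Date | undefined {
  for (const e of events) {
    if (e.type === "TimerStarted" && e.timerId === timerId) {
      return new Date(e.startedAtMs + e.durationMs);
    }
  }
  return undefined;
}

const HOUR_MS = 60 * 60 * 1000;
const events: HistoryEvent[] = [
  { type: "ActivityTaskScheduled", activity: "chargeCustomer" },
  { type: "ActivityTaskFailed", activity: "chargeCustomer", attempt: 1 },
  { type: "ActivityTaskCompleted", activity: "chargeCustomer", result: "txn-123" },
  // Started Tuesday 2 January 2024, 09:00 UTC, for 72 hours
  { type: "TimerStarted", timerId: "timer-72h", startedAtMs: Date.UTC(2024, 0, 2, 9), durationMs: 72 * HOUR_MS },
];

console.log(completedSteps(events).get("chargeCustomer")); // txn-123
console.log(failedAttempts(events, "chargeCustomer")); // 1
console.log(timerDeadline(events, "timer-72h")?.toISOString()); // 2024-01-05T09:00:00.000Z, a Friday
```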
Comparison Table
| Concern | Kafka | RabbitMQ | Temporal |
|---|---|---|---|
| Primary purpose | Event streaming | Message delivery | Workflow orchestration |
| Business process state | External (you manage it) | External (you manage it) | Built-in and durable |
| Long-running processes | Poor fit | Poor fit | Native and first-class |
| Timers and delays | Manual (external service) | Manual | Built-in sleep / timers |
| Retry logic | Basic (consumer responsibility) | Basic (DLQ) | First-class with backoff policies |
| Crash recovery | Manual re-consume | Manual re-queue | Automatic replay from history |
| Exactly-once semantics | Within Kafka, via idempotent producers and transactions | At-least-once with consumer acknowledgements | Workflow logic runs effectively once; activities run at least once |
| Human-in-the-loop steps | Not built in | Not built in | Supported via Signals |
| Visibility into state | Consumer offset tracking | Management UI | Full event history per workflow |
| Best for | High-throughput event streams | Task queuing / low-latency messaging | Multi-step business processes |
Do They Compete?
Not really. They solve different problems, and you’ll often find Temporal and Kafka coexisting in the same architecture. For example, a Kafka event might trigger a Temporal workflow, while the workflow orchestrates the long-running business logic that follows.
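When you wire the two together, one detail matters. Kafka consumers can see the same event more than once, for example after a rebalance or when a producer retry publishes a duplicate. A common approach is to derive the workflow ID from the event’s business key, because by default Temporal will not start a second workflow with the same ID while the first is still running. The sketch below models only that idea in plain TypeScript. OrderPlacedEvent, workflowIdFor, and ToyWorkflowStarter are hypothetical stand-ins, not the Temporal client. The real client reports an “already started” error in this situation.

```typescript
interface OrderPlacedEvent {
  orderId: string; // business key: identical in every copy of the event
  offset: number;  // delivery metadata: a duplicate published by a producer retry gets a new offset
}

// The workflow ID depends only on the business key, never on delivery metadata.
function workflowIdFor(event: OrderPlacedEvent): string {
  return `order-${event.orderId}`;
}

// Toy model of the default rule: refuse to start an ID that is already running.
class ToyWorkflowStarter {
  private readonly running = new Set<string>();
  readonly started: string[] = [];

  start(workflowId: string): "started" | "already-running" {
    if (this.running.has(workflowId)) return "already-running";
    this.running.add(workflowId);
    this.started.push(workflowId);
    return "started";
  }
}

const starter = new ToyWorkflowStarter();
const first = starter.start(workflowIdFor({ orderId: "42", offset: 1001 }));
const duplicate = starter.start(workflowIdFor({ orderId: "42", offset: 1002 }));

console.log(first, duplicate); // started already-running
```

Whether a new run may reuse the ID after the first one has closed is governed by a separate workflow ID reuse policy, which this toy ignores.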
A Real-World Analogy
The Ultra-Reliable Personal Assistant
Imagine you hire a personal assistant and give them a complex task:
“Organise my overseas business trip.”
Steps:
- Book flights
- Reserve a hotel
- Apply for a visa
- Schedule ground transport
- Send confirmation to all parties
This takes days or weeks, depends on external parties, and requires holding state across many interactions.
Without Temporal (the traditional approach): The assistant keeps notes in their head and on scraps of paper. If they get sick, the context is lost. If interrupted, steps may be repeated. You, the manager, constantly have to check in to know where things stand. If step 3 fails, you don’t always know what was already done or what to roll back.
With Temporal: Every completed step is permanently recorded in a notebook that anyone can pick up and continue. If the assistant is replaced mid-process, their successor opens the notebook, sees exactly what was done, and continues from the right step. Waiting three weeks for a visa response costs zero active resources – no polling loop, no cron job, no in-memory state. Nothing is forgotten. Nothing is duplicated.
That notebook is Temporal’s event history – an immutable, durable log of every decision and result in the workflow.
What Temporal Actually Is
Temporal is a durable execution engine that lets you write long-running business logic as ordinary code, with automatic recovery from crashes, restarts, and infrastructure failures.
Temporal calls this concept Durable Execution. Every step your workflow takes is recorded durably in its event history, and the workflow’s in-memory state can be rebuilt from that history at any time. Suppose a worker crashes, the network fails, or a deployment happens mid-workflow. Temporal replays the recorded history on another worker and resumes exactly where the workflow stopped, without re-executing completed work.
Key properties:
- Workflows can run for seconds, days, or years
- Workflow state is persisted automatically by the Temporal server
- Code is replayed deterministically – completed steps are not re-run
- Sleep calls and timers consume zero active resources
- Failures are expected and handled by default, not as edge cases
- Workflows are testable with standard unit tests – no XML, no DSLs
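The replay mechanism can be illustrated without any Temporal machinery. The sketch below is plain TypeScript: ReplayContext, callActivity, and the RecordedStep shape are hypothetical simplifications, and everything is synchronous for brevity. Each activity call first consults the recorded history. If a result is already there, it is returned without running the side effect again; only calls beyond the end of the history actually execute. The check that each replayed call matches the recorded step is also why workflow code must be deterministic.

```typescript
// One recorded step in a (greatly simplified) event history.
interface RecordedStep {
  activity: string;
  result: string;
}

// Simplified stand-in for what an SDK does internally during replay.
class ReplayContext {
  private cursor = 0;
  readonly history: RecordedStep[];
  readonly executed: string[] = []; // side effects that actually ran in this process

  constructor(history: RecordedStep[]) {
    this.history = history;
  }

  callActivity(activity: string, run: () => string): string {
    if (this.cursor < this.history.length) {
      const recorded = this.history[this.cursor++];
      // Replay only works if the code asks for the same step the history recorded.
      if (recorded.activity !== activity) {
        throw new Error(`non-deterministic replay: expected ${recorded.activity}, got ${activity}`);
      }
      return recorded.result; // reuse the recorded result; the side effect is not repeated
    }
    const result = run(); // past the end of history: execute for real
    this.executed.push(activity);
    this.history.push({ activity, result }); // in Temporal, the server persists this
    this.cursor++;
    return result;
  }
}

// Workflow logic written as ordinary sequential code.
function orderWorkflow(ctx: ReplayContext, orderId: string): string[] {
  const charge = ctx.callActivity("chargeCustomer", () => `charged:${orderId}`);
  const email = ctx.callActivity("sendConfirmationEmail", () => `emailed:${orderId}`);
  const ship = ctx.callActivity("shipOrder", () => `shipped:${orderId}`);
  return [charge, email, ship];
}

// A new worker picks up an execution whose history already records the charge.
const ctx = new ReplayContext([{ activity: "chargeCustomer", result: "charged:order-1" }]);
const results = orderWorkflow(ctx, "order-1");

console.log(results.join(", ")); // charged:order-1, emailed:order-1, shipped:order-1
console.log(ctx.executed.join(", ")); // sendConfirmationEmail, shipOrder
```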
Temporal Architecture
High-Level Overview
graph TD
CS[Client Services / Applications] -->|Start workflow, send signal, query state| TS[Temporal Server]
TS -->|Persists event history| DB[(Persistence Layer\nCassandra / PostgreSQL / MySQL)]
TS -->|Dispatches tasks via task queues| WP[Worker Processes]
WP -->|Poll for workflow & activity tasks| TS
WP -->|Execute activities - API calls, DB writes, emails| EXT[External Services]
TS -.->|Sends event history for replay on crash recovery| WP
style TS fill:#6B4FBB,color:#fff
style DB fill:#2D6A4F,color:#fff
style WP fill:#1D3557,color:#fff
style CS fill:#457B9D,color:#fff
style EXT fill:#E63946,color:#fff
Core Components Explained
Workflows are deterministic functions that describe your business process: the “what should happen” logic. They must be deterministic, meaning that replaying the same history must make the same decisions in the same order. For that reason you don’t make direct API calls or use unseeded randomness inside a workflow. Side effects live in Activities.
Activities are where side effects happen: calling a payment API, writing to a database, sending an email. Activities can fail and be retried independently of the workflow. A failed activity doesn’t restart the whole workflow – just that step.
The Temporal Server is the orchestration brain. It stores every event in the workflow’s history, schedules tasks to workers via durable task queues, enforces retry policies, and fires timers. It is stateless in the sense that any node can be replaced – the database holds all truth.
Workers are your application processes. They host your workflow and activity code, poll the Temporal server for tasks, execute them, and report results back. Workers hold no state that can’t be rebuilt from the event history, so they are disposable: if one crashes, another picks up the task.
Signals allow external events (including human actions) to be injected into a running workflow without polling. A workflow can await a signal for days if needed.
Queries allow external callers to read the current state of a running workflow synchronously.
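The claim that another worker picks up a crashed worker’s task rests on timeouts. The server cannot distinguish a crashed worker from a slow one. A task that is not reported complete in time is therefore treated as failed and scheduled again, subject to the retry policy. For activities, “in time” is governed by timeouts such as startToCloseTimeout. The sketch below is a toy, single-threaded model of that rule with a simulated clock. ToyTaskQueue and its methods are hypothetical and do not reflect how the Temporal server is actually implemented.

```typescript
interface Lease {
  worker: string;
  deadline: number; // simulated time by which the worker must report back
}

// Toy model: a task handed to a worker goes back on the queue if not completed in time.
class ToyTaskQueue {
  private readonly pending: string[] = [];
  private readonly leases = new Map<string, Lease>();
  readonly completedBy = new Map<string, string>(); // taskId -> worker that finished it
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  enqueue(taskId: string): void {
    this.pending.push(taskId);
  }

  // A worker polls and, if a task is available, receives it with a reporting deadline.
  poll(worker: string, now: number): string | undefined {
    this.requeueExpired(now);
    const taskId = this.pending.shift();
    if (taskId !== undefined) {
      this.leases.set(taskId, { worker, deadline: now + this.timeoutMs });
    }
    return taskId;
  }

  // Accepts a completion only from the worker currently holding the task.
  complete(taskId: string, worker: string, now: number): boolean {
    this.requeueExpired(now);
    const lease = this.leases.get(taskId);
    if (lease === undefined || lease.worker !== worker) return false;
    this.leases.delete(taskId);
    this.completedBy.set(taskId, worker);
    return true;
  }

  // A worker that went silent past its deadline is treated as crashed.
  private requeueExpired(now: number): void {
    const expired: string[] = [];
    this.leases.forEach((lease, taskId) => {
      if (now > lease.deadline) expired.push(taskId);
    });
    for (const taskId of expired) {
      this.leases.delete(taskId);
      this.pending.push(taskId);
    }
  }
}

const queue = new ToyTaskQueue(60_000); // 1 minute, like startToCloseTimeout: '1 minute'
queue.enqueue("chargeCustomer");

const a = queue.poll("worker-A", 0);          // worker-A takes the task, then crashes
const early = queue.poll("worker-B", 30_000); // nothing yet: worker-A's deadline hasn't passed
const b = queue.poll("worker-B", 61_000);     // deadline passed, task is handed to worker-B
queue.complete("chargeCustomer", "worker-B", 62_000);

console.log(a, early, b, queue.completedBy.get("chargeCustomer"));
// chargeCustomer undefined chargeCustomer worker-B
```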
How Data Flows
sequenceDiagram
participant Client
participant Temporal Server
participant Worker
participant External Service
Client->>Temporal Server: StartWorkflow(orderId)
Temporal Server-->>Temporal Server: Record WorkflowStarted event
Temporal Server->>Worker: Schedule ActivityTask (chargeCustomer)
Worker->>External Service: POST /payments
External Service-->>Worker: 200 OK
Worker->>Temporal Server: Report ActivityCompleted
Temporal Server-->>Temporal Server: Record ActivityCompleted event
Note over Temporal Server: Worker crashes here
Temporal Server->>Worker: Replay history, resume at next step
Temporal Server->>Worker: Schedule ActivityTask (shipOrder)
Worker->>External Service: POST /shipping
External Service-->>Worker: 200 OK
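Worker->>Temporal Server: Report WorkflowCompleted
The critical guarantee: when a worker crashes between steps, Temporal replays the recorded history on a new worker. Completed activities are recorded and not re-executed during replay, so a charge that has been recorded as completed is never repeated. Activities themselves run at least once, however. If a worker dies mid-activity, before its completion is recorded, that activity is retried. This is why side-effecting activities such as payments should be idempotent.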
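A payment activity can still run twice. For example, a worker may die after POST /payments succeeds but before ActivityCompleted is recorded, so the activity is retried. A common way to keep this safe is an idempotency key built from stable workflow data, which lets the provider recognise the retry. The sketch below uses a hypothetical in-memory IdempotentPaymentProvider and chargeKey helper. Real payment APIs offer the same idea through their own interfaces, often an idempotency-key request header.

```typescript
interface Charge {
  id: string;
  amountCents: number;
}

// Hypothetical provider that remembers which idempotency keys it has already processed.
class IdempotentPaymentProvider {
  private readonly byKey = new Map<string, Charge>();
  readonly charges: Charge[] = [];

  charge(idempotencyKey: string, amountCents: number): Charge {
    const existing = this.byKey.get(idempotencyKey);
    if (existing !== undefined) return existing; // a retry: return the original charge
    const created: Charge = { id: `ch_${this.charges.length + 1}`, amountCents };
    this.charges.push(created);
    this.byKey.set(idempotencyKey, created);
    return created;
  }
}

// The key is built only from data that is identical on every attempt (no clock, no randomness).
function chargeKey(workflowId: string, orderId: string): string {
  return `${workflowId}:chargeCustomer:${orderId}`;
}

const provider = new IdempotentPaymentProvider();
const key = chargeKey("order-workflow-42", "42");

const firstAttempt = provider.charge(key, 4999); // succeeds, but the worker dies before reporting
const retry = provider.charge(key, 4999);        // Temporal retries the activity with the same input

console.log(firstAttempt.id === retry.id, provider.charges.length); // true 1
```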
A TypeScript Example
Let’s make this concrete with a real order processing workflow.
Workflow – the business logic
import { proxyActivities, sleep } from '@temporalio/workflow';
import type * as activities from '../activities/orderActivities';

const { chargeCustomer, sendConfirmationEmail, shipOrder } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '1 minute',
    retry: {
      maximumAttempts: 5, // total attempts, including the first
      initialInterval: '1s',
      backoffCoefficient: 2,
    },
  });

export async function orderWorkflow(orderId: string): Promise<void> {
  // Step 1: Charge the customer
  await chargeCustomer(orderId);
  // Step 2: Send a confirmation email immediately
  await sendConfirmationEmail(orderId);
  // Step 3: Wait for fulfilment window (e.g. next business day)
  // The timer is durable and tracked by the server: no worker resources or polling loop while waiting
  await sleep('1 day');
  // Step 4: Ship the order
  await shipOrder(orderId);
}
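To make the retry policy above concrete, the sketch below computes the wait before each retry. It assumes the rule described in Temporal’s retry-policy documentation. The interval starts at initialInterval, is multiplied by backoffCoefficient after each retry, and is capped at maximumInterval (by default 100 × initialInterval); maximumAttempts counts the first attempt too. The RetryPolicy shape and the retryDelaysMs helper are hypothetical, and durations are plain milliseconds rather than the SDK’s duration strings.

```typescript
interface RetryPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumAttempts: number;    // includes the first attempt
  maximumIntervalMs?: number; // assumed default: 100 × initialIntervalMs
}

// Delay before retry n (n = 1 for the first retry), for every retry the policy allows.
function retryDelaysMs(policy: RetryPolicy): number[] {
  const cap = policy.maximumIntervalMs ?? 100 * policy.initialIntervalMs;
  const delays: number[] = [];
  for (let n = 1; n < policy.maximumAttempts; n++) {
    const raw = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, n - 1);
    delays.push(Math.min(raw, cap));
  }
  return delays;
}

// The policy from orderWorkflow: initialInterval '1s', backoffCoefficient 2, maximumAttempts 5.
const delays = retryDelaysMs({ initialIntervalMs: 1000, backoffCoefficient: 2, maximumAttempts: 5 });
console.log(delays.join(", ")); // 1000, 2000, 4000, 8000
```

For orderWorkflow’s policy, that is about 15 seconds of waiting spread across four retries. This comes on top of the time each attempt takes, and each attempt has its own 1-minute startToCloseTimeout.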
Activities – where side effects happen
export async function chargeCustomer(orderId: string): Promise<void> {
  // Simulate a flaky payment provider. Math.random() is fine here:
  // activities are never replayed, so they may be non-deterministic.
  if (Math.random() < 0.3) {
    // A thrown error fails this attempt; Temporal retries it per the retry policy
    throw new Error('Payment provider temporarily unavailable');
  }
  console.log(`Charged customer for order ${orderId}`);
}

export async function sendConfirmationEmail(orderId: string): Promise<void> {
  // Call your email service here
  console.log(`Confirmation email sent for order ${orderId}`);
}

export async function shipOrder(orderId: string): Promise<void> {
  console.log(`Shipped order ${orderId}`);
}
What Temporal guarantees here
| Scenario | What happens |
|---|---|
| Payment provider returns 500 | Activity is retried with exponential backoff (waits of 1s, 2s, 4s, 8s), up to 5 attempts in total |
| Worker crashes after charge, before email | Email activity runs on a new worker; charge is NOT retried |
| Server restarts during the 1-day sleep | The timer is stored durably and fires at its original deadline; no polling loop |
| Worker crashes during shipping | Shipping is retried; no double-charge possible |
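| You deploy a new version mid-workflow | In-flight workflows replay against the new code, so incompatible changes must be guarded with patched() or Worker Versioning; otherwise replay fails with a non-determinism error |
Apart from versioning changes to workflow code, none of this requires custom retry logic, state tables, or recovery scripts from you.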
Temporal’s Determinism Requirement
One concept that trips up engineers new to Temporal: workflows must be deterministic.
This means inside a workflow function, you must not:
- Use non-deterministic time or randomness directly (use Temporal’s deterministic equivalents)
- Make network calls or other I/O directly (use Activities instead)
- Rely on anything else whose result can differ between runs, such as iteration order over a Go map
Why? Because Temporal replays your workflow history to recover state. If your workflow code makes different decisions on replay than it did originally, Temporal cannot correctly reconstruct the execution state and fails the workflow task with a non-determinism error. This is a constraint worth accepting, because it’s what makes the durability guarantees possible.
Temporal provides safe equivalents. For time and randomness, the TypeScript SDK’s workflow sandbox replaces Date.now() and Math.random() with deterministic versions, and the Python SDK exposes workflow.now() and workflow.random(). For all I/O, use Activities.
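A few lines show why seeded randomness is replay-safe and unseeded randomness is not. The sketch below uses mulberry32, a small, widely copied PRNG chosen purely for illustration; it is not what Temporal’s SDKs use internally. A hypothetical seed stands in for one recorded for the workflow run. Seeded the same way, the generator yields the same sequence on the original run and on every replay, so any branch that depends on it is taken the same way each time.

```typescript
// mulberry32: a tiny 32-bit PRNG; the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// A toy workflow decision that depends on randomness, e.g. picking a warehouse per order.
function pickWarehouses(random: () => number, orders: number): string[] {
  const picks: string[] = [];
  for (let i = 0; i < orders; i++) {
    picks.push(random() < 0.5 ? "east" : "west");
  }
  return picks;
}

const seed = 20240102; // hypothetical seed recorded for this workflow run
const originalRun = pickWarehouses(mulberry32(seed), 8);
const replay = pickWarehouses(mulberry32(seed), 8);
const unseeded = pickWarehouses(Math.random, 8); // can differ every time it runs

console.log(originalRun.join() === replay.join()); // true
console.log(unseeded.join());
```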
Signals and Queries – Human-in-the-Loop Workflows
One of Temporal’s most powerful and underappreciated features is its support for human-in-the-loop steps.
A workflow can pause and wait for an external signal indefinitely – whether that’s a manager approving an expense report, a customer confirming an address, or a background check clearing. There’s no polling loop, no cron job, no webhook table to manage.
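import { condition, defineSignal, proxyActivities, setHandler } from '@temporalio/workflow';
// Hypothetical module exporting the rejectExpense and processPayment activities
import type * as activities from '../activities/expenseActivities';

const { rejectExpense, processPayment } = proxyActivities<typeof activities>({
  startToCloseTimeout: '1 minute',
});

const approvalSignal = defineSignal<[boolean]>('approval');

export async function expenseApprovalWorkflow(expenseId: string): Promise<void> {
  // Stays undefined until a manager decides, so "no decision yet" is distinguishable
  let approved: boolean | undefined;
  setHandler(approvalSignal, (isApproved: boolean) => {
    approved = isApproved;
  });
  // Wait up to 7 days for a decision; condition() resolves to false if the timeout fires first
  const decided = await condition(() => approved !== undefined, '7 days');
  if (!decided || !approved) {
    await rejectExpense(expenseId);
  } else {
    await processPayment(expenseId);
  }
}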
The manager’s approval UI calls client.workflow.getHandle(workflowId).signal('approval', true) through the Temporal client. The workflow wakes up and continues. No polling. No database flags. No scheduled jobs.
When Should You Use Temporal?
Temporal is the right tool when you need:
- Long-running processes – steps that span hours, days, or weeks
- Multi-step business workflows – order processing, onboarding flows, financial transactions
- Complex retry logic – with backoff, per-step timeout policies, and compensation
- Human-in-the-loop – approval flows, verification steps, manual interventions
- Exactly-once workflow execution – billing, provisioning, anything where duplication is catastrophic (backed by idempotent activities)
- Distributed systems that must not lose progress – workflows that survive server restarts and deployments
- Testability – workflows written in code can be unit tested like any other function
When Temporal Is NOT the Right Tool
- Simple request-response APIs – if it completes in one HTTP call, you don’t need Temporal
- High-throughput event streaming – if you’re processing millions of events per second, Kafka is a better fit
- Pure pub-sub fan-out – if you just need to broadcast an event to many consumers, use a message bus
- Simple background jobs – a job queue like BullMQ or Sidekiq may be sufficient for short-lived, single-step tasks
Temporal adds operational complexity – a server to run, workers to deploy, and a new programming model to learn. That investment pays off when correctness and durability matter. It doesn’t make sense for every use case.
Temporal in the Wild
Temporal has been adopted by a broad range of engineering teams. HashiCorp, Datadog, Stripe, Netflix, DoorDash, Box, Checkr, and Snap are among the well-known adopters. Use cases span payment processing, infrastructure provisioning, AI agent orchestration, CI/CD pipelines, compliance workflows, and customer onboarding flows.
Netflix has noted that engineers spend significantly less time writing logic to maintain application consistency or guard against failures because Temporal handles it for them. This shift – from defensive infrastructure glue code to business logic – is exactly what Temporal is designed to enable.
The Ecosystem
Temporal provides official SDKs for languages including Go, Java, TypeScript/JavaScript, Python, .NET, and PHP. This means you write workflows and activities in your language of choice, with full IDE support, type safety, and standard testing tools.
Temporal is MIT-licensed and fully open source. You can self-host the Temporal server (backed by Cassandra, PostgreSQL, or MySQL) or use Temporal Cloud, the managed SaaS offering, which handles infrastructure, scaling, and multi-region replication for you.
graph TD
subgraph SDKs
Go
Java
TS[TypeScript]
Python
NET[.NET]
end
subgraph Deployment Options
SH[Self-Hosted\nTemporal Server]
TC[Temporal Cloud\nManaged SaaS]
end
subgraph Persistence Backends
Cassandra
PostgreSQL
MySQL
end
SDKs --> SH
SDKs --> TC
SH --> Cassandra
SH --> PostgreSQL
SH --> MySQL
Final Mental Model
Kafka moves events.
RabbitMQ distributes tasks.
Temporal remembers intent and progress over time.
The concept of durable workflow orchestration is not new – it goes back to enterprise workflow engines of the 1990s, the Saga pattern, and Amazon SWF. What Temporal brings is a modern, developer-friendly implementation: write pure business logic in code, test it like code, deploy it like code, and let the platform handle durability, retries, timers, and recovery for you.
For any system where distributed correctness and process durability matter, Temporal is one of the most important infrastructure tools available to engineers today.