ADR Example: Message Queuing on a Node.js Server

Every decision in ReflectRally follows a structured format designed for long-term governance — not just documentation. Below is a fully filled-out record showing every field available when creating or reviewing a decision.

Use BullMQ with Redis for Message Queuing on Node.js API Server

Accepted Risk: Medium Impact: High

Context

Our Node.js API server (Fastify-based) currently processes all work synchronously within request handlers. This includes sending transactional emails, generating PDF reports, syncing data to third-party analytics services, and running nightly aggregation jobs. As traffic has grown from ~500 to ~5,000 requests per minute, we've seen increased p95 latency on endpoints that perform background work inline, occasional timeout errors during peak load, and growing coupling between our API layer and downstream services.

The platform runs on a single-region cloud deployment with PostgreSQL as the primary datastore and Redis already provisioned for session caching. The engineering team consists of 6 backend developers, all experienced with TypeScript and the Node.js ecosystem.

Problem Statement

We need a reliable mechanism to offload non-critical, potentially slow, or failure-prone work from the request-response cycle. Specifically, we need to decouple email sending, report generation, and third-party syncs from API response times while ensuring these tasks are executed reliably — with retries, observability, and the ability to prioritize certain jobs over others.

Without this, our API latency will continue to degrade as traffic grows, and a failure in any downstream service (e.g., an email provider outage) will cascade into user-facing errors.

Decision

We will adopt BullMQ backed by our existing Redis instance as the message queuing solution for our Node.js API server.

All non-critical background work — email dispatch, PDF generation, analytics syncs, and scheduled aggregation jobs — will be enqueued as BullMQ jobs from the API handlers and processed by dedicated worker processes running the same TypeScript codebase.

We will organize work into named queues by domain (e.g., email, reports, sync) and configure per-queue concurrency limits, retry policies with exponential backoff, and dead-letter handling. The BullMQ dashboard (Bull Board) will be integrated into our admin panel for operational visibility.

Alternatives Considered

1. BullMQ + Redis ✓ Chosen

Node.js-native, TypeScript-first queue library built on Redis Streams. Supports priorities, delayed jobs, rate limiting, repeatable jobs, and provides a built-in dashboard. Reuses our existing Redis infrastructure with zero additional provisioning. The team is already familiar with the Redis mental model.

2. RabbitMQ with amqplib Rejected

A dedicated message broker with strong routing, exchange patterns, and protocol-level guarantees. However, it requires provisioning and operating a separate service (or managed instance), introduces AMQP protocol complexity, and adds operational overhead our team isn't staffed to absorb. Better suited for organizations with polyglot services or advanced routing needs we don't currently have.

3. AWS SQS + Lambda Rejected

A fully managed, serverless approach. Eliminates infrastructure management entirely but introduces vendor lock-in, cold start latency, and a fundamentally different deployment model (Lambda functions vs. our current process-based architecture). Would also require splitting our codebase or maintaining separate deployment pipelines. Considered premature for our current scale and team structure.

4. PostgreSQL-based job queue (pg-boss) Rejected

Uses PostgreSQL as the queue backend via SKIP LOCKED, eliminating the need for additional infrastructure. Attractive for simplicity, but introduces write amplification on our primary database, performs poorly under high-throughput scenarios, and lacks the rich job lifecycle features (priorities, rate-limiting, repeatable schedules) that BullMQ provides natively.

Consequences

Positive: API response latency for endpoints performing background work will drop significantly, as all heavy processing moves out of the request cycle.
Positive: Downstream service failures (email provider, analytics API) will no longer cause user-facing errors — failed jobs will retry automatically with exponential backoff.
Positive: Zero new infrastructure to provision. Our existing Redis instance (currently used for sessions) has sufficient capacity, and BullMQ's resource footprint is minimal.
Positive: Operations team gains visibility through Bull Board dashboard integrated into the admin panel.
Negative: Redis becomes a more critical dependency. A Redis outage would now affect both session management and job processing. We will mitigate this with Redis persistence (AOF) and monitoring alerts.
Negative: Introduces eventual consistency for background tasks — users may not see results of background work (e.g., report generation) immediately after triggering it. We'll handle this with status polling or WebSocket notifications.
Negative: Team needs to learn BullMQ-specific patterns (job lifecycle, event handling, worker concurrency tuning). Estimated ramp-up: 1–2 days per developer.

Assumptions

Our existing Redis instance has sufficient memory and throughput headroom to handle job queuing alongside session caching (currently using ~15% of available capacity).

Job volume will remain under 50,000 jobs/day for the next 12 months, within the comfortable operating range for a single Redis-backed BullMQ setup.

We do not need cross-service message routing or fan-out patterns. All producers and consumers are within the same Node.js application boundary.

The team will remain primarily Node.js/TypeScript-focused, and we won't need polyglot consumers in the near term.

What Makes This a Good ADR?

This example illustrates several practices that make architecture decisions useful over time — not just at the moment they're written.

Rich Context

The context section doesn't just describe the problem — it captures traffic numbers, infrastructure state, and team composition that future readers need to evaluate whether the decision still applies.

Honest Alternatives

Each alternative is described fairly with clear reasoning for rejection. This prevents teams from relitigating the same debate months later.

Explicit Assumptions

The assumptions are concrete and falsifiable. When one breaks — say, job volume exceeds 50K/day — the decision surfaces for review automatically in ReflectRally.

Risk & Impact

Tagging the decision as Medium Risk / High Impact helps teams prioritize which decisions to review first during architectural health checks.

Signals

In ReflectRally, you can link real-world triggers — called Signals — to decisions and assumptions. For example, a "Redis major version release" signal linked to this decision would automatically flag it for review when triggered, via API webhook or manually.

Periodic Reviews

Accepted decisions can have a review schedule (e.g., every 6 months). When the review date passes, the decision surfaces in your team's attention queue — so nothing silently goes stale.

From Example to Practice

Writing a well-structured ADR is just the beginning. In ReflectRally, this decision becomes a living record with active governance built in:

Signals — Link real-world triggers to decisions and assumptions. A signal like "Redis major version release" or "Monthly job volume exceeds 50K" can be triggered via the UI, or automatically via API webhook from your CI/CD pipeline or monitoring tools. When a signal fires, every linked decision and assumption is flagged instantly — no one needs to remember which decisions to check.
Periodic Reviews — Set a review schedule on accepted decisions (e.g., "review every 6 months"). When the date arrives, the decision appears in your team's attention queue with full context. Reviews can confirm the decision is still valid, or kick off a revision if the world has changed.
Assumption Tracking — Each assumption listed above is individually tracked. If a signal breaks an assumption, or a team member manually flags one as questionable, the parent decision is surfaced for attention — without waiting for the next scheduled review.
Dependency Graph — Link decisions that depend on each other. If an upstream decision is superseded, all downstream decisions are automatically flagged as needing attention. When any decision enters an attention state, its downstream dependents are marked as impacted for visibility.
Full Audit Trail — Every status change, review, signal trigger, and assumption update is recorded with who did it and when.

Want to see how your own team's decisions would look? Start with a free workspace:

→ Create Your First Decision

More Resources

What are Architecture Decision Records? — An introduction to ADRs and why documentation alone isn't enough.
Free ADR Templates — Download Nygard, MADR, and Y-Statement templates to get started.

ADR Example: How Decisions Look in ReflectRally