Workers And Queues
Operate table-backed queues, workers, DLQ replay, and scheduler behavior.
Workers And Queues
The backend uses a PostgreSQL table-backed queue. Jobs are claimed with locking semantics and dead-lettered after retry exhaustion. The operator source of truth is the database plus structured worker logs and OpenTelemetry metrics.
Queues
| Queue | Producer | Consumer | Purpose |
|---|---|---|---|
proofs.validated | Proof submit routes | worker-proof | Validate proof, decrypt payload, create certificate, consume license, sign PDF/receipt. |
certificates.to_anchor | Certificate service | worker anchor | Submit or update blockchain anchor metadata and regenerate certificate PDF. |
notifications.to_send | User, billing, webhook flows | worker notifications | Send SMTP/webhook notifications, retry failures, write delivery metadata. |
billing.exports | Billing export routes | worker reports | Generate requested export artifacts. |
reports.monthly | Report scheduler | worker reports | Generate monthly billing report artifacts. |
proofs.retry | Retry routes/scheduler | worker retry | Re-enqueue due failed or awaiting-license proofs. |
The local PostgreSQL/PGMQ setup also seeds underscore queue names for local
compatibility: proofs_validated, certificates_to_anchor,
notifications_to_send, billing_exports, reports_monthly, and
proofs_retry.
Worker Commands
| Worker | Command | Useful variables |
|---|---|---|
| Proof | /app/worker-proof | WORKER_ID, WORKER_POLL_INTERVAL, WORKER_CLAIM_LIMIT, WORKER_ONCE. |
| Anchor | /app/worker anchor | Same worker variables plus blockchain/signer/storage settings. |
| Notifications | /app/worker notifications | Same worker variables plus notification/URL policy settings. |
| Reports | /app/worker reports | Same worker variables plus storage, billing, PAdES settings. |
| Retry | /app/worker retry | Same worker variables; enqueues due proof retries before claims. |
| Maintenance | /app/worker maintenance | Uses retention interval when WORKER_POLL_INTERVAL is unset. |
Set WORKER_ONCE=true for a controlled single pass:
WORKER_ONCE=true WORKER_CLAIM_LIMIT=10 /app/worker retry
Queue Inspection
Use SQL for baseline triage:
SELECT queue, status, count(*)
FROM jobs
GROUP BY queue, status
ORDER BY queue, status;
DLQ summary:
SELECT queue, reason, count(*), max(created_at) AS last_seen
FROM jobs_dlq
GROUP BY queue, reason
ORDER BY last_seen DESC;
Newest DLQ entries:
SELECT id, original_job_id, queue, payload, reason, created_at
FROM jobs_dlq
ORDER BY created_at DESC
LIMIT 20;
Replay Rules
- Confirm the downstream dependency is healthy before replaying anything.
- Inspect representative DLQ rows and the referenced business record.
- Replay by inserting a new
jobsrow with the same queue and payload. - Keep the original DLQ row for audit.
- Do not update a
deadjob back topending.
If a payload is malformed or references a deleted resource, leave it in DLQ and record the incident decision in the operations log.
Proof Retry Checks
SELECT id, tenant_id, organization_id, status, attempt_count, next_retry_at,
rejection_reason, received_at
FROM proofs
WHERE status IN ('FAILED', 'AWAITING_LICENSE', 'REJECTED')
ORDER BY received_at DESC
LIMIT 50;
| Status | Replay guidance |
|---|---|
FAILED | Replay only after fixing signer, storage, PDF, validation config, or other dependency failures. |
AWAITING_LICENSE | Import/allocate licenses or fix quota scope first, then retry. |
REJECTED | Terminal validation failure; replay only if policy or code was wrong. |
Worker Scaling
Proof, notifications, reports, retry, and anchor workers are stateless and can scale horizontally where the workload allows it. Use queue depth, in-flight jobs, dependency capacity, and idempotency guarantees to decide scale. The anchor and monthly report schedulers should be scaled conservatively until production chain and scheduler ownership are finalized.