Incident Response
Triage Keycloak, signer, storage, chain, proof, billing, and public verification incidents.
Use this page for first response. Record the incident, capture request IDs, trace IDs, affected tenant/organization IDs where permitted, and the exact time window before replaying or deleting anything.
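One way to keep the capture step honest is to hold the details in a small structure before any replay or deletion starts. The sketch below is illustrative only: `IncidentRecord`, its field names, and the `ready_for_replay` rule are assumptions, not part of the platform.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncidentRecord:
    """Hypothetical first-response record; field names are illustrative."""
    summary: str
    window_start: datetime
    window_end: datetime
    request_ids: list[str] = field(default_factory=list)
    trace_ids: list[str] = field(default_factory=list)
    # Fill these only where policy permits recording them.
    tenant_ids: list[str] = field(default_factory=list)
    organization_ids: list[str] = field(default_factory=list)

    def ready_for_replay(self) -> bool:
        """True once a time window and at least one correlating ID are captured."""
        has_window = self.window_start < self.window_end
        has_ids = bool(self.request_ids or self.trace_ids)
        return has_window and has_ids
```

A responder would fill the record while triaging and treat `ready_for_replay()` returning `False` as a reason to stop and collect more context first.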
Keycloak Outage
Impact:
- New login/token requests fail.
- Existing API calls may continue briefly while tokens and JWKS cache remain valid.
- User, invitation, IdP, and provisioning operations should pause.
Triage:
- Check Keycloak and Keycloak PostgreSQL health.
- Query realm metadata from the API network.
- Check API logs for JWT/JWKS errors and elevated `401`.
- Confirm public issuer URLs still match configured JWT `iss`.
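The realm metadata and issuer checks above can be combined into one probe. This is a minimal sketch that assumes the realm exposes the standard OIDC discovery document under `/realms/{realm}/.well-known/openid-configuration` and that `base_url` already includes any path prefix the deployment uses. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def discovery_url(base_url: str, realm: str) -> str:
    """Build the OIDC discovery URL for a realm."""
    return f"{base_url.rstrip('/')}/realms/{realm}/.well-known/openid-configuration"


def issuer_matches(discovery_doc: dict, expected_iss: str) -> bool:
    """Compare the advertised issuer with the configured JWT iss, exactly."""
    return discovery_doc.get("issuer") == expected_iss


def check_realm(base_url: str, realm: str, expected_iss: str, timeout: float = 5.0) -> bool:
    """Fetch realm discovery metadata and report whether the issuer matches."""
    with urlopen(discovery_url(base_url, realm), timeout=timeout) as resp:
        return issuer_matches(json.load(resp), expected_iss)
```

Run `check_realm` from the API network so the result reflects what the API itself sees. The comparison is deliberately exact, because an issuer that differs only by scheme, host, or trailing slash can still cause token validation to fail.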
Recovery:
- Restore Keycloak database first, then Keycloak.
- Restart API/workers only if issuer or realm import changed.
- Run password-token smoke tests for `master` and `exawipe`.
- Retry failed onboarding/admin work after confirming idempotency behavior.
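A password-token smoke test can be sketched as a single form POST to the realm's OIDC token endpoint. The client ID, the test user, and the assumption that the client allows the password grant are all deployment-specific and not confirmed by this page. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_token_request(base_url: str, realm: str, client_id: str,
                        username: str, password: str) -> Request:
    """Build a password-grant request for a realm's OIDC token endpoint."""
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form = urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode("ascii")
    # Supplying data makes urllib send a POST with a form-encoded body.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=form, headers=headers)


def smoke_test(req: Request, timeout: float = 5.0) -> bool:
    """Return True if the token endpoint answers with an access_token."""
    with urlopen(req, timeout=timeout) as resp:
        return "access_token" in json.load(resp)
```

`urlopen` raises `HTTPError` on a 4xx or 5xx response, so a failed login surfaces as an exception rather than `False`. Run the test once per realm with a dedicated low-privilege smoke-test user.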
Signer Outage
Impact:
- Proof processing can fail during decrypt, HMAC, PAdES signing, receipt signing, or future transaction signing.
- Proof jobs may retry and eventually move to DLQ.
Triage:
- Check `WipeSignerFailures` by operation.
- Inspect proof worker logs for affected proof IDs.
- Confirm `SIGNER_MODE`, endpoint, timeout, bearer/mTLS material, key IDs, and TSA settings.
- Inspect `proofs.validated`, `proofs.retry`, and DLQ depth.
Recovery:
- Restore signer availability and key material.
- Run one proof upload smoke test.
- Replay only jobs whose referenced proof is still eligible.
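The eligibility rule for replay can be expressed as a filter over DLQ jobs. The sketch below assumes each job carries the ID of its proof and that the caller supplies the current proof statuses and the set of statuses considered eligible. `DlqJob`, `select_replayable`, and the status values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlqJob:
    """Hypothetical DLQ entry: a job ID plus the proof it refers to."""
    job_id: str
    proof_id: str


def select_replayable(jobs: list[DlqJob], proof_status: dict[str, str],
                      eligible_statuses: set[str]) -> tuple[list[DlqJob], list[DlqJob]]:
    """Split DLQ jobs into (replay, skip) by the current status of the referenced proof."""
    replay: list[DlqJob] = []
    skip: list[DlqJob] = []
    for job in jobs:
        status = proof_status.get(job.proof_id)  # None when the proof no longer exists
        if status in eligible_statuses:
            replay.append(job)
        else:
            skip.append(job)
    return replay, skip
```

Keeping the skipped jobs in a separate list, rather than dropping them, leaves a record of what was not replayed and why it needs review.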
Storage Outage
Impact:
- Proof upload, PDF generation, canonical JSON download, billing exports, and monthly reports can fail.
Triage:
- Check MinIO/S3 endpoint health, credentials, bucket existence, and network path.
- Inspect API/proof/report worker logs for object keys and error classes.
- Confirm bucket names match the environment.
Recovery:
- Restore storage and verify read/write permissions.
- Re-run failed proof/report/export jobs through queue replay rules.
- Sample certificate PDF and canonical JSON downloads.
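Verifying read/write permissions can be done with a throwaway object that is written, read back, and deleted. The sketch below is written against a hypothetical `ObjectStore` interface rather than a specific MinIO or S3 client, so it needs a thin adapter around whichever client the environment uses. The `healthcheck/` prefix is also an assumption.

```python
import uuid
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface; adapt it to the real MinIO/S3 client in use."""
    def put(self, bucket: str, key: str, data: bytes) -> None: ...
    def get(self, bucket: str, key: str) -> bytes: ...
    def delete(self, bucket: str, key: str) -> None: ...


def round_trip_ok(store: ObjectStore, bucket: str) -> bool:
    """Write, read back, and delete a throwaway object to exercise permissions."""
    key = f"healthcheck/{uuid.uuid4()}"
    payload = b"storage-probe"
    store.put(bucket, key, payload)
    try:
        return store.get(bucket, key) == payload
    finally:
        store.delete(bucket, key)  # clean up even if the read fails
```

If a bucket has retention or object-lock rules, point the probe at a bucket or prefix where short-lived objects are acceptable.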
Chain Or Anchor Outage
Impact:
- Anchor jobs can retry or DLQ.
- Certificates may remain
CERTIFIED_NO_ANCHORwhen anchoring is disabled or unavailable by policy. - Public verification can validate certificate integrity but may report chain unavailability or missing anchor status.
Triage:
- Check anchor worker logs and `WipeAnchorFailures`.
- Confirm `BLOCKCHAIN_ENABLED`, default chain, explorer URL, lookup metadata, and provider reachability.
- Inspect unanchored certificates:
```sql
SELECT id, tenant_id, organization_id, status, anchor_chain_id, anchor_tx_hash, created_at
FROM certificates
WHERE anchor_tx_hash = ''
ORDER BY created_at DESC
LIMIT 50;
```
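To see at a glance which states the unanchored certificates are in, the query result can be tallied by `status`. This is a sketch that assumes a Python DB-API connection to the database holding `certificates`. The function name is hypothetical and the query text is the one shown above.

```python
from collections import Counter

UNANCHORED_SQL = """
SELECT id, tenant_id, organization_id, status, anchor_chain_id, anchor_tx_hash, created_at
FROM certificates
WHERE anchor_tx_hash = ''
ORDER BY created_at DESC
LIMIT 50
"""


def unanchored_by_status(conn) -> Counter:
    """Run the unanchored-certificate query and count rows per status."""
    cur = conn.cursor()
    cur.execute(UNANCHORED_SQL)
    return Counter(row[3] for row in cur.fetchall())  # index 3 is the status column
```

The predicate matches only empty strings. If the schema stores a missing hash as `NULL`, which this page does not confirm, those rows will not appear and the condition would need an `IS NULL` branch.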
Recovery:
- Restore chain connectivity, authorized sender, and funding/permission.
- Replay `certificates.to_anchor` DLQ jobs only after verifying certificate state.
- Sample `/verify` for anchored and unanchored certificates.
Public Verification Abuse
Impact:
- Increased `RATE_LIMITED`, `UNKNOWN_CERTIFICATE`, or anomaly counts.
- Potential scanning of public verification identifiers.
Triage:
- Check `/admin/api/v1/verification-log` and anomaly metrics.
- Confirm rate limits and captcha settings.
- Check reverse proxy logs for source distribution.
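Source distribution can be summarized by counting requests per client address. The sketch below assumes the reverse proxy writes common or combined log format, where the client address is the first field and the request line is the first quoted string. The function name and the default `/verify` prefix are illustrative.

```python
from collections import Counter


def top_sources(log_lines, path_prefix="/verify", limit=10):
    """Count requests per client address for one path prefix.

    Assumes common/combined log format: the client address is the first
    whitespace-separated field and the request line is the first quoted string.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request line; skip malformed entries
        request = parts[1].split()  # e.g. ['GET', '/verify/abc', 'HTTP/1.1']
        if len(request) >= 2 and request[1].startswith(path_prefix):
            counts[line.split(maxsplit=1)[0]] += 1
    return counts.most_common(limit)
```

A few addresses dominating the result suggests a targeted scan. A flat distribution across many addresses points toward a distributed pattern, where edge rate limits per address help less and captcha policy matters more.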
Recovery:
- Tighten edge rate limits or captcha policy.
- Keep response bodies minimal; do not add internal identifiers to them to aid debugging.
- Export logs for SIEM review if the pattern persists.
License Or Billing Incident
Impact:
- Proofs enter `AWAITING_LICENSE`.
- Billing reports or exports fail.
- Consumption receipt chain may alert.
Triage:
- Check active grants, allocation hierarchy, quotas, validity dates, and revocations.
- Inspect `billing.exports` and `reports.monthly` jobs.
- Check report worker logs and storage permissions.
- For chain alerts, inspect maintenance worker output before changing data.
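The grant checks in the first triage step (quota, validity dates, revocations) can be sketched as one function that lists every reason a grant cannot cover new proofs. The `Grant` shape, its field names, and the half-open validity interval are assumptions, since the real license model is not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical shape of a license grant; real field names may differ."""
    grant_id: str
    valid_from: datetime
    valid_until: datetime
    quota: int
    consumed: int
    revoked_at: Optional[datetime] = None


def grant_problems(grant: Grant, now: datetime) -> list[str]:
    """List the reasons a grant cannot cover new proofs at time `now`."""
    problems = []
    if grant.revoked_at is not None and grant.revoked_at <= now:
        problems.append("revoked")
    if now < grant.valid_from:
        problems.append("not yet valid")
    if now >= grant.valid_until:  # validity treated as [valid_from, valid_until)
        problems.append("expired")
    if grant.consumed >= grant.quota:
        problems.append("quota exhausted")
    return problems
```

Returning all reasons at once, rather than stopping at the first, avoids fixing a validity date only to find that the same grant is also out of quota.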
Recovery:
- Import or correct license grants with a `PLATFORM_ADMIN`.
- Create/adjust allocations with the tenant admin path.
- Retry affected proofs or billing export jobs.
- Preserve receipt-chain evidence for audit.