Observability
OpenTelemetry traces across every Authio service — from the hosted-UI request all the way through Postgres.
Every Authio service (auth-core, sso, scim, fga, audit, webhooks, management-api, dashboard, hosted-ui, marketing) ships with OpenTelemetry tracing enabled. By default, spans are written as JSON lines to stdout — Railway captures stdout, so trace IDs appear alongside the rest of the service logs. Flip a few env vars to forward traces to your own backend.
The trace graph
A typical end-user sign-in produces one logical trace that spans five or more services:
hosted-ui GET /
  └─ POST /v1/auth/passkey/login/begin (auth-core)
       └─ SELECT * FROM webauthn_credentials (pgxpool)
  └─ POST /v1/auth/passkey/login/finish (auth-core)
       ├─ SELECT * FROM users
       ├─ INSERT INTO sessions
       └─ INSERT INTO audit_events
            └─ webhook-worker poll
                 └─ POST https://customer.example.com/webhook (otelhttp.NewTransport)
Service-to-service calls all propagate W3C traceparent, so a single trace ID stitches the request together. Customers can receive that trace ID on inbound webhook deliveries via the traceparent header and continue the trace in their own backend.
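For a customer-side webhook receiver, reading the traceparent header could look like the sketch below. It follows the W3C Trace Context layout (version, 32-hex trace ID, 16-hex parent ID, flags byte); the names parseTraceparent and TraceParent are illustrative and not part of any Authio SDK.

```typescript
// Parsed fields of a W3C Trace Context `traceparent` header.
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// version-traceid-parentid-flags, all lowercase hex.
// This sketch only accepts the four-field layout used by version 00.
const TRACEPARENT_RE =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns the parsed header, or null when it is malformed.
function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (match === null) return null;
  const [, version, traceId, parentId, flags] = match;
  // Version "ff" is forbidden, and all-zero IDs are invalid.
  if (version === "ff") return null;
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 0x01;
  return { version, traceId, parentId, sampled };
}
```

A receiver can log traceId next to its own log lines, or hand the header to its tracing SDK's propagator so that its spans join the same trace.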
Default exporter: stdout
In production, services default to OTEL_EXPORTER=stdout. Each span is a single JSON line written to stdout, which Railway's log viewer surfaces alongside the rest of the service logs. This gives operators "trace IDs in logs" for free with zero extra infrastructure.
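Because spans share stdout with ordinary log lines, pulling one trace out of a log export is mostly a filtering exercise. The sketch below is illustrative only: it assumes each span line is a JSON object with a trace_id field, which is a hypothetical name. Check a real line from your deployment and pass the actual field name.

```typescript
// Collect the span records that belong to one trace from raw log lines.
// ASSUMPTION: span lines are JSON objects exposing the trace ID under
// `field` ("trace_id" is a placeholder, not a documented Authio key).
function spansForTrace(
  logLines: string[],
  traceId: string,
  field: string = "trace_id",
): Record<string, unknown>[] {
  const spans: Record<string, unknown>[] = [];
  for (const line of logLines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // ordinary, non-JSON log line
    }
    // Skip JSON values that are not plain objects.
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      continue;
    }
    const record = parsed as Record<string, unknown>;
    if (record[field] === traceId) spans.push(record);
  }
  return spans;
}
```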
Use the X-Request-Id response header to find every span the request produced.
Wiring up an OTLP backend
Authio works with any OTLP/HTTP-compatible backend. Set these env vars on every Authio service that you want to forward from (typically all of them):
OTEL_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway.example.com
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer abc...,X-Org-Id=42
OTEL_SERVICE_NAME=authio_auth-core # optional override
OTEL_SAMPLER_RATIO=1.0 # optional, 0.0–1.0
Grafana Cloud
OTEL_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-<region>.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64(instance_id:token)>
Honeycomb
OTEL_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=<api-key>,x-honeycomb-dataset=authio
Datadog
OTEL_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.example.com:4318
# Run Datadog's OTLP receiver as a collector in front of the agent.
What's instrumented
- HTTP servers: every Go service uses otelhttp.NewHandler; the TypeScript management-api uses a Hono middleware that opens a SERVER span per request and tags it with project_id, api_key_id, and http.status_code.
- HTTP clients: service-to-service calls use otelhttp.NewTransport (Go) and a span+propagation wrapper (TS) so the trace continues across the wire.
- Postgres: Go services use otelpgx; the TS management-api wraps the postgres tagged template via Proxy. Every query becomes a CLIENT child span with db.system=postgresql and a redacted db.statement.
- Webhook deliveries: the webhooks worker wraps its outbound HTTP client with otelhttp.NewTransport, so each delivery is a span with the customer endpoint URL and HTTP status as attributes. Trace context propagates to the receiver via traceparent.
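To make the Postgres item above more concrete, here is a minimal sketch of wrapping a tagged-template SQL function in a Proxy. It is not the management-api's actual code: instrumentSql, SqlTag, and QuerySpan are hypothetical names, and the span is handed to a callback rather than a real OpenTelemetry tracer. It also shows one way a statement can stay redacted, namely by rebuilding it from the template's literal parts so parameter values are never included.

```typescript
// A tagged-template SQL function, in the style of the `postgres` package.
type SqlTag = (strings: TemplateStringsArray, ...values: unknown[]) => unknown;

// Hypothetical span record; a real setup would start an OpenTelemetry span.
interface QuerySpan {
  kind: "CLIENT";
  attributes: { "db.system": string; "db.statement": string };
}

// Wrap `sql` so every tagged-template call reports a span before running.
function instrumentSql<T extends SqlTag>(
  sql: T,
  onSpan: (span: QuerySpan) => void,
): T {
  return new Proxy(sql, {
    apply(target, thisArg, args) {
      const strings = args[0] as TemplateStringsArray;
      // Join the literal parts with $1, $2, ... placeholders so that
      // parameter values never reach the span attribute.
      const statement = strings.reduce(
        (acc, part, i) => (i === 0 ? part : `${acc}$${i}${part}`),
        "",
      );
      onSpan({
        kind: "CLIENT",
        attributes: { "db.system": "postgresql", "db.statement": statement },
      });
      // Delegate to the original function unchanged.
      return Reflect.apply(target, thisArg, args);
    },
  });
}
```

The Proxy keeps the wrapped function's call signature intact, so existing call sites do not change.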
What's not instrumented (yet)
- Edge-runtime requests in Next.js services (the OTel SDK doesn't support Workers/edge yet — those requests still emit access logs but no spans).
- Vitest test runs (intentionally — we don't want CI noise in aggregated trace counts).
- Build-time work (no spans during next build or go build).
Sampling
By default Authio uses parent-based always-on sampling so every request produces spans. For high-traffic projects, drop the ratio to keep cardinality manageable:
OTEL_SAMPLER_RATIO=0.1 # keep 10% of root traces
Child spans always inherit their parent's sampling decision, so an incoming request that arrives with an already-sampled traceparent produces a full trace regardless of the ratio set here.
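The interaction between the ratio and the parent's decision can be sketched as a small function. This is a simplified stand-in, not the OpenTelemetry SDK's sampler: shouldSample is a hypothetical name, and the bucketing on the first 8 hex characters of the trace ID is an assumption chosen only to make the decision deterministic per trace.

```typescript
// Decide whether a span should be recorded.
// parentSampled: the sampled flag from an incoming traceparent,
//                or undefined when this span starts a new trace.
function shouldSample(
  parentSampled: boolean | undefined,
  traceId: string,
  ratio: number,
): boolean {
  // Parent-based: a child always inherits the parent's decision,
  // so the ratio is never consulted for it.
  if (parentSampled !== undefined) return parentSampled;
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Root span: map the trace ID to a bucket in [0, 2^32) and keep the
  // trace when the bucket falls below ratio * 2^32. Using the trace ID
  // means every service reaches the same answer for the same trace.
  const bucket = parseInt(traceId.slice(0, 8), 16);
  return bucket < ratio * 0x100000000;
}
```

With a ratio of 0.1, a root trace whose ID starts with 00000001 is kept and one starting with ffffffff is dropped, while a request whose parent was sampled is kept either way.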
See also
- Live status page: synthetic uptime, latency, and active incidents for every Authio service, refreshed every 30 seconds. The numbers there come from the same Postgres tables your OTel traces correlate against, so an incident on the status page lines up with the spike in your dashboard.