Concepts

Risk engine

Per-sign-in risk scoring with operator-tunable signal weights and thresholds. Drives step-up auth and outright blocks; surfaces every decision in the dashboard.

On every primary authentication, Authio runs the request through a deterministic, additive risk model. Each enabled signal contributes a weighted score; the sum lands in [0, 100]. The two thresholds threshold_step_up and threshold_block split that range into three regions:

score < threshold_step_up                      → allow      (mint a session)
threshold_step_up ≤ score < threshold_block    → step-up    (persist a step_up_challenges row, no session)
score ≥ threshold_block                        → block      (403 blocked_by_risk_policy, no session)
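
The three regions can be sketched as a small function. This is a minimal illustration rather than Authio's implementation: the name decide and the returned string labels are assumptions, and the defaults are the documented default thresholds. The block check runs first, so a score that clears both thresholds resolves to block.

```python
def decide(score: int, threshold_step_up: int = 50, threshold_block: int = 90) -> str:
    """Map a risk score to a decision (hypothetical helper, not an Authio API)."""
    if score >= threshold_block:
        return "block"      # 403 blocked_by_risk_policy, no session
    if score >= threshold_step_up:
        return "step_up"    # persist a step_up_challenges row, no session
    return "allow"          # mint a session
```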

Every evaluation, including a clean sign-in that is simply allowed, writes one row to risk_decisions and one auth.risk_evaluated event to audit_events. The dashboard's /risk page reads from risk_decisions; webhook subscribers and audit-stream destinations pick up auth.risk_evaluated in real time.

Signals

Signal names are stable identifiers — they double as JSONB keys in risk_policies.signal_weights and as the strings you see in risk_decisions.signals.fired. Defaults below match what a brand-new project gets.

| Signal | Default weight | Fires when… |
| --- | --- | --- |
| impossible_travel | 40 | Two recent sign-ins from different countries that no commercial flight can connect in the elapsed time (currently ≤ 60 minutes). |
| new_device | 15 | UA-derived device fingerprint never seen for this user before. |
| new_country | 25 | First sign-in from a country this user hasn't used. |
| new_ip_block | 10 | First sign-in from this /24 (IPv4) or /48 (IPv6). |
| headless_ua | 30 | UA string matches a known automation harness (Headless Chrome, Puppeteer, Playwright, Selenium, PhantomJS, SlimerJS). |
| velocity_burst | 20 | ≥ 10 sign-in attempts for this user in the last 5 minutes. |
| tor_exit | 35 | IP appears on the published Tor exit list (daily refresh). |
| datacenter_ip | 20 | IP belongs to a known hosting / data-center range (AWS, GCP, OVH, Hetzner, …). |
| known_bad_ip | 75 | IP is on an active threat-intel feed (botnet C2, mass-scan source). |
| breached_email | 20 | Email appears in a public credential-leak corpus (HIBP-style lookup). |
| bot_score_high | 35 | Upstream bot detection (Cloudflare Bot Management, Datadome) returned a score > 70. |
| stale_session | 10 | Refresh attempted after the session has been idle longer than the project's stale-window. Not enabled by default. |
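
The additive model over these defaults can be sketched as follows. This is an illustration under stated assumptions, not Authio's code: the names DEFAULT_WEIGHTS and risk_score are hypothetical, and clamping the sum at 100 is an assumption made so the result stays in [0, 100].

```python
# Default weights from the table above. stale_session (10) is not enabled
# by default, so it is left out of this dictionary.
DEFAULT_WEIGHTS = {
    "impossible_travel": 40,
    "new_device": 15,
    "new_country": 25,
    "new_ip_block": 10,
    "headless_ua": 30,
    "velocity_burst": 20,
    "tor_exit": 35,
    "datacenter_ip": 20,
    "known_bad_ip": 75,
    "breached_email": 20,
    "bot_score_high": 35,
}


def risk_score(fired, weights=DEFAULT_WEIGHTS):
    """Return (score, per-signal contributions) for the set of fired signals."""
    # Signals that are not present in the weights mapping contribute nothing.
    contributions = {name: weights[name] for name in sorted(fired) if name in weights}
    # Assumption: the raw sum is clamped so the score never exceeds 100.
    return min(100, sum(contributions.values())), contributions
```

For example, risk_score({"impossible_travel", "new_device"}) yields a score of 55 with contributions of 40 and 15.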

Default thresholds

  • threshold_step_up = 50: new_device (15) or new_country (25) alone won't step up, and neither will both together (40), but impossible_travel (40) plus new_device (15) will (= 55).
  • threshold_block = 90: picked so that a single extreme signal (a hit on known_bad_ip at 75) does NOT block on its own, but known_bad_ip plus a second signal weighted 15 or more does (for example, 75 + 15 = 90). Use blocks sparingly; step-up is almost always a better UX.

Use Test policy at the bottom of /risk/policy to validate a new threshold before saving it. The widget shows the exact per-signal contribution breakdown.

Block vs. step-up

Step-up means “I'm not sure about you, but a fresh second factor convinces me.” The user re-asserts a passkey or clicks a magic link, the step_up_challenges row is consumed atomically, and a session is minted from the original primary-auth context. UX cost: one extra page; trust cost: minimal.

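Block means “no path to a session, full stop.” The user sees a 403 with a recovery hint. Reserve this for signals that are diagnostic of automation or abuse, not for signals that just mean “unusual but plausible.” In practice, with the default weights, the only two-signal combinations that reach the default threshold_block of 90 are known_bad_ip (75) plus a second signal weighted 15 or more. Without known_bad_ip, the two heaviest signals sum to only 75 (impossible_travel at 40 plus tor_exit or bot_score_high at 35), so at least three signals must fire together.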
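
To check which signal pairs can reach the block threshold under the default weights, a short enumeration is enough. The sketch below is illustrative only; blocking_pairs is a hypothetical helper, and the weights are the defaults enabled for a new project.

```python
from itertools import combinations

# Default-enabled weights (stale_session is not enabled by default).
DEFAULT_WEIGHTS = {
    "impossible_travel": 40, "new_device": 15, "new_country": 25,
    "new_ip_block": 10, "headless_ua": 30, "velocity_burst": 20,
    "tor_exit": 35, "datacenter_ip": 20, "known_bad_ip": 75,
    "breached_email": 20, "bot_score_high": 35,
}


def blocking_pairs(weights, threshold_block=90):
    """List every pair of signals whose combined weight reaches the block threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(weights), 2)
        if weights[a] + weights[b] >= threshold_block
    ]


pairs = blocking_pairs(DEFAULT_WEIGHTS)
```

With these weights every pair in pairs includes known_bad_ip, and no single signal reaches 90 on its own.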

Tuning the policy

Edit /risk/policy in the dashboard. The form posts to PUT /v1/risk/policy; auth-core caches policy rows for 60 seconds in-process, so a change rolls out to every replica within that window. The dashboard surfaces an “Updated; takes effect within 60 seconds” toast on save.
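
The 60-second rollout follows from the cache lifetime: each replica keeps serving its cached row until the entry is 60 seconds old, then re-reads it. A minimal sketch of such an in-process TTL cache is shown here; PolicyCache, fetch, and clock are hypothetical names, and auth-core's actual cache may differ.

```python
import time
from typing import Callable, Dict, Tuple


class PolicyCache:
    """Cache policy rows in-process and re-fetch once an entry is ttl seconds old."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # loads the policy row, e.g. from the database
        self._ttl = ttl
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, project_id: str) -> dict:
        now = self._clock()
        cached = self._entries.get(project_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # still fresh: serve the cached row
        policy = self._fetch(project_id)
        self._entries[project_id] = (now, policy)
        return policy
```

Because each replica's entry expires independently, a saved change is visible everywhere at most one TTL after the write.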

Tuning workflow:

  1. Watch /risk for a few days. Note the step-up % and block %.
  2. If step-up % is too high (users complaining of friction), bump threshold_step_up by 5–10, or disable a noisy signal entirely.
  3. If block % is non-zero on legitimate users, drop the weight on whatever signal is dominating their decisions (see the per-decision detail page).
  4. If you want a signal to be observe-only rather than contributing to the score (e.g. you're still validating a new bot-score source), set its weight to 0. Authio still records that the signal fired in risk_decisions.signals, so you can retroactively count "what would have happened if I'd set this to 30?"
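
The retroactive count in step 4 can be sketched as a replay over exported decisions. The row shape used here (a dict with a "fired" list) is an assumption modeled on risk_decisions.signals.fired, replay is a hypothetical helper, and the clamp at 100 is assumed.

```python
from collections import Counter


def replay(decisions, weights, threshold_step_up=50, threshold_block=90):
    """Re-score recorded decisions under hypothetical weights and count the outcomes."""
    outcomes = Counter()
    for row in decisions:
        # Assumed row shape: {"fired": ["signal_name", ...]}.
        score = min(100, sum(weights.get(name, 0) for name in row["fired"]))
        if score >= threshold_block:
            outcomes["block"] += 1
        elif score >= threshold_step_up:
            outcomes["step_up"] += 1
        else:
            outcomes["allow"] += 1
    return outcomes
```

Running replay twice, once with the signal's weight at 0 and once at the candidate value, and comparing the two counters shows how many decisions would have changed.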

What gets persisted

Per evaluation, three things happen:

  1. risk_decisions row appended with id = rsk_…, score, decision, full signal breakdown, IP / country / UA / device, and (when a step-up was created) the linked challenge_id.
  2. auth.signin_attempt audit event for back-compat with existing audit-stream subscribers.
  3. auth.risk_evaluated audit event carrying the decision id — used by the dashboard's audit log filter pills.

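Bytes-on-disk is around 1–2 KB per decision. At 10 sign-ins/min sustained that's about 14,400 decisions, or roughly 14–29 MB/day, well inside the partition rotation window. At 10 sign-ins/sec sustained the same arithmetic gives 864,000 decisions, or roughly 0.9–1.7 GB/day.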
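
The storage estimate is rate × seconds per day × bytes per decision. A small helper makes it easy to plug in your own traffic; daily_bytes is a hypothetical name, and 1 KB is taken as 1,000 bytes here.

```python
SECONDS_PER_DAY = 86_400


def daily_bytes(sign_ins_per_second: float, bytes_per_decision: int) -> float:
    """Bytes appended to risk_decisions per day at a sustained sign-in rate."""
    return sign_ins_per_second * SECONDS_PER_DAY * bytes_per_decision
```

For example, daily_bytes(10, 1_000) is 864,000,000 bytes (about 0.86 GB) and daily_bytes(10, 2_000) is 1,728,000,000 bytes (about 1.73 GB).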

Geo-policy interaction

Authio also ships an operator-defined geo policy that runs before the risk score. The contract is deliberately simple:

  1. The geo gate evaluates the request country against project_geo_policies. If it returns block, the request is denied with HTTP 403 { error: "blocked_by_geo_policy", country: "XX" } immediately — the risk score is never computed.
  2. A matching travel grant on the user demotes a would-be block to allow and stamps geo_grant_used: tgt_… on the risk_decisions row. The risk engine then runs as normal — so adaptive MFA can still demand a passkey step-up if other signals fire (new device, impossible travel, etc.).
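  3. When the geo policy is in alert-only mode, a country trigger contributes the country_in_policy_alert signal (default weight 20) to the score instead of hard-blocking. Combined with the existing thresholds this means a previously-clean user from a flagged country typically gets step-up, not 403: for example, if new_country (25) and new_ip_block (10) also fire, the score is 20 + 25 + 10 = 55, which clears the default threshold_step_up of 50.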

In short: the geo block is a fast hard gate; the risk engine is the nuanced soft gate; travel grants only relax the hard gate. The two systems share a country source (auth-core's in-memory MaxMind / DB-IP resolver) so they never disagree on which country an IP is in.
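
The ordering described above can be sketched as one function. This is a sketch under assumptions, not auth-core's code: evaluate_sign_in, the "block" and "alert_only" mode labels, and the returned dict shapes are hypothetical, and the clamp at 100 is assumed.

```python
from typing import Dict, Optional, Set


def evaluate_sign_in(
    country: str,
    country_flagged: bool,            # does the geo policy list this country?
    geo_mode: str,                    # "block" or "alert_only" (assumed labels)
    travel_grant_id: Optional[str],   # a matching grant such as "tgt_...", or None
    fired: Set[str],
    weights: Dict[str, int],
    threshold_step_up: int = 50,
    threshold_block: int = 90,
) -> dict:
    signals = set(fired)
    geo_grant_used = None
    if country_flagged:
        if geo_mode == "alert_only":
            # Soft mode: the country adds a signal instead of hard-blocking.
            signals.add("country_in_policy_alert")
        elif travel_grant_id is not None:
            # A travel grant demotes the would-be block; risk scoring still runs.
            geo_grant_used = travel_grant_id
        else:
            # Hard gate: deny immediately, the risk score is never computed.
            return {"status": 403, "error": "blocked_by_geo_policy", "country": country}
    score = min(100, sum(weights.get(name, 0) for name in signals))
    if score >= threshold_block:
        decision = "block"
    elif score >= threshold_step_up:
        decision = "step_up"
    else:
        decision = "allow"
    return {"decision": decision, "score": score, "geo_grant_used": geo_grant_used}
```

Note that the travel grant only bypasses the geo gate; the fired signals are scored exactly as they would be for any other sign-in.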

See also

  • Geo policy — country blocking + travel grants, evaluated before the risk score.
  • Audit log — every risk evaluation also lands in audit_events for export and streaming.
  • Passwordless auth methods — the primary-auth flows that feed the risk engine.