The Incidents page groups related problems (anomalies, error spikes, latency regressions) into a single timeline so you can investigate them as a unit. When LumiqTrace detects that multiple signals occur at the same time and likely share a common cause, it opens an incident automatically. You can also open an incident manually for any issue you want to track to resolution.
Incident detection and management require the Team or Scale plan.

How incidents are created

LumiqTrace creates incidents automatically when:
  • An anomaly is detected (cost spike, error rate jump, latency surge) and a related alert rule fires within 15 minutes
  • Three or more traces with the same error code occur within a 10-minute window
  • An AI analysis detects a pattern across multiple traces that it classifies as a systemic issue
You can also open an incident manually from the New incident button on the Incidents page, or by clicking Open incident on any anomaly card in the AI Hub.
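
The two time-window rules above can be sketched in a few lines of Python. This is a minimal illustration, not LumiqTrace's actual detection logic: the function names, the `(timestamp, error_code)` trace shape, and the choice to accept the anomaly and the alert in either order are all assumptions made for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Thresholds taken from the rules above; the names are illustrative only.
ERROR_WINDOW = timedelta(minutes=10)
ERROR_THRESHOLD = 3
ALERT_GAP = timedelta(minutes=15)


def find_error_bursts(traces):
    """Return error codes seen ERROR_THRESHOLD or more times within ERROR_WINDOW.

    `traces` is an iterable of (timestamp, error_code) pairs. This shape is an
    assumption for the sketch, not the LumiqTrace trace schema.
    """
    recent = defaultdict(deque)  # error_code -> timestamps still inside the window
    bursts = set()
    for ts, code in sorted(traces):
        window = recent[code]
        window.append(ts)
        # Drop timestamps that are more than 10 minutes older than the newest one.
        while ts - window[0] > ERROR_WINDOW:
            window.popleft()
        if len(window) >= ERROR_THRESHOLD:
            bursts.add(code)
    return bursts


def anomaly_confirmed_by_alert(anomaly_at, alert_at, max_gap=ALERT_GAP):
    """True when the anomaly and the related alert are at most 15 minutes apart.

    The text does not say which must come first, so this sketch accepts
    either order.
    """
    return abs(alert_at - anomaly_at) <= max_gap
```

A sliding window per error code keeps the check linear in the number of traces after sorting, which is why the sketch uses a deque rather than rescanning earlier traces.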

Incident states

Each incident moves through three states:
| State | Meaning |
| --- | --- |
| Detecting | The platform is still gathering signals; the issue may still be developing |
| Active | Confirmed ongoing issue requiring attention |
| Resolved | The issue is no longer occurring |
LumiqTrace auto-resolves an incident when its driving metric returns to baseline for 30 consecutive minutes. You can also resolve an incident manually.
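
The state lifecycle and the 30-minute auto-resolve rule can be modeled as a small state machine. The sketch below is illustrative only: the `Incident` class, its method names, and the choice to apply auto-resolve to any open incident (Detecting or Active) are assumptions, not a description of the platform's internals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class IncidentState(Enum):
    DETECTING = "detecting"
    ACTIVE = "active"
    RESOLVED = "resolved"


# The 30-minute hold comes from the text above; the constant name is ours.
BASELINE_HOLD = timedelta(minutes=30)


@dataclass
class Incident:
    state: IncidentState = IncidentState.DETECTING
    baseline_since: Optional[datetime] = None  # start of the current at-baseline streak

    def confirm(self) -> None:
        """Move a Detecting incident to Active once the issue is confirmed."""
        if self.state is IncidentState.DETECTING:
            self.state = IncidentState.ACTIVE

    def observe(self, now: datetime, at_baseline: bool) -> None:
        """Record one metric sample; resolve after 30 consecutive minutes at baseline."""
        if self.state is IncidentState.RESOLVED:
            return
        if not at_baseline:
            self.baseline_since = None  # streak broken, start over
            return
        if self.baseline_since is None:
            self.baseline_since = now
        if now - self.baseline_since >= BASELINE_HOLD:
            self.state = IncidentState.RESOLVED

    def resolve_manually(self) -> None:
        self.state = IncidentState.RESOLVED
```

Resetting `baseline_since` on any out-of-range sample is what makes the 30 minutes "consecutive": a brief return to baseline followed by another spike does not count toward resolution.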

The incidents list

The main Incidents page shows a table of all incidents, most recent first. Each row shows:
  • Severity: high, medium, or low, based on impact to cost or error rate
  • Title: a one-sentence summary of the issue
  • Affected metric: which measurement is out of range
  • State: detecting, active, or resolved
  • Duration: how long the incident has been open
  • Models affected: which model(s) are involved
Use the State filter to see only active incidents, or the Severity filter to focus on high-severity issues.

Incident detail

Click any incident to open its detail view. The detail view shows:

Timeline

A chronological feed of all signals related to this incident:
  • Anomaly detections with their explanation and severity
  • Alert rule triggers with threshold and actual value
  • Related error traces (grouped by error code)
  • Configuration changes that may have contributed (from the audit log)
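
Conceptually, the timeline is a merge of four signal sources ordered by time. The sketch below shows that idea with a hypothetical `Signal` record; the field names and `kind` values are assumptions for illustration and do not reflect an actual LumiqTrace data model.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain


@dataclass(frozen=True)
class Signal:
    at: datetime
    kind: str     # e.g. "anomaly", "alert", "error_trace", "config_change"
    summary: str


def build_timeline(*sources):
    """Combine any number of signal lists into one feed, oldest first."""
    return sorted(chain.from_iterable(sources), key=lambda s: s.at)
```

Placing configuration changes in the same feed as anomalies and alerts makes it easy to spot a change that landed shortly before the first signal.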

Root cause summary

For incidents where an AI analysis has been run, a root cause summary appears at the top of the timeline. The summary is generated by running the AI root cause analyzer across the correlated traces and condensing its findings into a single narrative.

Affected traces

A filtered list of the specific traces associated with this incident, with error codes, latency, and cost. Click any trace to open it in the flame graph view.

Resolution notes

A free-text field where you can record what you found and how you fixed it. Resolution notes are preserved after the incident closes and appear in the incident history. Use them to build a runbook for recurring issues.

Resolving an incident

Click Mark resolved on any active incident. You’ll be prompted to add a brief resolution note. Once resolved:
  • The incident state changes to “Resolved”
  • The resolution timestamp and note are saved
  • If the same underlying issue recurs, a new incident opens automatically — it does not reopen the closed one
Always add a resolution note before closing an incident. Future incidents of the same type will surface the previous resolution notes so your on-call engineer can see what worked before.
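
The recurrence behavior described above (a new incident rather than a reopened one, with earlier resolution notes carried forward) can be sketched as follows. The `IncidentLog` class and the `kind` key used to match incidents of the same type are hypothetical, chosen only to illustrate the behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncidentRecord:
    incident_id: int
    kind: str                                  # hypothetical "incident type" key
    resolved: bool = False
    resolution_note: Optional[str] = None
    previous_notes: List[str] = field(default_factory=list)


class IncidentLog:
    def __init__(self) -> None:
        self.records: List[IncidentRecord] = []

    def open(self, kind: str) -> IncidentRecord:
        # A recurrence always gets a new record; closed records are never reopened.
        notes = [
            r.resolution_note
            for r in self.records
            if r.kind == kind and r.resolved and r.resolution_note
        ]
        record = IncidentRecord(len(self.records) + 1, kind, previous_notes=notes)
        self.records.append(record)
        return record

    def resolve(self, record: IncidentRecord, note: str) -> None:
        record.resolved = True
        record.resolution_note = note
```

Because only incidents resolved with a note contribute to `previous_notes`, an incident closed without one leaves nothing for the next on-call engineer to reuse.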

Auto-remediation

On the Scale plan, you can configure auto-remediation rules that LumiqPilot applies automatically when an incident of a specific type opens. For example:
  • “When a GPT-4o error spike incident opens, switch to GPT-4o-mini”
  • “When a rate-limit incident opens, reduce sample rate to 50%”
Auto-remediation rules are configured in Settings → Incidents and require explicit approval from an Owner-level user to activate.
Auto-remediation applies SDK config changes without human confirmation. Enable it only for actions you have validated as safe to apply automatically. All auto-remediation actions are logged in the audit trail.
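
To make the approval and audit requirements concrete, here is a minimal sketch of how such a rule might be modeled. Everything in it is hypothetical: the `RemediationRule` class, the incident type strings, the config keys, and the role name are illustrative assumptions, and real rules are configured in Settings → Incidents rather than in code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RemediationRule:
    incident_type: str                            # hypothetical type key
    description: str
    action: Callable[[Dict[str, object]], None]   # mutates an SDK-config-like dict
    approved_by: Optional[str] = None             # set once an Owner approves the rule

    def approve(self, user: str, role: str) -> None:
        if role != "owner":
            raise PermissionError("only an Owner-level user can activate a rule")
        self.approved_by = user


def switch_to_mini(config: Dict[str, object]) -> None:
    config["model"] = "gpt-4o-mini"


def set_sample_rate_50(config: Dict[str, object]) -> None:
    config["sample_rate"] = 0.5


def apply_rules(incident_type: str, rules: List[RemediationRule],
                config: Dict[str, object], audit_log: List[str]) -> None:
    """Apply every approved rule matching the incident type and log each action."""
    for rule in rules:
        if rule.incident_type == incident_type and rule.approved_by is not None:
            rule.action(config)
            audit_log.append(f"auto-remediation applied: {rule.description}")
```

In this sketch an unapproved rule is skipped silently, so a rule can be drafted and reviewed before an Owner activates it.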

Notifications

Incidents use the same notification channels as alert rules: email for Pro/Team, and webhooks for Scale. If an incident opens while an alert for the same metric is active, LumiqTrace deduplicates the two so you receive a single notification.
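
The deduplication rule amounts to suppressing the incident notification whenever an alert on the same metric is still active. The sketch below illustrates that check with a hypothetical `Notifier` class; the method names and the metric strings are assumptions, not part of any LumiqTrace API.

```python
class Notifier:
    def __init__(self) -> None:
        self.active_alert_metrics = set()  # metrics with an alert currently firing
        self.sent = []                     # (source, metric) pairs actually delivered

    def alert_fired(self, metric: str) -> None:
        self.active_alert_metrics.add(metric)
        self.sent.append(("alert", metric))

    def alert_cleared(self, metric: str) -> None:
        self.active_alert_metrics.discard(metric)

    def incident_opened(self, metric: str) -> bool:
        """Send an incident notification unless an alert already covers the metric."""
        if metric in self.active_alert_metrics:
            return False  # deduplicated: the alert notification already went out
        self.sent.append(("incident", metric))
        return True
```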