Incidents — detect, correlate, and resolve LLM issues

The Incidents page groups related problems — anomalies, error spikes, latency regressions — into a single timeline so you can investigate them as a unit. When LumiqTrace detects that multiple signals are happening at the same time and likely have a common cause, it opens an incident automatically. You can also open incidents manually for any issue you want to track to resolution.

Incident detection and management require the Team or Scale plan.

How incidents are created

LumiqTrace creates incidents automatically when:

An anomaly is detected (cost spike, error rate jump, latency surge) and a related alert rule fires within 15 minutes
Three or more traces with the same error code occur within a 10-minute window
An AI analysis detects a pattern across multiple traces that it classifies as a systemic issue

You can also open an incident manually from the New incident button on the Incidents page, or by clicking Open incident on any anomaly card in the AI Hub.
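As a rough illustration of the second rule above (three or more traces with the same error code inside a 10-minute window), the check can be sketched as a sliding window per error code. The function name, the `(timestamp, error_code)` input shape, and the choice to count traces exactly 10 minutes apart as inside the window are all assumptions made for this sketch; LumiqTrace's actual detector is not documented here.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)   # window length taken from the rule above
THRESHOLD = 3                    # "three or more traces"

def codes_that_trigger(traces):
    """Return the error codes that reach THRESHOLD traces inside any WINDOW.

    `traces` is an iterable of (timestamp, error_code) pairs. This shape is
    hypothetical and chosen for illustration; it is not LumiqTrace's schema.
    """
    recent = defaultdict(deque)   # error_code -> timestamps still in the window
    triggered = set()
    for ts, code in sorted(traces):
        window = recent[code]
        window.append(ts)
        # Drop timestamps that have fallen out of the 10-minute window.
        while ts - window[0] > WINDOW:
            window.popleft()
        if len(window) >= THRESHOLD:
            triggered.add(code)
    return triggered
```

Keeping one deque per error code means each trace is appended and removed at most once, so the check stays linear in the number of traces after sorting.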

Incident states

Each incident moves through three states:

State	Meaning
Detecting	The platform is still gathering signals — the issue may still be developing
Active	Confirmed ongoing issue requiring attention
Resolved	The issue is no longer occurring

LumiqTrace auto-resolves an incident when its driving metric returns to baseline for 30 consecutive minutes. You can also resolve an incident manually.
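The auto-resolve condition can be pictured as a timer that restarts whenever the driving metric leaves baseline. The sketch below is only an approximation of that idea: `auto_resolve_time` is a hypothetical helper, and treating "at baseline" as `value <= baseline` is an assumption, since the exact comparison LumiqTrace uses is not stated.

```python
from datetime import datetime, timedelta

RECOVERY = timedelta(minutes=30)  # from the auto-resolve rule above

def auto_resolve_time(samples, baseline):
    """Return the time at which the incident would auto-resolve, or None.

    `samples` is a time-ordered list of (timestamp, value) readings of the
    driving metric. Treating value <= baseline as healthy is an assumption.
    """
    healthy_since = None
    for ts, value in samples:
        if value <= baseline:
            if healthy_since is None:
                healthy_since = ts
            if ts - healthy_since >= RECOVERY:
                return ts
        else:
            healthy_since = None  # any breach restarts the 30-minute clock
    return None
```

A single out-of-range reading resets the clock, which is what "30 consecutive minutes" implies: brief recoveries during a flapping incident do not close it.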

The incidents list

The main Incidents page shows a table of all incidents, most recent first. Each row shows:

Severity — high, medium, or low, based on impact to cost or error rate
Title — a one-sentence summary of the issue
Affected metric — which measurement is out of range
State — detecting, active, or resolved
Duration — how long the incident has been open
Models affected — which model(s) are involved

Use the State filter to see only active incidents, or the Severity filter to focus on high-severity issues.

Incident detail

Click any incident to open its detail view. The detail view shows:

Timeline

A chronological feed of all signals related to this incident:

Anomaly detections with their explanation and severity
Alert rule triggers with threshold and actual value
Related error traces (grouped by error code)
Configuration changes that may have contributed (from the audit log)

Root cause summary

When an AI analysis has been run for an incident, a root cause summary appears at the top of the timeline. The AI root cause analyzer generates it by examining the correlated traces and summarizing its findings as a single narrative.

Affected traces

A filtered list of the specific traces associated with this incident, with error codes, latency, and cost. Click any trace to open it in the flame graph view.

Resolution notes

A free-text field where you can record what you found and how you fixed it. Resolution notes are preserved after the incident closes and appear in the incident history. Use them to build a runbook for recurring issues.

Resolving an incident

Click Mark resolved on any active incident. You’ll be prompted to add a brief resolution note. Once resolved:

The incident state changes to “Resolved”
The resolution timestamp and note are saved
If the same underlying issue recurs, a new incident opens automatically — it does not reopen the closed one

Always add a resolution note before closing an incident. Future incidents of the same type will surface the previous resolution notes so your on-call engineer can see what worked before.

Auto-remediation

On the Scale plan, you can configure auto-remediation rules that LumiqPilot applies automatically when an incident of a specific type opens. For example:

“When a GPT-4o error spike incident opens, switch to GPT-4o-mini”
“When a rate-limit incident opens, reduce sample rate to 50%”

Auto-remediation rules are configured in Settings → Incidents and require explicit approval from an Owner-level user to activate.

Auto-remediation applies SDK config changes without human confirmation. Only enable it for actions you have validated as safe to apply automatically. All auto-remediation actions are logged in the audit trail.
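To make the approval gate concrete, here is a minimal sketch of how rules might be matched to a newly opened incident. Everything in it is hypothetical: the `RemediationRule` class, its field names, and the `incident_type` labels are invented for illustration and do not reflect LumiqPilot's real rule format, which is configured in Settings → Incidents.

```python
from dataclasses import dataclass

@dataclass
class RemediationRule:
    incident_type: str               # hypothetical label, e.g. "error_spike"
    action: str                      # description of the config change to apply
    approved_by_owner: bool = False  # rules stay inactive until an Owner approves

def actions_for(incident_type, rules):
    """Return the actions to apply for a newly opened incident.

    Rules without Owner approval are skipped, mirroring the activation
    requirement described above.
    """
    return [
        rule.action
        for rule in rules
        if rule.incident_type == incident_type and rule.approved_by_owner
    ]
```

Defaulting `approved_by_owner` to `False` keeps a freshly created rule inert, so nothing is applied automatically until someone explicitly approves it.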

Notifications

Incidents use the same notification channels as alert rules: email for Pro/Team, and webhooks for Scale. If an incident opens while an alert for the same metric is active, LumiqTrace deduplicates the notifications so you receive only one.
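One way to picture the deduplication is a notifier that remembers which metrics already have an active alert. The `Notifier` class below is a hypothetical sketch, and it assumes the incident notification is the one that gets suppressed; the source does not say which of the two messages is dropped.

```python
class Notifier:
    """Hypothetical notifier that avoids double-notifying for one metric."""

    def __init__(self):
        self.active_alert_metrics = set()
        self.sent = []  # (kind, metric) pairs, in send order

    def alert_fired(self, metric):
        self.active_alert_metrics.add(metric)
        self.sent.append(("alert", metric))

    def alert_cleared(self, metric):
        self.active_alert_metrics.discard(metric)

    def incident_opened(self, metric):
        # Assumption: an active alert on the same metric already reached the
        # recipient, so the incident notification is skipped.
        if metric in self.active_alert_metrics:
            return False
        self.sent.append(("incident", metric))
        return True
```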
