Simulations let you run your LLM configuration against a prepared dataset of test cases before deploying a change. Instead of discovering in production that a prompt change broke something, you run a simulation, review the outputs and evaluation scores, and promote the change only once you’re confident it behaves correctly.
Simulations are available on the Pro plan and above; see the plan limits table below for per-plan quotas.
## Core concepts
**Dataset**
A collection of test cases. Each case has an input (the user message or prompt variables) and, optionally, an expected output or reference answer.

**Scenario**
A test configuration: which dataset to use, which prompt version to test, which evaluators to run, and which model to call.

**Simulation run**
One execution of a scenario — the platform runs each dataset item through your LLM configuration and collects outputs and scores.

**Batch run**
Multiple simulation runs executed in parallel, typically used to compare different prompt versions side by side.
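To make the relationships concrete, here is a minimal sketch of the four concepts as plain Python dataclasses. The field names are illustrative assumptions, not LumiqTrace’s actual schema:

```python
# Illustrative data model only -- field names are assumptions,
# not LumiqTrace's actual schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatasetItem:
    input: dict                             # user message or prompt variables
    expected_output: Optional[str] = None   # optional reference answer
    tags: list = field(default_factory=list)

@dataclass
class Scenario:
    dataset: str        # which dataset to use
    prompt: str         # which prompt version to test
    evaluators: list    # which evaluators to run
    model: str          # which model to call

@dataclass
class SimulationRun:
    scenario: Scenario
    results: list = field(default_factory=list)  # one record per dataset item

# A batch run is simply several SimulationRuns executed in parallel,
# typically over the same dataset with different prompt versions.
```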
## Creating a dataset

Before running simulations, you need a dataset of test cases.

1. Click **Datasets**, then **New dataset**.
2. Name your dataset (e.g., `support-test-cases`) and add a description.
3. Add test items.
Each item has:
- Input — the user message or template variables to inject into your prompt
- Expected output (optional) — a reference answer for similarity scoring
- Tags — optional labels for grouping items
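For example, a single test item for a support assistant might look like the following (a sketch only; the field names are assumptions, so check the dataset editor for the exact import format):

```python
# Hypothetical test item -- field names are illustrative,
# not LumiqTrace's actual import schema.
item = {
    "input": {"user_message": "How do I reset my password?"},
    "expected_output": "Go to Settings > Security > Reset password.",
    "tags": ["account", "happy-path"],
}
```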
## Creating a scenario

### Configure the prompt
Either select a prompt from your prompt library (by name and version/label) or paste a prompt directly into the editor.
### Add evaluators
Attach one or more evaluators to score the outputs. You can use any evaluator defined on the Evaluations page, plus built-in ones:
| Evaluator | What it measures |
|---|---|
| `exact-match` | Whether the output exactly matches the expected value |
| `contains` | Whether the output contains a required substring |
| `similarity` | Semantic similarity to the expected output (0–1) |
| `length` | Whether the output length is within a specified range |
| Custom | Any LLM-judge evaluator you’ve defined |
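As a rough guide to what the built-in checks compute, here is the kind of logic each one applies. This is a sketch under assumptions; LumiqTrace’s actual implementations (normalization, the similarity model) may differ:

```python
# Approximate logic of the built-in evaluators; the real
# implementations may normalize text or score differently.
def exact_match(output: str, expected: str) -> bool:
    return output == expected

def contains(output: str, required: str) -> bool:
    return required in output

def length(output: str, min_len: int, max_len: int) -> bool:
    return min_len <= len(output) <= max_len

# `similarity` would typically embed both texts and return a cosine
# similarity in [0, 1]; the embedding model is platform-defined.
```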
## Running a simulation

Click **Run now** on any scenario to start a simulation run. LumiqTrace then does the following (sketched in code after this list):

- Iterates over every item in the dataset
- Calls your chosen model with the prompt + item input
- Records the response, latency, token count, and cost
- Runs each configured evaluator on the output
- Aggregates results into a run summary
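Conceptually, one run reduces to a loop like this. It is a sketch only: `call_model`, the record shape, and the function names are hypothetical stand-ins, not the LumiqTrace API:

```python
import statistics
import time

# Conceptual sketch of one simulation run; call_model and the
# record shape are hypothetical stand-ins, not the LumiqTrace API.
def simulate(prompt, dataset, model, evaluators, call_model):
    results = []
    for item in dataset:                                     # iterate every item
        start = time.monotonic()
        response = call_model(model, prompt, item["input"])  # prompt + item input
        record = {
            "output": response["text"],                      # response
            "latency_s": time.monotonic() - start,           # latency
            "tokens": response["tokens"],                    # token count
            "cost_usd": response["cost_usd"],                # cost
            "scores": {
                name: fn(response["text"], item.get("expected_output"))
                for name, fn in evaluators.items()           # run each evaluator
            },
        }
        results.append(record)
    summary = {                                              # aggregate run summary
        name: statistics.mean(r["scores"][name] for r in results)
        for name in evaluators
    }
    return results, summary
```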
## Batch runs — comparing versions

A batch run executes the same dataset against multiple configurations simultaneously, making it easy to compare prompt versions head-to-head.

1. Click **New batch run**.
2. Select two or more scenarios (or one scenario with multiple prompt version variants).
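A minimal sketch of the idea, reusing the hypothetical `simulate()` function from the previous section. The `name@version` notation is also an assumption about how prompt versions are referenced:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: reuses the simulate() sketch above; the "name@version"
# notation is an assumption, not confirmed LumiqTrace syntax.
versions = ["support-prompt@v3", "support-prompt@v4"]

def run_version(version):
    # dataset, model, evaluators, call_model as in the earlier sketch
    return simulate(version, dataset, model, evaluators, call_model)

with ThreadPoolExecutor() as pool:
    runs = dict(zip(versions, pool.map(run_version, versions)))

# runs["support-prompt@v3"] and runs["support-prompt@v4"] now hold
# (results, summary) pairs to compare side by side.
```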
## Reading run results

Each simulation run’s detail view shows the following (see the aggregation sketch after this list):

- Summary cards — average score per evaluator, total cost, average latency
- Per-item table — each dataset item with its output, scores, and a link to the full trace
- Score distribution — histogram of score spread across items
- Failed items — items where the model returned an error or a score below threshold
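If you export per-item results, these views reduce to simple aggregations. A sketch, assuming the hypothetical record shape used in the run example above:

```python
# Reduce per-item records to the summary views; the record shape
# (scores, cost_usd, latency_s, error) is the hypothetical one above.
def summarize(results: list, threshold: float = 0.5) -> dict:
    n = len(results)
    return {
        "avg_score_per_evaluator": {
            name: sum(r["scores"][name] for r in results) / n
            for name in results[0]["scores"]
        },
        "total_cost_usd": sum(r["cost_usd"] for r in results),
        "avg_latency_s": sum(r["latency_s"] for r in results) / n,
        "failed_items": [
            r for r in results
            if r.get("error") or any(s < threshold for s in r["scores"].values())
        ],
    }
```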
## Plan limits

| Plan | Datasets | Scenarios | Simulation volume |
|---|---|---|---|
| Free | None | — | — |
| Pro | 3 | 5 | 500 items total |
| Team | Unlimited | Unlimited | 10,000 items/month |
| Scale | Unlimited | Unlimited | Unlimited |