LLM tuning
Cross-check disagreements
Triage calls in which the primary and secondary models assigned different severities. Review these regularly to tune the system prompt and the dispatch thresholds.
| When | Primary | Secondary | Final | Δ | Escalated | Tokens (primary / secondary) |
|---|---|---|---|---|---|---|
| Apr 9, 01:54 PM | SEV 3 (gpt-4o) | SEV 2 (gpt-4o-mini) | SEV 2 | 1 level | Escalated | 284 / 112 |
| Apr 9, 12:39 PM | SEV 4 (gpt-4o) | SEV 2 (gpt-4o-mini) | SEV 2 | 2 levels | Escalated | 311 / 126 |
| Apr 9, 11:04 AM | SEV 5 (gpt-4o) | SEV 4 (gpt-4o-mini) | SEV 4 | 1 level | Escalated | 245 / 98 |
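In every row above, Final matches the more severe of the two verdicts (the lower SEV number), and Δ is the gap between the two. The sketch below reproduces the three rows under that assumption, taking SEV 1 as the most severe level. The `CrossCheck` record, `should_escalate`, and its `threshold` parameter are hypothetical illustrations, not the system's actual dispatch logic. The threshold of 1 level is chosen only because it matches the rows shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossCheck:
    """Hypothetical record for one two-model triage call."""
    primary_sev: int    # severity from the primary model (e.g. gpt-4o)
    secondary_sev: int  # severity from the secondary model (e.g. gpt-4o-mini)

    @property
    def disagrees(self) -> bool:
        return self.primary_sev != self.secondary_sev

    @property
    def delta(self) -> int:
        # Number of severity levels between the two verdicts.
        return abs(self.primary_sev - self.secondary_sev)

    @property
    def final_sev(self) -> int:
        # Assumption: SEV 1 is most severe, and the more severe verdict wins.
        return min(self.primary_sev, self.secondary_sev)


def should_escalate(check: CrossCheck, threshold: int = 1) -> bool:
    # Hypothetical dispatch rule: escalate when the models differ by at
    # least `threshold` levels.
    return check.delta >= threshold


def format_delta(levels: int) -> str:
    return f"{levels} level" if levels == 1 else f"{levels} levels"


# The three disagreements from the table (primary, secondary).
rows = [CrossCheck(3, 2), CrossCheck(4, 2), CrossCheck(5, 4)]
for r in rows:
    print(f"SEV {r.primary_sev} vs SEV {r.secondary_sev}: "
          f"final SEV {r.final_sev}, {format_delta(r.delta)}, "
          f"escalate={should_escalate(r)}")
```

Raising `threshold` to 2 would escalate only the 12:39 PM call. This is the kind of knob the review is meant to tune, though the real system may use a different rule.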