LLM tuning

Cross-check disagreements

Two-model triage calls where the primary and secondary models disagreed on severity. Review these regularly to tune the system prompt and dispatch thresholds.

When            | Primary        | Secondary           | Final | Δ        | Escalated | Tokens (p/s)
Apr 9, 01:54 PM | SEV 3 (gpt-4o) | SEV 2 (gpt-4o-mini) | SEV 2 | 1 level  | Escalated | 284 / 112
Apr 9, 12:39 PM | SEV 4 (gpt-4o) | SEV 2 (gpt-4o-mini) | SEV 2 | 2 levels | Escalated | 311 / 126
Apr 9, 11:04 AM | SEV 5 (gpt-4o) | SEV 4 (gpt-4o-mini) | SEV 4 | 1 level  | Escalated | 245 / 98
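
For reviewing these rows offline, the Δ column can be reproduced as the absolute difference between the two severity numbers. The sketch below is illustrative only and is not the system's actual logic. The `Disagreement` class, the `needs_review` helper, and the `min_delta` cutoff are hypothetical names. The rule that the final severity is the lower-numbered of the two is an assumption: it matches the three rows shown, but the page does not state it.

```python
from dataclasses import dataclass


@dataclass
class Disagreement:
    """One cross-check row (hypothetical structure, not the system's schema)."""
    when: str
    primary_sev: int       # severity number from the primary model
    secondary_sev: int     # severity number from the secondary model
    primary_tokens: int
    secondary_tokens: int

    @property
    def delta(self) -> int:
        # Number of severity levels separating the two models.
        return abs(self.primary_sev - self.secondary_sev)

    @property
    def final_sev(self) -> int:
        # ASSUMPTION: the lower SEV number wins. This agrees with the
        # three rows in the table but is not confirmed by the source.
        return min(self.primary_sev, self.secondary_sev)


# Values copied from the table above.
ROWS = [
    Disagreement("Apr 9, 01:54 PM", 3, 2, 284, 112),
    Disagreement("Apr 9, 12:39 PM", 4, 2, 311, 126),
    Disagreement("Apr 9, 11:04 AM", 5, 4, 245, 98),
]


def needs_review(rows, min_delta=2):
    """Return rows whose disagreement is at least min_delta levels.

    min_delta is a hypothetical review cutoff, not a documented threshold.
    """
    return [r for r in rows if r.delta >= min_delta]


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.when}: delta={row.delta}, final=SEV {row.final_sev}")
```

Filtering with `needs_review(ROWS)` is one way to surface the larger disagreements first when deciding which calls to inspect while tuning the prompt.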