LLM tuning

Cross-check disagreements

Two-model triage calls where the primary and secondary models disagreed on severity. Review regularly to tune the system prompt and dispatch thresholds.

When	Primary	Secondary	Final	Δ	Escalated	Tokens (primary / secondary)
Apr 9, 01:54 PM	SEV 3 (gpt-4o)	SEV 2 (gpt-4o-mini)	SEV 2	1 level	Escalated	284 / 112
Apr 9, 12:39 PM	SEV 4 (gpt-4o)	SEV 2 (gpt-4o-mini)	SEV 2	2 levels	Escalated	311 / 126
Apr 9, 11:04 AM	SEV 5 (gpt-4o)	SEV 4 (gpt-4o-mini)	SEV 4	1 level	Escalated	245 / 98
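
The Final, Δ, and Escalated columns can be reproduced with a short sketch, which may help when experimenting with dispatch thresholds. It rests on assumptions the page does not confirm: a lower SEV number is treated as more severe, the final severity is taken as the more severe of the two ratings, and a call is escalated once the gap reaches `min_delta`. In all three rows above the secondary model happens to be the more severe one, so this data cannot distinguish that rule from "always take the secondary". `CrossCheck` and `min_delta` are hypothetical names, not part of the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossCheck:
    """One two-model triage call (hypothetical structure for illustration)."""
    primary_sev: int    # severity assigned by the primary model
    secondary_sev: int  # severity assigned by the secondary model

    @property
    def delta(self) -> int:
        # Number of severity levels separating the two ratings.
        return abs(self.primary_sev - self.secondary_sev)

    @property
    def final_sev(self) -> int:
        # Assumption: lower number = more severe, and the more severe rating wins.
        return min(self.primary_sev, self.secondary_sev)

    def escalated(self, min_delta: int = 1) -> bool:
        # Assumption: escalate once the disagreement reaches min_delta levels.
        return self.delta >= min_delta


# The three disagreements listed in the table above.
rows = [CrossCheck(3, 2), CrossCheck(4, 2), CrossCheck(5, 4)]

for r in rows:
    print(f"final=SEV {r.final_sev}  delta={r.delta}  escalated={r.escalated()}")
```

Raising `min_delta` to 2 in this sketch would leave only the 12:39 PM call escalated, which is one way to gauge how a stricter threshold would change dispatch volume.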