Model Summary

| model | n | parse_success_rate | parse_success_ci95 | parse_fallback_rate | parse_failure_rate | avg_latency_ms | p95_latency_ms | center_selection_rate | center_selection_ci95 | instability_score | label_sensitivity_rate |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| qwen2.5:7b | 1344 | 99.93% | [99.78%, 100.00%] | 0.00% | 0.07% | 11668 | 14992 | 38.35% | [35.82%, 40.88%] | 0.628 | 49.55% |
| qwen3:8b | 1344 | 99.93% | [99.78%, 100.00%] | 0.00% | 0.07% | 9494 | 11139 | 30.68% | [28.29%, 33.06%] | 0.610 | 40.48% |
| gemma4:latest | 1248 | 99.92% | [99.76%, 100.00%] | 0.00% | 0.08% | 2471 | 2837 | 41.30% | [38.65%, 43.95%] | 0.558 | 42.12% |
| mistral:latest | 1248 | 99.60% | [99.28%, 99.92%] | 0.00% | 0.40% | 3036 | 3576 | 49.32% | [46.50%, 52.13%] | 0.593 | 52.58% |
| gemma3:4b | 1344 | 99.33% | [98.88%, 99.70%] | 0.00% | 0.67% | 5964 | 6831 | 40.97% | [38.50%, 43.45%] | 0.613 | 50.30% |
| phi4-mini:3.8b | 1344 | 96.58% | [95.61%, 97.54%] | 2.38% | 1.04% | 2189 | 2717 | 37.22% | [34.73%, 40.00%] | 0.619 | 41.52% |
| llama3.2:3b | 1344 | 92.93% | [91.67%, 94.27%] | 0.00% | 7.07% | 1572 | 1993 | 41.15% | [38.59%, 43.96%] | 0.625 | 52.68% |
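
The table does not say how the `*_ci95` columns were computed. Several published bounds land on whole-count values (for `mistral:latest`, 99.28% and 99.92% correspond to 9 and 1 failures out of 1248), which is consistent with a resampling method, though that is only an inference. The sketch below assumes a percentile bootstrap over per-trial parse outcomes. The function name `bootstrap_ci95`, the resample count, and the seed are hypothetical, and the failure count is back-calculated from the rounded `parse_failure_rate`, so the result should be close to the table's interval but need not match it exactly.

```python
import random


def bootstrap_ci95(successes: int, n: int, resamples: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Percentile-bootstrap 95% interval for a proportion (assumed method)."""
    rng = random.Random(seed)
    # Rebuild the per-trial outcomes: 1 = parsed successfully, 0 = failed.
    outcomes = [1] * successes + [0] * (n - successes)
    # Resample n trials with replacement and record each resample's success rate.
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(resamples))
    # Take the empirical 2.5th and 97.5th percentiles of the resampled rates.
    lower = rates[int(0.025 * resamples)]
    upper = rates[int(0.975 * resamples) - 1]
    return lower, upper


# mistral:latest row: n = 1248, parse_failure_rate = 0.40%, i.e. about 5 failures.
n = 1248
failures = round(0.0040 * n)
lower, upper = bootstrap_ci95(n - failures, n)
print(f"{(n - failures) / n:.2%} [{lower:.2%}, {upper:.2%}]")
```

Because the bounds are percentiles of resampled rates, they can never exceed 100%, which would explain why the upper bounds for the near-perfect rows sit at exactly 100.00%.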