Counterfactual Effects

model            n_pairs  label_sensitivity_rate  ci95_low  ci95_high
llama3.2:3b          298                   0.527     0.473      0.584
mistral:latest       310                   0.526     0.468      0.581
gemma3:4b            332                   0.503     0.449      0.557
qwen2.5:7b           335                   0.496     0.442      0.549
gemma4:latest        311                   0.421     0.366      0.476
phi4-mini:3.8b       330                   0.415     0.361      0.464
qwen3:8b             336                   0.405     0.348      0.455
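The table does not state how label_sensitivity_rate or its 95% interval were computed. Below is a minimal sketch that assumes a counterfactual pair counts as label-sensitive when the model's label for the edited input differs from its label for the original input. It also assumes a percentile bootstrap over pairs for the interval. The function names, `n_boot`, and `seed` are hypothetical choices for illustration, not the pipeline that produced these numbers.

```python
import random
from typing import Hashable, Sequence


def label_sensitivity_rate(orig_labels: Sequence[Hashable],
                           cf_labels: Sequence[Hashable]) -> float:
    """Fraction of pairs whose label changes under the counterfactual edit.

    Assumption: a pair is 'sensitive' when the two labels differ.
    """
    if len(orig_labels) != len(cf_labels) or not orig_labels:
        raise ValueError("need two non-empty label lists of equal length")
    flips = sum(a != b for a, b in zip(orig_labels, cf_labels))
    return flips / len(orig_labels)


def bootstrap_ci(flags: Sequence[bool], n_boot: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean of per-pair flip flags.

    Assumption: pairs are resampled with replacement; the bounds are plain
    order statistics of the resampled rates (no interpolation).
    """
    rng = random.Random(seed)
    n = len(flags)
    rates = sorted(sum(rng.choices(flags, k=n)) / n for _ in range(n_boot))
    lo = rates[int((alpha / 2) * n_boot)]
    hi = rates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical labels: 157 of 298 pairs flip, matching the llama3.2:3b rate.
orig = [0] * 298
cf = [1] * 157 + [0] * 141
rate = label_sensitivity_rate(orig, cf)
lo, hi = bootstrap_ci([a != b for a, b in zip(orig, cf)])
print(f"rate={rate:.3f}  ci95=({lo:.3f}, {hi:.3f})")
```

Because the interval comes from random resampling, its bounds shift slightly with `seed` and `n_boot`. The rate itself is deterministic: 157/298 reproduces the 0.527 reported for llama3.2:3b.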