RootCause: unvalidated
Raw p is heavily reweighted before it matters: a noisy-OR merge with a structural prior, then a discount on model-only items (0.6 for entities, 0.8 for relations). What counts is whether p is CALIBRATED after that merge, not its raw value. Tension: a set-overlap macro-F1/recall gate has no notion of calibration. Outcome: a predicted 0.7 is right ~70% of the time post-merge.
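A minimal sketch of the reweighting and of a calibration check that a set-overlap gate cannot provide. Only the noisy-OR merge and the 0.6/0.8 discounts come from the note. Everything else is an assumption made for illustration: the names `merged_confidence`, `reliability_bins` and `expected_calibration_error` are hypothetical, the discount is assumed to be a plain multiplier applied when no structural prior exists, and equal-width binning is assumed.

```python
from typing import List, Optional, Sequence, Tuple

# Discounts quoted in the note. Applying them as multipliers is an assumption.
MODEL_ONLY_DISCOUNT = {"entity": 0.6, "relation": 0.8}


def noisy_or(*probs: float) -> float:
    """Probability that at least one of several independent sources is right."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss


def merged_confidence(p_model: float, kind: str,
                      structural_prior: Optional[float] = None) -> float:
    """Merge raw model p with a structural prior, or discount model-only items."""
    if structural_prior is None:
        # Model-only item: no structural support, so scale the raw p down.
        return p_model * MODEL_ONLY_DISCOUNT[kind]
    return noisy_or(p_model, structural_prior)


def reliability_bins(confidences: Sequence[float], correct: Sequence[bool],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, accuracy, count) for each non-empty bin."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # keep c == 1.0 in the last bin
        bins[idx].append((c, ok))
    out = []
    for b in bins:
        if b:
            n = len(b)
            mean_conf = sum(c for c, _ in b) / n
            accuracy = sum(1 for _, ok in b if ok) / n
            out.append((mean_conf, accuracy, n))
    return out


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Count-weighted mean gap between confidence and accuracy across bins."""
    total = len(confidences)
    return sum(n / total * abs(conf - acc)
               for conf, acc, n in reliability_bins(confidences, correct, n_bins))
```

Under these assumptions, a gate would run `expected_calibration_error` on post-merge confidences against gold labels and require the gap to stay small, alongside (not instead of) the macro-F1/recall check. A bin with mean confidence near 0.7 and accuracy near 0.7 is the "right ~70% of the time" outcome the note describes.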
bd6bb656-dc27-4ca2-9d12-fd49d208d7fd