API: Diagnostics

Aggregation reliability

Estimate whether an aggregated `mean(g)` context signal is trustworthy.

Parameters:

Name	Type	Description	Default
`min_effective_n`	`float`	Effective-N threshold below which aggregation is flagged untrustworthy. The default of `200` sits comfortably above the `N ~ 50` regime that scored near chance and well below the `N ~ 10,000` regime that saturated, leaving headroom for the autocorrelation discount.	`200.0`
`max_autocorr`	`float`	Median within-context lag-1 autocorrelation above which dependence is judged high enough to undermine the i.i.d. assumption.	`0.2`

Notes

The thresholds are conservative defaults derived from the empirical reference points (see `REFERENCE_POINTS`); tune them for your domain. This guard is the explicit remedy for silently exposing `context_uncertainty = mean(g)`.
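A minimal sketch of how the two thresholds might combine into a single verdict. The rule is an assumption, not the documented implementation: it compares the median per-context N_eff against `min_effective_n` and the median lag-1 autocorrelation against `max_autocorr`. The function name `judge` is hypothetical.

```python
import numpy as np


def judge(n_eff, rho, min_effective_n=200.0, max_autocorr=0.2):
    """Hypothetical verdict rule over per-context N_eff and lag-1 autocorrelation."""
    # Assumption: summarize contexts by their median N_eff and median rho.
    median_n_eff = float(np.median(np.asarray(n_eff, dtype=float)))
    median_rho = float(np.median(np.asarray(rho, dtype=float)))
    if median_n_eff < min_effective_n:
        return False, f"median N_eff {median_n_eff:.1f} is below {min_effective_n}"
    if median_rho > max_autocorr:
        return False, f"median lag-1 autocorrelation {median_rho:.2f} exceeds {max_autocorr}"
    return True, "ok"
```

Returning a `(bool, reason)` pair mirrors the "verdict (+ reason)" shape described for the convenience wrapper, so a caller can log why a context was rejected.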

`analyze(g, groups)`

Compute per-context effective sample size (N_eff), lag-1 autocorrelation, and a trustworthiness verdict.

Parameters:

Name	Type	Description	Default
`g`	`ArrayLike`	Per-item epistemic estimate `g(x_i)`.	required
`groups`	`ArrayLike`	Per-item context label (e.g. date). Items within a group are assumed to be in natural (temporal) order.	required
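The per-context statistics that `analyze` reports can be sketched as follows. This assumes the common biased lag-1 autocorrelation estimator and treats a context with fewer than two items, or with zero variance, as independent (rho = 0). The helper names `lag1_autocorr` and `analyze_sketch` are hypothetical.

```python
import numpy as np


def lag1_autocorr(x):
    """Biased lag-1 autocorrelation estimate; 0.0 when undefined (assumption)."""
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        return 0.0
    d = x - x.mean()
    denom = float(np.dot(d, d))
    if denom == 0.0:
        return 0.0
    return float(np.dot(d[:-1], d[1:]) / denom)


def analyze_sketch(g, groups):
    """Hypothetical per-context N, lag-1 rho and N_eff, keyed by context label."""
    g = np.asarray(g, dtype=float)
    groups = np.asarray(groups)
    stats = {}
    for label in np.unique(groups):
        x = g[groups == label]  # boolean mask keeps the items' original order
        rho = lag1_autocorr(x)
        n = int(x.size)
        stats[label] = {"n": n, "rho": rho, "n_eff": n * (1.0 - rho) / (1.0 + rho)}
    return stats
```

Selecting items with a boolean mask preserves their position in the input, which is what the "natural (temporal) order" requirement on `groups` relies on.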

`aggregate(g, groups, *, warn=True)`

Return `(context_labels, mean_g_per_context, verdict)`.

Emits a `UserWarning` when the aggregate is judged untrustworthy (unless `warn=False`). This is the guarded alternative to a bare `mean(g)` API.
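The guard-and-warn pattern can be sketched with the standard `warnings` module. To keep the example short, the hypothetical `aggregate_sketch` takes a precomputed verdict as arguments instead of computing it, unlike the documented method.

```python
import warnings

import numpy as np


def aggregate_sketch(g, groups, trustworthy, reason, *, warn=True):
    """Hypothetical: per-context mean(g) plus a UserWarning when flagged untrustworthy."""
    g = np.asarray(g, dtype=float)
    labels, inverse = np.unique(np.asarray(groups), return_inverse=True)
    inverse = inverse.ravel()  # keep 1-D across NumPy versions
    # Per-context mean via grouped sums divided by grouped counts.
    means = np.bincount(inverse, weights=g) / np.bincount(inverse)
    if warn and not trustworthy:
        warnings.warn(
            f"mean(g) aggregate may be untrustworthy: {reason}", UserWarning, stacklevel=2
        )
    return labels, means, (trustworthy, reason)
```

Using `stacklevel=2` attributes the warning to the caller's line rather than to the aggregation helper itself.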

Outcome of an aggregation-reliability check.

Autocorrelation-discounted effective sample size.

Uses the standard AR(1) correction N_eff = N * (1 - rho) / (1 + rho), where rho is the lag-1 autocorrelation; equivalently, N divided by the variance inflation factor (1 + rho) / (1 - rho). Independent data (rho ~ 0) gives N_eff ~ N; strong positive dependence (rho -> 1) drives (1 - rho) / (1 + rho) toward 0, so N_eff shrinks far below N. The order of values matters: pass them in their natural (e.g. temporal) order.

Parameters:

Name	Type	Description	Default
`values`	`ArrayLike`	The within-context per-item signal in natural order.	required
`n`	`int | None`	Override the raw count (defaults to `len(values)`).	`None`
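A sketch of the effective-sample-size formula using the documented `(values, n=None)` parameters. The name `effective_n` and the choice of rho estimator are assumptions. Note that, per the formula, negative autocorrelation (rho < 0) yields N_eff > N.

```python
import numpy as np


def effective_n(values, n=None):
    """Hypothetical sketch: N_eff = N * (1 - rho) / (1 + rho), rho = lag-1 autocorrelation."""
    x = np.asarray(values, dtype=float)
    n_raw = float(x.size if n is None else n)  # optional override of the raw count
    if x.size < 2:
        return n_raw  # no lag-1 pair: treat as independent (rho = 0), an assumption
    d = x - x.mean()
    denom = float(np.dot(d, d))
    # Biased lag-1 estimator; a constant series is treated as rho = 0 (assumption).
    rho = 0.0 if denom == 0.0 else float(np.dot(d[:-1], d[1:]) / denom)
    return n_raw * (1.0 - rho) / (1.0 + rho)
```

For the trending series `[1, 2, 3, 4, 5, 6]`, rho is 0.5 and N_eff is 2.0: six items carry roughly the information of two independent ones. Reordering the same values to `[1, 4, 2, 6, 3, 5]` flips rho negative and pushes N_eff above 6, which is why the natural order must be preserved.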

Convenience function: return a trust verdict and its reason for `mean(g)` aggregation.

Thin wrapper over `AggregationReliability` for one-off checks.

Composite health index

Fuse complementary component signals into one context-reliability scalar.

Parameters:

Name	Type	Description	Default
`components`	`list[tuple[str, ComponentFn]] | None`	List of `(name, fn)` pairs. Each `fn(idx, arrays)` returns one scalar per context, where higher means worse (less healthy). Defaults to the three signals from Finding 2 (realized loss, drift PSI, model disagreement); supply your own to extend or replace them.	`None`
`weights`	`ArrayLike | None`	Optional per-component weights (default: equal weights). Length must match `components`.	`None`
`threshold`	`float`	Health-score gating threshold in `[0, 1]`; contexts at or above it are "trustworthy / trade".	`0.5`

Notes

Component values are z-scored across contexts (so heterogeneous scales combine sensibly), combined as a weighted sum into a "badness" score, and then min-max normalized to [0, 1] with health = 1 - normalized_badness. This is the remedy for the low-N / non-i.i.d. regime and is meant to stay off the high-N i.i.d. default path, where individual-level g already saturates.
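The fusion and gating steps can be sketched in NumPy under stated assumptions: `component_values` is a hypothetical `(n_contexts, n_components)` matrix of already-computed component scores, a zero-variance component contributes 0 to every z-score, and if all contexts tie on badness every context receives health 1.0. The function name `health_scores` is hypothetical.

```python
import numpy as np


def health_scores(component_values, weights=None, threshold=0.5):
    """Hypothetical fusion: z-score, weighted sum, min-max, health = 1 - normalized badness."""
    X = np.asarray(component_values, dtype=float)  # (n_contexts, n_components), higher = worse
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    if w.shape != (X.shape[1],):
        raise ValueError("weights length must match the number of components")
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # constant component -> z-score 0 everywhere (assumption)
    Z = (X - X.mean(axis=0)) / sd
    badness = Z @ w
    span = badness.max() - badness.min()
    # If every context ties, report all as fully healthy (assumption).
    normalized = np.zeros_like(badness) if span == 0 else (badness - badness.min()) / span
    health = 1.0 - normalized
    return health, health >= threshold  # gate: at or above threshold -> trade
```

For three contexts whose components rise together, the scores come out as 1.0, 0.5 and 0.0 (up to floating-point rounding). Because min-max is relative, the healthiest context in a batch always scores 1.0, so the scores rank contexts within a batch rather than measure absolute health.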

`compute(groups, arrays)`

Compute per-context health from the provided per-item arrays.

Parameters:

Name	Type	Description	Default
`groups`	`ArrayLike`	Per-item context labels.	required
`arrays`	`dict[str, ArrayLike]`	Dict of per-item arrays needed by the components (e.g. `loss`, `feature` + `feature_reference`, `disagreement`). Reference arrays (keys ending in `_reference`) are passed through unindexed.	required
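One way the `fn(idx, arrays)` convention could work, assuming `idx` holds the positions of one context's items and each component indexes the per-item arrays itself, so `_reference` arrays are never indexed. `mean_loss`, `mean_disagreement` and `component_matrix` are hypothetical names.

```python
import numpy as np


def mean_loss(idx, arrays):
    """Hypothetical component: mean realized loss over one context's items."""
    return float(np.mean(arrays["loss"][idx]))


def mean_disagreement(idx, arrays):
    """Hypothetical component: mean model disagreement over one context's items."""
    return float(np.mean(arrays["disagreement"][idx]))


def component_matrix(groups, arrays, components):
    """Evaluate each (name, fn) component per context -> (labels, matrix)."""
    groups = np.asarray(groups)
    arrays = {key: np.asarray(val) for key, val in arrays.items()}
    labels = np.unique(groups)
    X = np.empty((labels.size, len(components)))
    for i, label in enumerate(labels):
        idx = np.flatnonzero(groups == label)  # positions of this context's items
        for j, (_name, fn) in enumerate(components):
            # fn sees the full dict; any "_reference" array is simply never indexed.
            X[i, j] = fn(idx, arrays)
    return labels, X
```

The resulting matrix has one row per context and one column per component, which is the shape the weighted z-score fusion consumes.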

Per-context health scores and gating verdicts.

`verdict(label)`

Gate decision for a single context label.

Mean realized loss in the context (higher = worse). Requires `arrays['loss']`.

Population Stability Index of the context feature vs. a reference distribution.

Requires `arrays['feature']` (per-item scalar feature) and `arrays['feature_reference']` (1-D reference sample). Higher PSI means more drift.
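A common PSI formulation, sketched as an assumption about how the drift component might be computed: bin edges come from reference quantiles, empty bins are floored at a small `eps`, and PSI is the sum over bins of (actual_frac - reference_frac) * ln(actual_frac / reference_frac). The bin count and `eps` are illustrative choices, not documented defaults, and `psi` is a hypothetical name.

```python
import numpy as np


def psi(actual, reference, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` vs `reference` (quantile bins, assumption)."""
    actual = np.asarray(actual, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Interior cut points at reference quantiles; np.unique drops duplicate cuts.
    cuts = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)[1:-1]))
    n_out = cuts.size + 1

    def fractions(x):
        counts = np.bincount(np.searchsorted(cuts, x, side="right"), minlength=n_out)
        return np.clip(counts / x.size, eps, None)  # floor empty bins to avoid log(0)

    a, e = fractions(actual), fractions(reference)
    return float(np.sum((a - e) * np.log(a / e)))
```

Each term (actual_frac - reference_frac) * ln(actual_frac / reference_frac) is non-negative, so PSI is zero only when the binned fractions match and grows as the context's feature distribution moves away from the reference.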

Mean ensemble/model disagreement in the context. Requires `arrays['disagreement']`.