Computes the mean entropy of transmission trees from outbreaker2, quantifying uncertainty in inferred infectors.
By default, entropy is normalised between 0 (complete certainty) and 1 (maximum uncertainty).
Details
Entropy quantifies uncertainty in inferred infectors across posterior samples using the Shannon entropy formula:
$$H(X) = -\sum_{i} p_i \log(p_i)$$
where \(p_i\) is the proportion of posterior samples in which infector \(i\) is inferred. If normalise = TRUE, entropy is scaled by its maximum possible value, \(\log(K)\), where \(K\) is the number of distinct inferred infectors:
$$H^*(X) = \frac{H(X)}{\log(K)}$$
This normalisation ensures values range from 0 to 1:
0: Complete certainty — the same infector is inferred across all samples.
1: Maximum uncertainty — all infectors are equally likely.
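To illustrate the formula, the normalised entropy of a single vector of inferred infectors can be computed by hand in base R (a sketch only, not the package's internal implementation; the infectors vector is hypothetical):
infectors <- c("2", "2", "2", "2", "2", "2", "3", "3")  # hypothetical posterior draws
p <- table(infectors) / length(infectors)               # proportion of each inferred infector
H <- -sum(p * log(p))                                   # Shannon entropy
H / log(length(p))                                      # normalise by log(K), here K = 2
#> [1] 0.8112781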
Examples
# High entropy
out <- data.frame(alpha_1 = sample(c("2", "3"), 100, replace = TRUE),
                  alpha_2 = sample(c("1", "3"), 100, replace = TRUE))
class(out) <- c("outbreaker_chains", class(out))
get_entropy(out)
#> alpha_1 alpha_2
#> 0.9997114 0.9953784
# Low entropy
out <- data.frame(alpha_1 = sample(c("2", "3"), 100, replace = TRUE, prob = c(0.95, 0.05)),
                  alpha_2 = sample(c("1", "3"), 100, replace = TRUE, prob = c(0.95, 0.05)))
class(out) <- c("outbreaker_chains", class(out))
get_entropy(out)
#> alpha_1 alpha_2
#> 0.3659237 0.4021792
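Because sample() is stochastic, the exact values vary between runs. As a sanity check (again a hand computation, not a package call), the theoretical normalised entropy of an exact 0.95/0.05 infector split is:
p <- c(0.95, 0.05)
-sum(p * log(p)) / log(2)
#> [1] 0.286397
The values printed above are somewhat higher because the realised sample proportions deviate from 0.95/0.05.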