
Computes the entropy of the inferred infector of each case across transmission trees sampled by outbreaker2, quantifying uncertainty in who infected whom. By default, entropy is normalised to range from 0 (complete certainty) to 1 (maximum uncertainty).

Usage

get_entropy(out, normalise = TRUE)

Arguments

out

A data frame of class outbreaker_chains containing posterior samples of transmission ancestries (alpha).

normalise

Logical. If TRUE (default), entropy is normalised between 0 and 1. If FALSE, raw Shannon entropy is returned.

Value

A named numeric vector giving, for each case, the entropy of its inferred infectors across posterior samples.

Details

Entropy measures the uncertainty in the inferred infector of each case across posterior samples. For each case it is computed as:

$$H(X) = -\sum p_i \log p_i$$

where \(p_i\) is the proportion of posterior samples in which infector \(i\) is inferred for the case.

If normalise = TRUE, entropy is scaled by its maximum possible value, \(\log K\), where \(K\) is the number of distinct inferred infectors:

$$H^*(X) = \frac{H(X)}{\log K}$$

This ensures values range from 0 to 1, where:

  • 0: complete certainty, the same infector is inferred across all posterior samples.

  • 1: maximum uncertainty, all infectors are inferred equally often.
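
For illustration, the per-case calculation can be sketched as below. This is a minimal sketch of the formula above, not the package's internal implementation, and the helper name entropy_of_column is hypothetical.

entropy_of_column <- function(x, normalise = TRUE) {
  p <- table(x) / length(x)       # proportion of samples naming each infector
  H <- -sum(p * log(p))           # Shannon entropy H(X)
  if (normalise && length(p) > 1) {
    H <- H / log(length(p))       # divide by log(K) to scale to [0, 1]
  }
  H
}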

Examples

# High entropy
out <- data.frame(alpha_1 = sample(c("2", "3"), 100, replace = TRUE),
                  alpha_2 = sample(c("1", "3"), 100, replace = TRUE))
class(out) <- c("outbreaker_chains", class(out))
get_entropy(out)
#>   alpha_1   alpha_2 
#> 0.9997114 0.9953784 

# Low entropy
out <- data.frame(alpha_1 = sample(c("2", "3"), 100, replace = TRUE, prob = c(0.9, 0.1)),
                  alpha_2 = sample(c("1", "3"), 100, replace = TRUE, prob = c(0.9, 0.1)))
class(out) <- c("outbreaker_chains", class(out))
get_entropy(out)
#>   alpha_1   alpha_2 
#> 0.5842388 0.4999160
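
# Raw (unnormalised) Shannon entropy for the same chains;
# with normalise = FALSE the values are not divided by log(K)
get_entropy(out, normalise = FALSE)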