Computes the mean and standard deviation of an outcome across simulation replicates within each model-by-unit cell. This is the recommended first step before computing inter-model agreement: it collapses intra-model sampling noise so that downstream metrics reflect genuine model differences rather than Monte Carlo variance.

Usage

aggregate_simulations(
  data,
  outcome = "outcome",
  by = c("model", "book_id", "chapter_id", "group")
)

Arguments

data

A data frame with one row per simulation run, containing columns for model identity, unit identifiers, and the outcome variable.

outcome

Character string naming the outcome column to aggregate (default "outcome").

by

Character vector of column names to group by. Must include a column identifying the model (typically "model"). Default c("model", "book_id", "chapter_id", "group").

Value

A tibble with one row per unique combination of by, plus:

mean_{outcome}

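Mean of outcome across simulation runs. The column name is built from the value of outcome, for example mean_rating when outcome = "rating".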

sd_{outcome}

Standard deviation of outcome across simulation runs, named in the same way (for example sd_rating).

n_sims

Number of simulation replicates in the cell.
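
To make the computation concrete, the following is a minimal base R sketch of the aggregation described above. It is not the package's implementation: the name aggregate_sketch is hypothetical, it returns a plain data frame rather than a tibble, and the use of sd() (the sample standard deviation, which is NA for a cell with a single replicate) is an assumption.

```r
# Hypothetical stand-in for aggregate_simulations(), for illustration only.
aggregate_sketch <- function(data, outcome = "outcome",
                             by = c("model", "book_id", "chapter_id", "group")) {
  stopifnot(is.data.frame(data),
            outcome %in% names(data),
            all(by %in% names(data)))

  # Split row indices into cells; drop = TRUE omits empty combinations.
  cells <- split(seq_len(nrow(data)), data[by], drop = TRUE)

  rows <- lapply(cells, function(idx) {
    x <- data[[outcome]][idx]
    out <- data[idx[1], by, drop = FALSE]        # grouping values for this cell
    out[[paste0("mean_", outcome)]] <- mean(x)
    out[[paste0("sd_", outcome)]] <- sd(x)       # assumption: sample SD
    out$n_sims <- length(x)
    out
  })

  result <- do.call(rbind, rows)
  # Sort by the grouping columns and reset row names.
  result <- result[do.call(order, unname(as.list(result[by]))), , drop = FALSE]
  rownames(result) <- NULL
  result
}

# Small deterministic input: 2 models x 2 units x 2 replicates.
toy <- data.frame(
  model = rep(c("a", "b"), each = 4),
  unit  = rep(c("u1", "u2"), times = 4),
  y     = c(1, 2, 3, 4, 5, 6, 7, 8)
)
toy_result <- aggregate_sketch(toy, outcome = "y", by = c("model", "unit"))
print(toy_result)
```

Checking n_sims in the result is a quick way to confirm that every cell received the expected number of replicates before moving on to agreement metrics.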

Examples

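# 2 models x 4 chapters x 2 groups, with 5 replicates per cell (80 rows).
# Ratings are unseeded random draws, so the values differ between runs.
sim_data <- data.frame(
  model = rep(c("gpt-4o", "gemini-2.5-flash"), each = 40),
  book_id = rep("BookA", 80),
  chapter_id = rep(paste0("ch", 1:4), each = 10, times = 2),
  group = rep(c("Democrat", "Republican"), 40),
  sim = rep(1:10, 8),
  rating = rnorm(80, 60, 10)
)
# Collapse the replicates to one row per model-by-unit cell.
aggregate_simulations(sim_data, outcome = "rating",
  by = c("model", "book_id", "chapter_id", "group"))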
#> # A tibble: 16 × 7
#>    model            book_id chapter_id group      mean_rating sd_rating n_sims
#>    <chr>            <chr>   <chr>      <chr>            <dbl>     <dbl>  <int>
#>  1 gemini-2.5-flash BookA   ch1        Democrat          61.3      1.88      5
#>  2 gemini-2.5-flash BookA   ch1        Republican        61.1     17.3       5
#>  3 gemini-2.5-flash BookA   ch2        Democrat          56.1     11.5       5
#>  4 gemini-2.5-flash BookA   ch2        Republican        66.4     11.8       5
#>  5 gemini-2.5-flash BookA   ch3        Democrat          65.1      9.80      5
#>  6 gemini-2.5-flash BookA   ch3        Republican        53.1      7.02      5
#>  7 gemini-2.5-flash BookA   ch4        Democrat          63.9     11.5       5
#>  8 gemini-2.5-flash BookA   ch4        Republican        61.7      2.69      5
#>  9 gpt-4o           BookA   ch1        Democrat          49.4     12.3       5
#> 10 gpt-4o           BookA   ch1        Republican        61.7      5.86      5
#> 11 gpt-4o           BookA   ch2        Democrat          64.1     10.7       5
#> 12 gpt-4o           BookA   ch2        Republican        52.3     10.5       5
#> 13 gpt-4o           BookA   ch3        Democrat          58.6     13.2       5
#> 14 gpt-4o           BookA   ch3        Republican        58.9      8.54      5
#> 15 gpt-4o           BookA   ch4        Democrat          58.1     10.6       5
#> 16 gpt-4o           BookA   ch4        Republican        64.9      8.21      5
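
As a rough illustration of the downstream step, the sketch below lines up per-cell means from two models and correlates them. The values in agg are made up for the illustration and are not output from any real run; Pearson correlation is only one possible agreement metric and is an assumption here, not necessarily the one this package uses.

```r
# Toy aggregated table with made-up means (hypothetical values).
agg <- data.frame(
  model = rep(c("gpt-4o", "gemini-2.5-flash"), each = 4),
  chapter_id = rep(paste0("ch", 1:4), times = 2),
  mean_rating = c(50, 60, 55, 65, 52, 63, 54, 66)
)

# One table per model, then join on the unit identifier so that
# each row holds both models' means for the same chapter.
a <- agg[agg$model == "gpt-4o", ]
b <- agg[agg$model == "gemini-2.5-flash", ]
both <- merge(a, b, by = "chapter_id", suffixes = c("_a", "_b"))

# Agreement between the two models across units (assumed metric: Pearson r).
agreement <- cor(both$mean_rating_a, both$mean_rating_b)
print(agreement)
```

Because each row of both is already a mean over replicates, the correlation is computed on cell-level estimates rather than on individual noisy runs.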