Aggregates lower-level rows to a chosen unit level, ranks units within each model, and summarizes which units most consistently appear near the top across models. This is useful for questions such as "Which books consistently have the strongest effects across models?"

Usage

summarize_top_units(
  data,
  outcome = "mean_outcome",
  item_by = "book_id",
  rank_within = NULL,
  model_col = "model",
  top_n = 3,
  higher_is_better = TRUE,
  standardize = c("z", "none", "minmax", "max"),
  include_ranks = FALSE,
  drop_missing = TRUE
)

Arguments

data

A data frame with one row per model-by-unit combination.

outcome

Character string naming the score column (default "mean_outcome").

item_by

Character vector naming the column(s) that identify the items to rank, e.g. "book" or "book_id" (default "book_id").

rank_within

Optional character vector naming the column(s) that define separate ranking contexts, e.g. "party" to rank books separately within each party (default NULL).

model_col

Character string naming the model column (default "model").

top_n

Integer. Number of top-ranked items to count for each model (default 3).

higher_is_better

Logical. If TRUE (default), larger outcome values receive better ranks. If FALSE, smaller values receive better ranks.

standardize

Character. How to standardize item scores within each model before computing cross-model mean scores. "z" (default) centers and scales scores within model; "none" keeps raw scores; "minmax" rescales scores within model to 0–1; "max" divides scores within model by that model's maximum absolute score. Each option is a monotonic transformation within a model, so ranks do not depend on this choice; only mean_score and the point sizes in plot_top_units() use the standardized scores.
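
The four options can be illustrated with a minimal base R sketch. The helper `standardize_scores()` below is hypothetical and is not the package's implementation; in particular, the use of the sample standard deviation for "z" and the handling of missing values are assumptions.

```r
# Hypothetical helper (not part of the package): standardize one model's scores.
# Assumes the scores are not all identical; otherwise "z" and "minmax" divide by zero.
standardize_scores <- function(x, method = c("z", "none", "minmax", "max")) {
  method <- match.arg(method)
  switch(
    method,
    z      = (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE),
    none   = x,
    minmax = (x - min(x, na.rm = TRUE)) /
      (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)),
    max    = x / max(abs(x), na.rm = TRUE)
  )
}

# Made-up scores for four items within a single model.
scores <- c(0.2, 0.5, 0.9, 0.4)

z_scores      <- standardize_scores(scores, "z")
minmax_scores <- standardize_scores(scores, "minmax")
max_scores    <- standardize_scores(scores, "max")

# Each transformation preserves ordering, so the ranks are the same.
raw_ranks <- rank(-scores)
z_ranks   <- rank(-z_scores)
```

Only the scale of the scores changes between options, which is why the choice matters for mean_score but not for the rank-based columns.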

include_ranks

Logical. If TRUE, return a list with both the summary table and the model-level ranks. If FALSE (default), return only the summary table.

drop_missing

Logical. Whether to drop rows with missing model, item, or ranking-context identifiers before aggregating (default TRUE).

Value

A tibble, or a list with elements summary and ranks when include_ranks = TRUE.

The summary table contains:

rank_within columns

Optional grouping columns used to define separate ranking contexts, such as party.

item_by columns

The ranked item identifiers, such as book.

mean_score

Mean outcome score for the item across models, computed on the scale selected by standardize.

score_scale

The score standardization method used for mean_score.

mean_rank

Average rank of the item across models. Lower values indicate more consistently high-ranked items when higher_is_better = TRUE.

overall_mean_rank

When rank_within is supplied, the item's average rank computed without those ranking contexts. This preserves a common item order for subgroup displays.

median_rank

Median rank of the item across models.

top_n_models

Number of models that ranked the item among the top_n best-ranked items in its ranking context. For example, if top_n = 3 and top_n_models = 4, then 4 models placed that item in their top 3.

n_models

Number of models with non-missing ranks for the item.

top_n

The top-N threshold used to compute top_n_models.

top_n_label

Compact display label combining top_n_models and n_models, such as "4/5".

When include_ranks = TRUE, the ranks table contains one row per model-by-item combination, including score, rank, and top_n.
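
How the rank-based columns relate to each other can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the toy data are made up, no rank_within context is used, and ties are broken with `ties.method = "min"`, which may differ from the actual implementation.

```r
# Toy input: one row per model-by-item combination (made-up values).
toy <- data.frame(
  model = rep(c("m1", "m2", "m3"), each = 4),
  book  = rep(c("A", "B", "C", "D"), times = 3),
  score = c(0.9, 0.5, 0.4, 0.1,
            0.8, 0.2, 0.6, 0.3,
            0.3, 0.7, 0.9, 0.1)
)
top_n <- 2

# Rank within each model; rank 1 goes to the largest score
# (the higher_is_better = TRUE case).
toy$rank <- ave(-toy$score, toy$model,
                FUN = function(s) rank(s, ties.method = "min"))
toy$in_top_n <- toy$rank <= top_n

# Summarize each item across models.
summary_tbl <- data.frame(
  book         = sort(unique(toy$book)),
  mean_rank    = as.numeric(tapply(toy$rank, toy$book, mean)),
  median_rank  = as.numeric(tapply(toy$rank, toy$book, median)),
  top_n_models = as.integer(tapply(toy$in_top_n, toy$book, sum)),
  n_models     = as.integer(tapply(!is.na(toy$rank), toy$book, sum))
)
summary_tbl$top_n_label <- paste0(summary_tbl$top_n_models, "/",
                                  summary_tbl$n_models)
```

In this toy data, book "A" is ranked 1, 1, and 3 by the three models, so with top_n = 2 it has top_n_models = 2 and top_n_label "2/3".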

Examples

if (FALSE) { # \dontrun{
summarize_top_units(
  agg,
  outcome = "mean_delta_gap",
  item_by = "book",
  rank_within = "party",
  model_col = "model",
  top_n = 3
)
} # }
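
The example above assumes a data frame `agg` already exists. A minimal sketch of what such an input might look like follows; the column values are made up, and the assumption that `agg` has one row per model, party, and book is inferred from the arguments rather than confirmed by the package.

```r
# Hypothetical toy input; all values are invented for illustration.
agg <- expand.grid(
  model = c("m1", "m2"),
  party = c("X", "Y"),
  book  = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Simulated outcome column matching outcome = "mean_delta_gap".
set.seed(1)
agg$mean_delta_gap <- round(runif(nrow(agg), min = -1, max = 1), 2)

str(agg)
```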